How can I select/find coordinates within a given distance from a list of X/Y locations using R?

I have a data frame with a list of X/Y locations (>2000 rows). I want to select or find all the rows/locations based on a maximum distance. For example, from the data frame, select all the locations that are between 1 and 100 km from each other. Any suggestions on how to do this?

2 solutions

#1

You need to determine the distance between each pair of rows. The simplest way is with a distance matrix:

library(data.table)

# Assuming thresh is your threshold
thresh <- 10

# create some sample data
set.seed(123)
DT <- data.table(X=sample(-10:10, 5, TRUE), Y=sample(-10:10, 5, TRUE))

# create the distance matrix
# (createTable() and distance() are the helper functions defined further down;
# define them before running this line)
distTable <- matrix(apply(createTable(DT), 1, distance), nrow=nrow(DT))

# remove the lower triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA

# Show which pairs of rows are at least thresh apart
pairedRows <- which(distTable >= thresh, arr.ind=TRUE)
colnames(pairedRows) <- c("RowA", "RowB")  # clean up the names

Starting with:

> DT
    X   Y
1: -4 -10
2:  6   1
3: -2   8
4:  8   1
5:  9  -1

We get:

> pairedRows
     RowA RowB
[1,]    1    2
[2,]    1    3
[3,]    2    3
[4,]    1    4
[5,]    3    4
[6,]    1    5
[7,]    3    5

These are the two functions used to create the distance matrix:

# pair-up all of the rows
createTable <- function(DT)   
  expand.grid(apply(DT, 1, list), apply(DT, 1, list))

# simple cartesian/pythagorean distance 
distance <- function(CoordPair)
  sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm=FALSE))
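
As a possible alternative to the hand-rolled helpers above, base R's `dist()` also builds a pairwise Euclidean distance matrix. The following is only a minimal sketch: the `pts` data, the `max_dist` value, and the `close_pairs` name are made up for illustration, and it assumes X/Y are planar coordinates in the same units as the threshold. It selects pairs within the maximum distance, which is the direction the question asks about.

```r
# Hypothetical sample coordinates (not the data from the answer above)
pts <- data.frame(X = c(0, 3, 0, 10), Y = c(0, 4, 8, 10))
max_dist <- 6  # assumed threshold, in the same units as X and Y

# dist() returns the pairwise Euclidean distances; as.matrix() expands
# them into a full symmetric matrix with zeros on the diagonal
d <- as.matrix(dist(pts[, c("X", "Y")]))

# Keep each pair once (upper triangle only, which also drops the diagonal)
# and apply the threshold
close_pairs <- which(upper.tri(d) & d <= max_dist, arr.ind = TRUE)
colnames(close_pairs) <- c("RowA", "RowB")
close_pairs
```

With these made-up points, only rows 1 and 2 and rows 2 and 3 are within 6 units of each other. Note that a full distance matrix grows with the square of the number of rows, so for a few thousand rows it is still manageable but not free.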

#2

I'm not entirely clear from your question, but assuming you mean you want to take each row of coordinates and find all the other rows whose coordinates fall within a certain distance:

# Create data set for example

set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)

coords <- data.frame(
  "x" = x,
  "y" = y)

# Loop through all rows

lapply(1:nrow(coords), function(i) {
  dis <- sqrt(
    (coords[i,"x"] - coords[, "x"])^2 + # insert your preferred 
    (coords[i,"y"] - coords[, "y"])^2   # distance calculation here
  ) 
  names(dis) <- 1:nrow(coords)          # replace this part with an index or 
                                        # row names if you have them
  dis[dis > 0 & dis <= 100]             # change numbers to preferred threshold
})

[[1]]
       2        6        7        9       10
25.31798 95.01579 40.01250 30.87070 73.75636

[[2]]
        1         6         7         9        10
25.317978 89.022469 51.107729  9.486833 60.539243

[[3]]
       5        6        8
70.71068 91.78780 94.86833

[[4]]
       5       10
40.16217 99.32774

[[5]]
       3        4        6       10
70.71068 40.16217 93.40771 82.49242

[[6]]
       1        2        3        5        7        8        9       10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850

[[7]]
       1        2        6        9       10
40.01250 51.10773 64.53681 60.41523 57.55867

[[8]]
       3        6
94.86833 75.66373

[[9]]
        1         2         6         7        10
30.870698  9.486833 97.082439 60.415230 67.119297

[[10]]
       1        2        4        5        6        7        9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930
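
The question mentions kilometres, so if the X/Y columns are actually longitude/latitude in decimal degrees (an assumption, since the question does not say), the Euclidean formula in the loop above would not give distances in km. One option for the "preferred distance calculation" is the haversine formula. The sketch below is illustrative only: `haversine_km`, `places`, and the mean Earth radius of 6371 km are choices made here, not part of the original answer.

```r
# Great-circle (haversine) distance in km between one point and a vector of
# points; inputs are assumed to be longitude/latitude in decimal degrees
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical locations on the equator: x = longitude, y = latitude
places <- data.frame(x = c(0, 0.5, 5), y = c(0, 0, 0))

# Same loop structure as above, with the distance calculation swapped out
near <- lapply(seq_len(nrow(places)), function(i) {
  dis <- haversine_km(places[i, "x"], places[i, "y"],
                      places[, "x"], places[, "y"])
  names(dis) <- seq_len(nrow(places))
  dis[dis >= 1 & dis <= 100]   # keep locations between 1 and 100 km away
})
near
```

Here the first two points are roughly 55.6 km apart and so list each other, while the third point is several hundred km from both and gets an empty result.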
