一组经纬度点之间的最大距离

时间:2022-08-08 16:03:28

I have a set of lng/lat coordinates. What would be an efficient method of calculating the greatest distance between any two points in the set (the "maximum diameter" if you will)?

我有一套lng/lat坐标。计算集合中任意两点之间的最大距离的有效方法是什么?

A naive way is to use Haversine formula to calculate the distance between each 2 points and get the maximum, but this doesn't scale well obviously.

一种简单的方法是用Haversine公式计算每两个点之间的距离,得到最大值,但这显然不是很明显。

Edit: the points are located on a sufficiently small area, measuring the area in which a person carrying a mobile device was active in the course of a single day.

编辑:这些点位于一个足够小的区域,测量一个携带移动设备的人在一天内活动的区域。

4 个解决方案

#1


9  

I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:

我认为下面这个近似可以作为一个有用的近似,它是线性的,而不是用点的数量来求二次的,而且很容易实现:

  1. calculate the center of mass M of the points
  2. 计算点的质心
  3. find the point P0 that has the maximum distance to M
  4. 找到最大距离为M的点P0。
  5. find the point P1 that has the maximum distance to P0
  6. 求到P0的最大距离的点P1
  7. approximate the maximum diameter with the distance between P0 and P1
  8. 用P0和P1之间的距离估计最大直径

This can be generalized by repeating step 3 N times, and taking the distance between PN-1 and PN

这可以通过重复步骤3n次,取PN-1和PN之间的距离来推广

Step 1 can be carried out efficiently approximating M as the average of longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be re-calculated with the exact formula.

第一步可以有效地将M近似为经纬度的平均值,这在距离“小”和极点距离足够远时是可以的。其他的步骤可以用精确的距离公式来完成,但是如果这些点的坐标可以近似地表示为在一个平面上,那么它们会快得多。一旦找到“远方的一对”(希望是距离最大的一对),就可以用精确的公式重新计算它的距离。

An example of approximation could be the following: if φ(M) and λ(M) are latitude and longitude of the center of mass calculated as Σφ(P)/n and Σλ(P)/n,

近似的一个例子可能是以下:如果φ(M)和λ(M)的纬度和经度质心计算Σφ(P)/ n和Σλ/ n(P),

  • x(P) = (λ(P) - λ(M) + C) cos(φ(P))
  • x(P)=(λ(P)-λ(M)+ C)因为(φ(P))
  • y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]
  • y(P)=φ(P)-φ(M)(这只是为了清楚起见,也可以只是y(P)=φ(P))

where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find

C通常是0,但可以在点集±360°如果穿过λ=±180°线。要找到最大距离,你只需要找到

  • max((x(PN) - x(PN-1))2 + (y(PN) - y(PN-1))2)
  • max((x(PN) - x(PN-1)))2 + (y(PN) - y(PN-1) 2)

(you don't need the square root because it is monotonic)

(你不需要平方根因为它是单调的)

The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...

同样的坐标变换可以用于重复步骤1(在新的坐标系中),以获得更好的起点。我怀疑,如果满足某些条件,上面的步骤(不重复步骤3)总是会导致“真正的远距离对”(我的术语)。如果我只知道哪些条件…

EDIT:

编辑:

I hate building on others' solutions, but someone will have to.

我讨厌建立在别人的解决方案之上,但总得有人去做。

Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3, and following the solution of Spacedman, doing calculations in 3D overcomes the limitations of closeness and distance from poles:

仍然保持以上4个步骤,可选的(但可能是有益的,取决于点的典型分布)重复步骤3,并遵循Spacedman的解决方案,用3D进行计算,克服了靠近极点和距离的限制:

  • x(P) = sin(φ(P))
  • x(P)=罪(φ(P))
  • y(P) = cos(φ(P)) sin(λ(P))
  • y(P)= cosφ(P))罪(λ(P))
  • z(P) = cos(φ(P)) cos(λ(P))
  • z(P)= cos(φ(P))因为(λ(P))

(the only approximation is that this holds only for a perfect sphere)

(唯一的近似值是它只适用于一个完美的球体)

The center of mass is given by x(M) = Σx(P)/n, etc., and the maximum one has to look for is

质心是由x(M)=Σx(P)/ n,等,和最大的人寻找

  • max((x(PN) - x(PN-1))2 + (y(PN) - y(PN-1))2 + (z(PN) - z(PN-1))2)
  • max((x(PN)- x(PN-1))2 +(y(PN)- y(PN-1))2 +(z(PN)- z(PN-1))2)

So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.

你首先把球变换成笛卡尔坐标,然后从质心开始,在至少两个步骤(步骤2和步骤3)中,找到离前一点最远的点。只要距离增加,你可以重复第三步,也许重复的次数最多,但这不会使你偏离局部最大值。从质心开始也没有多少帮助,如果这些点分布在地球上。

EDIT 2:

编辑2:

I learned enough R to write down the core of the algorithm (nice language for data analysis!)

我学了足够多的R,写出了算法的核心(数据分析的好语言!)

For the plane approximation, ignoring the problem around the λ=±180° line:

对于平面近似,忽略了问题在λ=±180°线:

# input: lng, lat (vectors)
rad = pi / 180;
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i = which.max((x - mean(x))^2 + (y       )^2)
j = which.max((x - x[i]   )^2 + (y - y[i])^2)
# output: i, j (indices)

On my PC it takes less than a second to find the indices i and j for 1000000 points.
The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):

在我的电脑上,用不到一秒钟的时间就能找到1000点的索引i和j。下面的3 d版本是有点慢,但适用于任何分布的点(不需要修改当λ=±180°线是交叉):

# input: lng, lat
rad = pi / 180
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
j = which.max((x - x[i]   )^2 + (y - y[i]   )^2 + (z - z[i]   )^2)
k = which.max((x - x[j]   )^2 + (y - y[j]   )^2 + (z - z[j]   )^2) # optional
# output: j, k (or i, j)

The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index is useless.

k的计算可以省略(即。,结果可以由i和j)给出,具体取决于数据和需求。另一方面,我的实验表明,计算一个进一步的指数是没有用的。

It should be remembered that, in any case, the distance between the resulting points is an estimate which is a lower bound of the "diameter" of the set, although it very often will be the diameter itself (how often depends on the data.)

应该记住,无论如何,结果点之间的距离是一个估计值,它是集合“直径”的下界,尽管它通常是直径本身(取决于数据的频率)。

EDIT 3:

编辑3:

Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified in order to have an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points whith a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:

不幸的是飞机的相对误差近似,在极端的情况下,尽可能多的1 - 1 /√3≅42.3%,这可能是不可接受的,即使很少见。可以修改算法,使其上界约为20%,这是我用compass和直边导出的(解析解很麻烦)。改进后的算法在局部最大距离的前提下找到一对点,然后重复相同的步骤,但这次是从第一对的中点开始,可能会找到另一对:

# input: lng, lat
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
   s = (x - x.n_1)^2 + (y - y.n_1)^2
   i.n = which.max(s)
   x.n = x[i.n]
   y.n = y[i.n]
   s.n = s[i.n]
   if (s.n <= s.n_1) break
   i.n_1 = i.n
   x.n_1 = x.n
   y.n_1 = y.n
   s.n_1 = s.n
}
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok  = TRUE
repeat {
   s = (x - x.m_1)^2 + (y - y.m_1)^2
   i.m = which.max(s)
   if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
   x.m = x[i.m]
   y.m = y[i.m]
   s.m = s[i.m]
   if (s.m <= s.m_1) break
   i.m_1 = i.m
   x.m_1 = x.m
   y.m_1 = y.m
   s.m_1 = s.m
}
if (m_ok && s.m > s.n) {
   i = i.m
   j = i.m_1
} else {
   i = i.n
   j = i.n_1
}
# output: i, j

The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).

3D算法也可以用类似的方式进行修改。可以(在2D和3D情况下)从第二对点的中点重新开始(如果找到的话)。本例的上限是“留给读者练习”:-)。

Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time, and a reduction of the average error from .6% to .03% (order of magnitude). A further restart from the midpoint results in an a just slightly better average error, but almost equal maximum error.

将修改后的算法与(过于)简单的算法进行了比较,结果表明,对于正、方均匀分布,处理时间几乎增加了一倍,平均误差从0.6%减少到0.03%(数量级)。进一步从中点重新启动会导致一个稍好一点的平均误差,但几乎等于最大误差。

EDIT 4:

编辑4:

I have to study this article yet, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%

我必须学习这篇文章,但它看起来像我发现罗盘和直尺20%实际上是1 - 1 /√√以5比2(3)≅19.3%

#2


11  

Theorem #1: The ordering of any two great circle distances along the surface of the earth is the same as the ordering as the straight line distance between the points where you tunnel through the earth.

定理1:沿地球表面的任意两个大圆距离的顺序,与穿过地球的点之间的直线距离的顺序相同。

Hence turn your lat-long into x,y,z based either on a spherical earth of arbitrary radius or an ellipsoid of given shape parameters. That's a couple of sines/cosines per point (not per pair of points).

因此,将你的latlong转换为x,y,z,基于任意半径的球形地球或给定形状参数的椭圆体。这是每一点的两个sin /cos(不是每对点)

Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). Needs a square-root and some squares, and you can leave out the square root if you only care about comparisons.

现在你有了一个标准的3d问题,它不依赖于计算Haversine距离。点之间的距离就是欧几里得(3d中的毕达哥拉斯)。需要一个平方根和一些平方,如果你只关心比较,你可以省略平方根。

There may be fancy spatial tree data structures to help with this. Or algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods). Or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html

可能有一些花哨的空间树数据结构来帮助实现这一点。或者像http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm(点击“Next”获取3d方法)这样的算法。或者这里的c++代码:http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html

Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.

一旦你找到了最大距离对,你就可以使用Haversine公式来得到这对的沿曲面的距离。

#3


3  

Here's a naive example that doesn't scale well (as you say), as you say but might help with building a solution in R.

这里有一个很简单的例子,它不像你说的那样很好地扩展(正如你所说的),但是可能有助于在R中构建一个解决方案。

## lonlat points
n <- 100
d <- cbind(runif(n, -180, 180), runif(n, -90, 90))


library(sp)
## distances on WGS84 ellipsoid
x <- spDists(d, longlat = TRUE)

## row, then column index of furthest points
ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])

## maps
library(maptools)
data(wrld_simpl)
plot(as(wrld_simpl, "SpatialLines"), col = "grey")

points(d, pch = 16, cex = 0.5)

## draw the points and a line between  on the page
points(d[ind, ], pch = 16)
lines(d[ind, ], lwd = 2)


## for extra credit, draw the great circle on which the furthest points lie
library(geosphere)


lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")

一组经纬度点之间的最大距离

The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.

如果需要,geosphere包提供了更多的距离计算选项。看到了吗?sp中的spDists用于这里的细节。

#4


3  

You don't tell us whether these points will be located in a sufficiently small part of the globe. For truly global sets of points, my first guess would be running a naive O(n^2) algorithm, possibly getting performance boost with some spatial indexing (R*-trees, octal-trees etc.). The idea is to pre-generate an n*(n-1) list of the triangle in the distance matrix and feed it in chunks to a fast distance library to minimize I/O and process churn. Haversine is fine, you could also do it with Vincenty's method (the greatest contributor to running time is quadratic complexity, not the (fixed number of) iterations in Vincenty's formula). As a side note, in fact, you don't need R for this stuff.

你不会告诉我们这些点是否会在地球上足够小的地方。的真正意义上的全球集合点,我第一次想将运行一个天真的O(n ^ 2)算法,可能获得性能提升与一些空间索引(R *—树木,octal-trees等等)。其思想是预生成距离矩阵中三角形的n*(n-1)列表,并将其以块的形式提供给快速距离库,以最小化I/O和进程流变。Haversine还可以,您也可以使用Vincenty的方法(运行时间的最大贡献是二次复杂度,而不是Vincenty公式中的(固定数量的)迭代)。顺便说一句,事实上,这个东西不需要R。

EDIT #2: The Barequet-Har-Peled algorithm (as pointed at by Spacedman in his reply) has O((n+1/(e^3))log(1/e)) complexity for e>0, and is worth exploring.

编辑# 2:Barequet-Har-Peled算法(以Spacedman指着他的回答)有O(n + 1 /(e ^ 3)日志(1 / e)为e > 0的复杂性,值得探索。

For the quasi-planar problem, this is known as "diameter of convex hull" and has three parts:

对于拟平面问题,这被称为“凸壳直径”,有三部分:

  1. Computing convex hull with Graham's scan which is O(n*log(n)) - in fact, one should try transforming points into a transverse Mercator projection (using the centroid of the points in data set).
  2. 计算凸壳与格雷厄姆的扫描,即O(n*log(n)) -事实上,应该尝试将点转化为横切的墨卡托投影(使用数据集中的点的质心)。
  3. Finding antipodal points by Rotating Calipers algorithm - linear O(n).
  4. 用旋转卡尺算法求反足点——线性O(n)。
  5. Finding the largest distance among all antipodal pairs - linear search, O(n).
  6. 求出所有反波对的最大距离——线性搜索,O(n)。

The link with pseudo-code and discussion: http://fredfsh.com/2013/05/03/convex-hull-and-its-diameter/

伪代码链接和讨论:http://fredfsh.com/2013/05/03/convex hull- its-diameter/

See also the discussion on a related question here: https://gis.stackexchange.com/questions/17358/how-can-i-find-the-farthest-point-from-a-set-of-existing-points

在这里还可以看到关于一个相关问题的讨论:https://gis.stackexchange.com/questions/17358/how-can-i-find the far -from-a set- being -points

EDIT: Spacedman's solution pointed me to the Malandain-Boissonnat algorithm (see the paper in pdf here). However, this is worse or the same as the bruteforce naive O(n^2) algorithm.

编辑:Spacedman的解决方案将我指向Malandain-Boissonnat算法(请参阅pdf文件)。然而,这是一样的还是坏bruteforce天真O(n ^ 2)算法。

#1


9  

I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:

我认为下面这个近似可以作为一个有用的近似,它是线性的,而不是用点的数量来求二次的,而且很容易实现:

  1. calculate the center of mass M of the points
  2. 计算点的质心
  3. find the point P0 that has the maximum distance to M
  4. 找到最大距离为M的点P0。
  5. find the point P1 that has the maximum distance to P0
  6. 求到P0的最大距离的点P1
  7. approximate the maximum diameter with the distance between P0 and P1
  8. 用P0和P1之间的距离估计最大直径

This can be generalized by repeating step 3 N times, and taking the distance between PN-1 and PN

这可以通过重复步骤3n次,取PN-1和PN之间的距离来推广

Step 1 can be carried out efficiently approximating M as the average of longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be re-calculated with the exact formula.

第一步可以有效地将M近似为经纬度的平均值,这在距离“小”和极点距离足够远时是可以的。其他的步骤可以用精确的距离公式来完成,但是如果这些点的坐标可以近似地表示为在一个平面上,那么它们会快得多。一旦找到“远方的一对”(希望是距离最大的一对),就可以用精确的公式重新计算它的距离。

An example of approximation could be the following: if φ(M) and λ(M) are latitude and longitude of the center of mass calculated as Σφ(P)/n and Σλ(P)/n,

近似的一个例子可能是以下:如果φ(M)和λ(M)的纬度和经度质心计算Σφ(P)/ n和Σλ/ n(P),

  • x(P) = (λ(P) - λ(M) + C) cos(φ(P))
  • x(P)=(λ(P)-λ(M)+ C)因为(φ(P))
  • y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]
  • y(P)=φ(P)-φ(M)(这只是为了清楚起见,也可以只是y(P)=φ(P))

where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find

C通常是0,但可以在点集±360°如果穿过λ=±180°线。要找到最大距离,你只需要找到

  • max((x(PN) - x(PN-1))2 + (y(PN) - y(PN-1))2)
  • max((x(PN) - x(PN-1)))2 + (y(PN) - y(PN-1) 2)

(you don't need the square root because it is monotonic)

(你不需要平方根因为它是单调的)

The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...

同样的坐标变换可以用于重复步骤1(在新的坐标系中),以获得更好的起点。我怀疑,如果满足某些条件,上面的步骤(不重复步骤3)总是会导致“真正的远距离对”(我的术语)。如果我只知道哪些条件…

EDIT:

编辑:

I hate building on others' solutions, but someone will have to.

我讨厌建立在别人的解决方案之上,但总得有人去做。

Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3, and following the solution of Spacedman, doing calculations in 3D overcomes the limitations of closeness and distance from poles:

仍然保持以上4个步骤,可选的(但可能是有益的,取决于点的典型分布)重复步骤3,并遵循Spacedman的解决方案,用3D进行计算,克服了靠近极点和距离的限制:

  • x(P) = sin(φ(P))
  • x(P)=罪(φ(P))
  • y(P) = cos(φ(P)) sin(λ(P))
  • y(P)= cosφ(P))罪(λ(P))
  • z(P) = cos(φ(P)) cos(λ(P))
  • z(P)= cos(φ(P))因为(λ(P))

(the only approximation is that this holds only for a perfect sphere)

(唯一的近似值是它只适用于一个完美的球体)

The center of mass is given by x(M) = Σx(P)/n, etc., and the maximum one has to look for is

质心是由x(M)=Σx(P)/ n,等,和最大的人寻找

  • max((x(PN) - x(PN-1))2 + (y(PN) - y(PN-1))2 + (z(PN) - z(PN-1))2)
  • max((x(PN)- x(PN-1))2 +(y(PN)- y(PN-1))2 +(z(PN)- z(PN-1))2)

So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.

你首先把球变换成笛卡尔坐标,然后从质心开始,在至少两个步骤(步骤2和步骤3)中,找到离前一点最远的点。只要距离增加,你可以重复第三步,也许重复的次数最多,但这不会使你偏离局部最大值。从质心开始也没有多少帮助,如果这些点分布在地球上。

EDIT 2:

编辑2:

I learned enough R to write down the core of the algorithm (nice language for data analysis!)

我学了足够多的R,写出了算法的核心(数据分析的好语言!)

For the plane approximation, ignoring the problem around the λ=±180° line:

对于平面近似,忽略了问题在λ=±180°线:

# input: lng, lat (vectors)
rad = pi / 180;
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i = which.max((x - mean(x))^2 + (y       )^2)
j = which.max((x - x[i]   )^2 + (y - y[i])^2)
# output: i, j (indices)

On my PC it takes less than a second to find the indices i and j for 1000000 points.
The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):

在我的电脑上,用不到一秒钟的时间就能找到1000点的索引i和j。下面的3 d版本是有点慢,但适用于任何分布的点(不需要修改当λ=±180°线是交叉):

# input: lng, lat
rad = pi / 180
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
j = which.max((x - x[i]   )^2 + (y - y[i]   )^2 + (z - z[i]   )^2)
k = which.max((x - x[j]   )^2 + (y - y[j]   )^2 + (z - z[j]   )^2) # optional
# output: j, k (or i, j)

The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index is useless.

k的计算可以省略(即。,结果可以由i和j)给出,具体取决于数据和需求。另一方面,我的实验表明,计算一个进一步的指数是没有用的。

It should be remembered that, in any case, the distance between the resulting points is an estimate which is a lower bound of the "diameter" of the set, although it very often will be the diameter itself (how often depends on the data.)

应该记住,无论如何,结果点之间的距离是一个估计值,它是集合“直径”的下界,尽管它通常是直径本身(取决于数据的频率)。

EDIT 3:

编辑3:

Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified in order to have an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points whith a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:

不幸的是飞机的相对误差近似,在极端的情况下,尽可能多的1 - 1 /√3≅42.3%,这可能是不可接受的,即使很少见。可以修改算法,使其上界约为20%,这是我用compass和直边导出的(解析解很麻烦)。改进后的算法在局部最大距离的前提下找到一对点,然后重复相同的步骤,但这次是从第一对的中点开始,可能会找到另一对:

# input: lng, lat
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
   s = (x - x.n_1)^2 + (y - y.n_1)^2
   i.n = which.max(s)
   x.n = x[i.n]
   y.n = y[i.n]
   s.n = s[i.n]
   if (s.n <= s.n_1) break
   i.n_1 = i.n
   x.n_1 = x.n
   y.n_1 = y.n
   s.n_1 = s.n
}
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok  = TRUE
repeat {
   s = (x - x.m_1)^2 + (y - y.m_1)^2
   i.m = which.max(s)
   if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
   x.m = x[i.m]
   y.m = y[i.m]
   s.m = s[i.m]
   if (s.m <= s.m_1) break
   i.m_1 = i.m
   x.m_1 = x.m
   y.m_1 = y.m
   s.m_1 = s.m
}
if (m_ok && s.m > s.n) {
   i = i.m
   j = i.m_1
} else {
   i = i.n
   j = i.n_1
}
# output: i, j

The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).

3D算法也可以用类似的方式进行修改。可以(在2D和3D情况下)从第二对点的中点重新开始(如果找到的话)。本例的上限是“留给读者练习”:-)。

Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time, and a reduction of the average error from .6% to .03% (order of magnitude). A further restart from the midpoint results in an a just slightly better average error, but almost equal maximum error.

将修改后的算法与(过于)简单的算法进行了比较,结果表明,对于正、方均匀分布,处理时间几乎增加了一倍,平均误差从0.6%减少到0.03%(数量级)。进一步从中点重新启动会导致一个稍好一点的平均误差,但几乎等于最大误差。

EDIT 4:

编辑4:

I have to study this article yet, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%

我必须学习这篇文章,但它看起来像我发现罗盘和直尺20%实际上是1 - 1 /√√以5比2(3)≅19.3%

#2


11  

Theorem #1: The ordering of any two great circle distances along the surface of the earth is the same as the ordering as the straight line distance between the points where you tunnel through the earth.

定理1:沿地球表面的任意两个大圆距离的顺序,与穿过地球的点之间的直线距离的顺序相同。

Hence turn your lat-long into x,y,z based either on a spherical earth of arbitrary radius or an ellipsoid of given shape parameters. That's a couple of sines/cosines per point (not per pair of points).

因此,将你的latlong转换为x,y,z,基于任意半径的球形地球或给定形状参数的椭圆体。这是每一点的两个sin /cos(不是每对点)

Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). Needs a square-root and some squares, and you can leave out the square root if you only care about comparisons.

现在你有了一个标准的3d问题,它不依赖于计算Haversine距离。点之间的距离就是欧几里得(3d中的毕达哥拉斯)。需要一个平方根和一些平方,如果你只关心比较,你可以省略平方根。

There may be fancy spatial tree data structures to help with this. Or algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods). Or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html

可能有一些花哨的空间树数据结构来帮助实现这一点。或者像http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm(点击“Next”获取3d方法)这样的算法。或者这里的c++代码:http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html

Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.

一旦你找到了最大距离对,你就可以使用Haversine公式来得到这对的沿曲面的距离。

#3


3  

Here's a naive example that doesn't scale well (as you say), as you say but might help with building a solution in R.

这里有一个很简单的例子,它不像你说的那样很好地扩展(正如你所说的),但是可能有助于在R中构建一个解决方案。

## lonlat points
n <- 100
d <- cbind(runif(n, -180, 180), runif(n, -90, 90))


library(sp)
## distances on WGS84 ellipsoid
x <- spDists(d, longlat = TRUE)

## row, then column index of furthest points
ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])

## maps
library(maptools)
data(wrld_simpl)
plot(as(wrld_simpl, "SpatialLines"), col = "grey")

points(d, pch = 16, cex = 0.5)

## draw the points and a line between  on the page
points(d[ind, ], pch = 16)
lines(d[ind, ], lwd = 2)


## for extra credit, draw the great circle on which the furthest points lie
library(geosphere)


lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")

一组经纬度点之间的最大距离

The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.

如果需要,geosphere包提供了更多的距离计算选项。看到了吗?sp中的spDists用于这里的细节。

#4


3  

You don't tell us whether these points will be located in a sufficiently small part of the globe. For truly global sets of points, my first guess would be running a naive O(n^2) algorithm, possibly getting performance boost with some spatial indexing (R*-trees, octal-trees etc.). The idea is to pre-generate an n*(n-1) list of the triangle in the distance matrix and feed it in chunks to a fast distance library to minimize I/O and process churn. Haversine is fine, you could also do it with Vincenty's method (the greatest contributor to running time is quadratic complexity, not the (fixed number of) iterations in Vincenty's formula). As a side note, in fact, you don't need R for this stuff.

你不会告诉我们这些点是否会在地球上足够小的地方。的真正意义上的全球集合点,我第一次想将运行一个天真的O(n ^ 2)算法,可能获得性能提升与一些空间索引(R *—树木,octal-trees等等)。其思想是预生成距离矩阵中三角形的n*(n-1)列表,并将其以块的形式提供给快速距离库,以最小化I/O和进程流变。Haversine还可以,您也可以使用Vincenty的方法(运行时间的最大贡献是二次复杂度,而不是Vincenty公式中的(固定数量的)迭代)。顺便说一句,事实上,这个东西不需要R。

EDIT #2: The Barequet-Har-Peled algorithm (as pointed at by Spacedman in his reply) has O((n+1/(e^3))log(1/e)) complexity for e>0, and is worth exploring.

编辑# 2:Barequet-Har-Peled算法(以Spacedman指着他的回答)有O(n + 1 /(e ^ 3)日志(1 / e)为e > 0的复杂性,值得探索。

For the quasi-planar problem, this is known as "diameter of convex hull" and has three parts:

对于拟平面问题,这被称为“凸壳直径”,有三部分:

  1. Computing convex hull with Graham's scan which is O(n*log(n)) - in fact, one should try transforming points into a transverse Mercator projection (using the centroid of the points in data set).
  2. 计算凸壳与格雷厄姆的扫描,即O(n*log(n)) -事实上,应该尝试将点转化为横切的墨卡托投影(使用数据集中的点的质心)。
  3. Finding antipodal points by Rotating Calipers algorithm - linear O(n).
  4. 用旋转卡尺算法求反足点——线性O(n)。
  5. Finding the largest distance among all antipodal pairs - linear search, O(n).
  6. 求出所有反波对的最大距离——线性搜索,O(n)。

The link with pseudo-code and discussion: http://fredfsh.com/2013/05/03/convex-hull-and-its-diameter/

伪代码链接和讨论:http://fredfsh.com/2013/05/03/convex hull- its-diameter/

See also the discussion on a related question here: https://gis.stackexchange.com/questions/17358/how-can-i-find-the-farthest-point-from-a-set-of-existing-points

在这里还可以看到关于一个相关问题的讨论:https://gis.stackexchange.com/questions/17358/how-can-i-find the far -from-a set- being -points

EDIT: Spacedman's solution pointed me to the Malandain-Boissonnat algorithm (see the paper in pdf here). However, this is worse or the same as the bruteforce naive O(n^2) algorithm.

编辑:Spacedman的解决方案将我指向Malandain-Boissonnat算法(请参阅pdf文件)。然而,这是一样的还是坏bruteforce天真O(n ^ 2)算法。