Minimum Euclidean distance between points in two different Numpy arrays, not within

Time: 2021-03-05 15:21:12

I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array with all the points in the other array. The arrays are not necessarily the same size. For example:

import numpy

xy1=numpy.array(
[[  243,  3173],
[  525,  2997]])

xy2=numpy.array(
[[ 682, 2644],
[ 277, 2651],
[ 396, 2640]])

My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and all the coordinates in xy2.

mindist=numpy.zeros(len(xy1))
minid=numpy.zeros(len(xy1), dtype=int)

for i,xy in enumerate(xy1):
    dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
    mindist[i],minid[i]=dists.min(),dists.argmin()

Is there a way to eliminate the for loop and somehow do element-by-element calculations between the two arrays? I envision generating a distance matrix in which I could find the minimum element of each row or column.

Another way to look at the problem: say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and I store the lengths of the original arrays. Theoretically, I should then be able to generate an n x n distance matrix from those coordinates and grab an m x p submatrix from it. Is there a way to efficiently generate this submatrix?

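Here is a minimal sketch of the concatenate-then-slice idea. It assumes SciPy is available and uses `pdist` on the stacked array, `squareform` to expand the result to n x n, and slicing to pull out the m x p block. The sketch only illustrates the indexing. It also computes the m x m and p x p within-set distances, which `cdist` (used in the answers below) skips.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform, cdist

xy1 = np.array([[243, 3173],
                [525, 2997]])
xy2 = np.array([[682, 2644],
                [277, 2651],
                [396, 2640]])
m = len(xy1)

# Stack both sets into one (n, 2) array, with n = m + p.
xy = np.vstack([xy1, xy2])

# pdist returns the condensed upper triangle; squareform expands it to n x n.
full = squareform(pdist(xy))

# Rows 0..m-1 come from xy1 and columns m..n-1 come from xy2.
sub = full[:m, m:]  # shape (m, p)

# The same m x p block computed directly, without the within-set pairs.
direct = cdist(xy1, xy2)
print(sub.shape)  # -> (2, 3)
```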

5 solutions

#1 (score: 36)

(Months later) scipy.spatial.distance.cdist(X, Y) gives all pairwise distances between the rows of X and the rows of Y, for points in 2, 3 or more dimensions.
It also supports 22 different distance metrics, detailed in its documentation.

# cdist example: (nx,dim) (ny,dim) -> (nx,ny)

import sys
import numpy as np
from scipy.spatial.distance import cdist

#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1

# change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
    exec(arg)
np.random.seed(seed)
np.set_printoptions(2, threshold=100, edgeitems=10, suppress=True)

title = "%s  dim %d  nx %d  ny %d  metric %s" % (
        __file__, dim, nx, ny, metric)
print("\n" + title)

#...............................................................................
X = np.random.uniform(0, 1, size=(nx, dim))
Y = np.random.uniform(0, 1, size=(ny, dim))
dist = cdist(X, Y, metric=metric)  # -> (nx, ny) distances
#...............................................................................

print("scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
        X.shape, Y.shape, dist.shape))
print("dist average %.3g +- %.2g" % (dist.mean(), dist.std()))
# cdist on single points returns a 1x1 array; [0, 0] extracts the scalar
print("check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
        dist[0, 3], cdist([X[0]], [Y[3]], metric=metric)[0, 0]))


# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim      ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2      ~ .4 +- .2/sqrt dim
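Applied to the question's own arrays, `cdist` followed by a row-wise min/argmin replaces the loop. This is a minimal sketch. The variable names follow the question, and the reduction step is an addition that is not part of the answer above.

```python
import numpy as np
from scipy.spatial.distance import cdist

xy1 = np.array([[243, 3173],
                [525, 2997]])
xy2 = np.array([[682, 2644],
                [277, 2651],
                [396, 2640]])

# (m, 2) and (p, 2) -> (m, p): entry [i, j] is the distance from xy1[i] to xy2[j].
dists = cdist(xy1, xy2)  # metric defaults to "euclidean"

# Reduce along the xy2 axis: nearest xy2 distance and index for each xy1 point.
mindist = dists.min(axis=1)
minid = dists.argmin(axis=1)
print(minid)  # -> [1 2]
```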

#2 (score: 21)

To compute the m by p matrix of distances, this should work:

>>> import numpy
>>> def distances(xy1, xy2):
...   d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
...   d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
...   return numpy.hypot(d0, d1)

The .outer calls make two m x p matrices (of scalar differences along the two axes), and the .hypot call turns those into a same-shape matrix (of scalar Euclidean distances).

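For reference, here is the function above as a self-contained script on the question's data. It includes a cross-check against the per-row loop from the question. The cross-check is added for illustration and is not part of the original answer.

```python
import numpy as np

xy1 = np.array([[243, 3173],
                [525, 2997]])
xy2 = np.array([[682, 2644],
                [277, 2651],
                [396, 2640]])

def distances(xy1, xy2):
    # d0[i, j] = xy1[i, 0] - xy2[j, 0]; d1 is the same for the y column. Both are (m, p).
    d0 = np.subtract.outer(xy1[:, 0], xy2[:, 0])
    d1 = np.subtract.outer(xy1[:, 1], xy2[:, 1])
    # hypot works element-wise, sqrt(d0**2 + d1**2), so the result is still (m, p).
    return np.hypot(d0, d1)

D = distances(xy1, xy2)

# Cross-check: minimum per xy1 point computed with the question's loop.
loop_min = np.array([np.sqrt(((xy - xy2) ** 2).sum(axis=1)).min() for xy in xy1])
print(D.shape)  # -> (2, 3)
```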

#3 (score: 4)

For what you're trying to do:

dists = numpy.sqrt((xy1[:, 0, numpy.newaxis] - xy2[:, 0])**2 + (xy1[:, 1, numpy.newaxis] - xy2[:, 1])**2)
mindist = numpy.min(dists, axis=1)
minid = numpy.argmin(dists, axis=1)

Edit: Instead of calling sqrt, doing squares, etc., you can use numpy.hypot:

dists = numpy.hypot(xy1[:, 0, numpy.newaxis]-xy2[:, 0], xy1[:, 1, numpy.newaxis]-xy2[:, 1])
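The same newaxis trick extends to points with any number of coordinates by broadcasting over a third axis. This is a sketch, and `pairwise_distances` is a hypothetical helper name rather than a library function. The intermediate `diff` array holds m*p*d values, so memory grows with all three sizes.

```python
import numpy as np

def pairwise_distances(a, b):
    """Hypothetical helper: Euclidean distances between rows of a (m, d) and b (p, d)."""
    # a[:, np.newaxis, :] is (m, 1, d) and b[np.newaxis, :, :] is (1, p, d);
    # broadcasting the subtraction yields an (m, p, d) array of coordinate differences.
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    # Collapse the coordinate axis to get the (m, p) distance matrix.
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
a = rng.uniform(size=(4, 3))  # 4 points in 3-D
b = rng.uniform(size=(5, 3))  # 5 points in 3-D
dists = pairwise_distances(a, b)
mindist = dists.min(axis=1)
minid = dists.argmin(axis=1)
```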

#4 (score: 4)

The accepted answer does not fully address the question, which asks for the minimum distance between the two sets of points, not the distance between every pair of points in the two sets.

Although a straightforward solution to the original question does consist of computing the distance between every pair and subsequently finding the minimum, this is not necessary if one is only interested in the minimum distances. A much faster solution exists for the latter problem.

All the proposed solutions have a running time that scales as m*p = len(xy1)*len(xy2). This is OK for small datasets, but an optimal solution can be written that scales roughly as m*log(p) for low-dimensional points such as these, plus a one-time cost to build a tree over xy2. This produces huge savings for large xy2 datasets.

This optimal scaling can be achieved using scipy.spatial.cKDTree as follows:

import numpy as np
from scipy import spatial

xy1 = np.array(
    [[243,  3173],
     [525,  2997]])

xy2 = np.array(
    [[682, 2644],
     [277, 2651],
     [396, 2640]])

# This solution is optimal when xy2 is very large
tree = spatial.cKDTree(xy2)
mindist, minid = tree.query(xy1)
print(mindist)

# This solution by @denis is OK for small xy2
mindist = np.min(spatial.distance.cdist(xy1, xy2), axis=1)
print(mindist)

where mindist holds the minimum distance between each point in xy1 and the set of points in xy2, and minid (returned by tree.query) holds the index of that nearest point in xy2.

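To confirm that the tree query returns the same answer as the brute-force approach, you can run a quick comparison on random data. The sizes and seed below are arbitrary choices for illustration, not a benchmark.

```python
import numpy as np
from scipy import spatial

rng = np.random.default_rng(1)
xy1 = rng.uniform(0, 1000, size=(200, 2))
xy2 = rng.uniform(0, 1000, size=(5000, 2))

# Build the tree once over the (large) reference set xy2.
tree = spatial.cKDTree(xy2)
# With the default k=1, query returns 1-D arrays: the nearest distance
# and the index of the nearest xy2 point for each row of xy1.
mindist, minid = tree.query(xy1)

# Brute-force reference: full (200, 5000) distance matrix, reduced per row.
full = spatial.distance.cdist(xy1, xy2)
```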

#5 (score: 2)

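import numpy as np
# xy1 is (m, 2) and xy2 is (p, 2), as in the question.
# P[i, j] = |xy1[i]|^2 + |xy2[j]|^2
P = np.add.outer(np.sum(xy1**2, axis=1), np.sum(xy2**2, axis=1))
# N[i, j] = xy1[i] . xy2[j]
N = np.dot(xy1, xy2.T)
# Clip tiny negative values caused by floating-point round-off before sqrt.
dists = np.sqrt(np.maximum(P - 2*N, 0))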
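This answer leaves the algebra implicit. Write $a_i$ for the rows of xy1 and $b_j$ for the rows of xy2 (notation introduced here for illustration). The code relies on expanding the squared distance:

```latex
\|a_i - b_j\|^2 = \|a_i\|^2 + \|b_j\|^2 - 2\,a_i \cdot b_j,
\qquad
P_{ij} = \|a_i\|^2 + \|b_j\|^2,
\qquad
N_{ij} = a_i \cdot b_j,
\qquad
d_{ij} = \sqrt{P_{ij} - 2N_{ij}}.
```

Here np.add.outer builds $P$ from the two vectors of squared row norms, and np.dot(xy1, xy2.T) builds $N$. The whole $m \times p$ matrix therefore comes from one matrix product instead of an explicit (m, p, 2) array of coordinate differences.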
