I searched around a bit and found comparable questions and answers, but none of them gave me the correct results.
Situation: I have an array containing several clumps of cells with value 1, while the rest of the cells are set to zero. Each cell is a square (width = height). Now I want to calculate the average distance between all cells with value 1. The formula should look like this: d = sqrt( ((x2 - x1) * size)**2 + ((y2 - y1) * size)**2 )
Example:
import numpy as np
from scipy.spatial.distance import pdist
a = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])
# Given that each cell is 10m wide/high
val = 10
d = pdist(a, lambda u, v: np.sqrt( ( ((u-v)*val)**2).sum() ) )
d
array([ 14.14213562, 10. , 10. ])
After that I would calculate the average via d.mean(). However, the result in d is obviously wrong, as the distance between the two cells in the top row should already be 20 (two cells crossed * 10). Is there something wrong with my formula, math or approach?
1 Answer
To compute the distance between the non-zero markers, you need their actual coordinates:
>>> import numpy as np
>>> from scipy.spatial.distance import squareform, pdist
>>> a = np.array([[1, 0, 1],
...               [0, 0, 0],
...               [0, 0, 1]])
>>> np.where(a)
(array([0, 0, 2]), array([0, 2, 2]))
>>> x,y = np.where(a)
>>> coords = np.vstack((x,y)).T
>>> coords
array([[0, 0], # That's the coordinate of the "1" in the top left,
[0, 2], # top right,
[2, 2]]) # and bottom right.
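As a side note, NumPy can build the same coordinate array in a single call. The following is a minimal sketch, assuming the same array a as above; np.argwhere is not used in the original answer and is shown only as an equivalent shortcut:

```python
import numpy as np

a = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])

# np.argwhere returns one (row, col) pair per non-zero cell,
# which matches np.vstack(np.where(a)).T for this array.
coords = np.argwhere(a)
print(coords)
```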
Next, you want to calculate the distances between these points. You can use pdist for this, like so:
>>> dists = pdist(coords) * 10 # Uses the Euclidean distance metric by default.
>>> squareform(dists)
array([[ 0. , 20. , 28.28427125],
[ 20. , 0. , 20. ],
[ 28.28427125, 20. , 0. ]])
In this last matrix you will find, above the diagonal, the distance between each marked point in a and every other marked point. In this case you had 3 coordinates, so it gives you the distance between node 0 (a[0,0]) and node 1 (a[0,2]), between node 0 and node 2 (a[2,2]), and finally between node 1 and node 2. In other words, if S = squareform(dists), then S[i,j] returns the distance between the coordinates on row i and row j of coords.
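The relationship between the condensed vector and the square matrix can be checked directly. This is a small sketch, assuming the three coordinates and the cell size of 10 from the example; the variable names i, j and upper are introduced here for illustration only:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

coords = np.array([[0, 0], [0, 2], [2, 2]])
dists = pdist(coords) * 10   # condensed form, pairs ordered (0,1), (0,2), (1,2)
S = squareform(dists)        # full symmetric 3x3 matrix

# Reading the strict upper triangle of S row by row
# gives back the condensed vector.
i, j = np.triu_indices(len(coords), k=1)
upper = S[i, j]
print(S[0, 1], S[0, 2], S[1, 2])
```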
The values in the upper triangle of that last matrix are exactly the values already present in the variable dists, from which you can easily derive the mean without performing the relatively expensive squareform calculation (shown here for demonstration purposes only):
>>> dists
array([ 20. , 28.2842712, 20. ])
>>> dists.mean()
22.761423749153966
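Putting the steps together, the whole computation fits in a small helper. This is only a sketch: the function name mean_pairwise_distance and the cell_size parameter are hypothetical names chosen here, and returning NaN when fewer than two cells are marked is an assumption about the desired behaviour:

```python
import numpy as np
from scipy.spatial.distance import pdist

def mean_pairwise_distance(grid, cell_size=1.0):
    """Mean Euclidean distance between all non-zero cells of a 2-D grid."""
    coords = np.argwhere(grid)      # one (row, col) pair per marked cell
    if len(coords) < 2:
        return float("nan")         # assumption: no pairs means no mean
    # pdist works in cell units; scale by the physical cell size afterwards.
    return float(pdist(coords).mean() * cell_size)

a = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])
result = mean_pairwise_distance(a, cell_size=10)
print(result)
```

Keep in mind that pdist computes n(n-1)/2 distances for n marked cells, so memory use grows quadratically with the number of 1 values.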
Note that your computed solution "looks" nearly correct (aside from a factor of 2) only because of the example you chose. What pdist does is take the Euclidean distance between the first point in the n-dimensional space and the second, then between the first and the third, and so on. In your example, that means it treats each row of a as a point: the first point (row 0) has coordinates in 3-dimensional space given by [1,0,1]. The 2nd point is [0,0,0]. The Euclidean distance between those two is sqrt(2) ~ 1.4. Then the distance between the first and the 3rd point (the last row in a, [0,0,1]) is only 1. Finally, the distance between the 2nd point (row 1: [0,0,0]) and the 3rd (row 2: [0,0,1]) is also 1. Multiplied by your cell size of 10, these give the 14.14, 10 and 10 you observed. So remember: pdist interprets its first argument as a stack of coordinates in n-dimensional space, n being the number of elements in each row.
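To see this interpretation in action, you can pass the raw grid to pdist and compare with the output in the question. A minimal sketch, assuming the same a and val as in the question; the name row_dists is introduced here for illustration:

```python
import numpy as np
from scipy.spatial.distance import pdist

a = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])
val = 10

# pdist treats each ROW of `a` as one point in 3-D space,
# so these are distances between rows, not between marked cells.
row_dists = pdist(a) * val
print(row_dists)
```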