I have two NumPy arrays of different sizes: one has an index column along with x, y, z coordinates, and the other contains just the coordinates. (Please ignore the first serial number; I can't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.
Array1 (with index column):
serialNo. moleculeID atomID x y z
1 1 2 0 7.7590151 7.2925348 12.5933323
2 1 2 0 7.123642 6.1970949 11.5622416
3 1 6 0 6.944543 7.0390449 12.0713224
4 1 2 0 8.8900348 11.5477333 13.5633965
5 1 2 0 7.857268 12.8062735 13.4357052
6 1 6 0 8.2124357 12.1004238 14.0486889
Array2 (just the coordinates):
x y z
7.7590151 7.2925348 12.5933323
7.123642 6.1970949 11.5622416
6.944543 7.0390449 12.0713224
8.8900348 11.5477333 13.5633965
The array with the index column (atomID) has the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2 2 6 2 back as a list and then concatenate it with the second array. Any easy ideas?
Update:
I tried the following code, but it doesn't seem to be working.
import numpy as np
a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
print(a)
print(b)
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,1]==b[i,0]:
            x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
            #continue
        else:
            print('not true')
print(x)
which outputs the following:
not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3. 2.2 5. ]
 [ 3. -6.3 0. ]
 [ 3. 3.6 8. ]]
but the expectation was:
[[ 4. 2.2 5. ]
 [ 2. -6.3 0. ]
 [ 3. 3.6 8. ]]
4 solutions
#1
2
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:
import numpy_indexed as npi
print(a[npi.contains(b, a[:, 1:])])
The currently accepted answer strikes me as incorrect for points which differ in their latter coordinates. Performance should be much improved here as well: not only is this solution vectorized, but its worst-case time complexity is O(N log N), as opposed to the quadratic time complexity of the currently accepted answer.
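For readers who would rather not install an extra package, the same idea (match on the full coordinate row, avoid the nested loop) can be sketched with a plain dictionary. This is not how numpy_indexed works internally; the helper name `ids_for_coords` is hypothetical, and the sketch assumes the ID sits in column 0 of `a` and that coordinates match exactly as floats.

```python
import numpy as np

def ids_for_coords(a, b):
    """Return the ID (column 0 of `a`) for each row of `b`, or NaN if absent."""
    # Map each full coordinate tuple in `a` to its ID (hypothetical helper).
    # If `a` contains duplicate coordinates, the last ID wins.
    lookup = {tuple(row[1:]): row[0] for row in a}
    # One dictionary lookup per row of `b`, so no nested loop is needed.
    return np.array([lookup.get(tuple(row), np.nan) for row in b])

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

ids = ids_for_coords(a, b)
# Prepend the IDs as a new first column of `b`, keeping the row order of `b`.
result = np.column_stack([ids, b])
print(result)
```

Unlike boolean-mask indexing on `a`, this keeps the rows in the order of `b`, which is what the question asks for when the IDs are to be concatenated with the second array.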
#2
2
Two concise vectorized ways to do it using cdist:
import numpy as np
from scipy.spatial.distance import cdist
out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]
Or, if you don't mind getting a bit voodoo-ish, here's np.einsum to replace np.any:
out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
Sample run:
In [15]: from scipy.spatial.distance import cdist
In [16]: a
Out[16]:
array([[ 4. , 2.2, 5. ],
       [ 2. , -6.3, 0. ],
       [ 3. , 3.6, 8. ],
       [ 5. , -9.8, 50. ]])
In [17]: b
Out[17]:
array([[ 2.2, 5. ],
       [-6.3, 0. ],
       [ 3.6, 8. ]])
In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
Out[18]:
array([[ 4. , 2.2, 5. ],
       [ 2. , -6.3, 0. ],
       [ 3. , 3.6, 8. ]])
In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
Out[19]:
array([[ 4. , 2.2, 5. ],
       [ 2. , -6.3, 0. ],
       [ 3. , 3.6, 8. ]])
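Two things may be worth keeping in mind with the distance-based approach: testing a distance against exactly 0 relies on the coordinates being bit-for-bit identical, and indexing `a` with a boolean mask returns rows in the order of `a`, not `b`. A possible variation, sketched here with plain NumPy broadcasting instead of cdist, uses a tolerance and keeps the order of `b`. The tolerance `tol` is an assumed value, not something from the question.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
tol = 1e-8  # assumed tolerance; pick one that suits the precision of your data

# Pairwise Euclidean distances: dist[i, j] is the distance between b[i] and a[j, 1:].
dist = np.linalg.norm(b[:, None, :] - a[None, :, 1:], axis=2)

# For each row of b, the index of the closest row in a.
nearest = dist.argmin(axis=1)
# Keep only rows of b whose closest row in a lies within the tolerance.
matched = dist[np.arange(len(b)), nearest] <= tol

# IDs from column 0 of a, prepended to the matched rows of b (in b's order).
result = np.column_stack([a[nearest[matched], 0], b[matched]])
print(result)
```

Like cdist, this builds a full len(b) by len(a) distance matrix, so memory use grows with the product of the two array sizes.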
#3
1
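This is just a sketch for your question (array1 and array2 stand for your two arrays, assuming the atomID is in the first column of array1 and the coordinates follow it):
import numpy as np

rows = []
for coords in array2:
    for element in array1:
        # compare the coordinates of the two elements
        if np.array_equal(element[1:], coords):
            # np.insert returns a new array with the atomID prepended to the coordinates
            rows.append(np.insert(coords, 0, element[0]))
            break
result = np.array(rows)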
#4
0
Using a list instead of an array for the values of np.insert did the trick.
import numpy as np
a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
print(a)
print(b)
x = []
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,1]==b[i,0]:
            x.append(a[j,0])
print(np.insert(b,0,x,axis=1))
which would output:
[[ 4. 2.2 5. ]
 [ 2. -6.3 0. ]
 [ 3. 3.6 8. ]]