
Time: 2022-11-05 21:41:59

For obvious reasons I have two numpy arrays of different sizes: one with an index column along with the x, y, z coordinates, and the other containing just the coordinates. (Please ignore the leading serial number; I couldn't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.


Array1 (with index column):


    serialNo. moleculeID atomID x y z
  1. 1 1 2 0 7.7590151 7.2925348 12.5933323
  2. 1 12 2 0 7.7590151 7.2925348 12.5933323
  3. 2 1 2 0 7.123642 6.1970949 11.5622416
  4. 2 12 2 0 7.123642 6.1970949 11.5622416
  5. 3 1 6 0 6.944543 7.0390449 12.0713224
  6. 3 1 6 0 6.944543 7.0390449 12.0713224
  7. 4 1 2 0 8.8900348 11.5477333 13.5633965
  8. 4 1 2 0 8.8900348 11.5477333 13.5633965
  9. 5 1 2 0 7.857268 12.8062735 13.4357052
  10. 5 12 0 7.857268 12.8062735 13.4357052
  11. 6 1 6 0 8.2124357 12.1004238 14.0486889
  12. 6 1 6 0 8.2124357 12.1004238 14.0486889

Array2 (just the coordinates):


x          y             z
  1. 7.7590151 7.2925348 12.5933323
  2. 7.7590151 7.2925348 12.5933323
  3. 7.123642 6.1970949 11.5622416
  4. 7.123642 6.1970949 11.5622416
  5. 6.944543 7.0390449 12.0713224
  6. 6.944543 7.0390449 12.0713224
  7. 8.8900348 11.5477333 13.5633965
  8. 8.8900348 11.5477333 13.5633965

The index column (atomID) of the first array contains the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2, 2, 6, 2 back as a list, which I would then concatenate with the second array. Any easy ideas?




I tried the following code, but it doesn't seem to work.


import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print a
print b

for i in range(len(b)):
 for j in range(len(a)):
    if a[j,1]==b[i,0]:
        x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
        print 'not true'
print x 

which outputs the following:


not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3.   2.2  5. ]
 [ 3.  -6.3  0. ]
 [ 3.   3.6  8. ]]

but the expected output was:


    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]

4 Solutions



The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:


import numpy_indexed as npi
print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer strikes me as being incorrect for points which differ in their latter coordinates. Performance should be much improved here as well: not only is this solution vectorized, but its worst-case performance is O(N log N), as opposed to the quadratic time complexity of the currently accepted answer.
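For readers who would rather not add a dependency, the sort-based idea can be approximated in plain NumPy. The sketch below is not how numpy_indexed is implemented; the helper name `rows_in` is made up for illustration, and it relies on exact floating-point equality of whole rows.

```python
import numpy as np

def rows_in(candidates, reference):
    """Hypothetical helper: True for each row of `candidates` that also occurs in `reference`."""
    stacked = np.vstack([reference, candidates])
    # Identical rows receive the same integer label; np.unique sorts, so this is O(N log N).
    _, labels = np.unique(stacked, axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # keep the labels 1-D across NumPy versions
    ref_labels = labels[:len(reference)]
    cand_labels = labels[len(reference):]
    return np.isin(cand_labels, ref_labels)

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

mask = rows_in(a[:, 1:], b)  # compares every coordinate column, not just the first
matched = a[mask]            # rows of a (ID included) whose coordinates appear in b
print(matched)
```

Because whole rows are compared, a point such as (2.2, 7) is not reported as a match for (2.2, 5), which is the failure mode mentioned above.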




Two concise vectorized ways to do it using cdist -


import numpy as np
from scipy.spatial.distance import cdist

out = a[np.any(cdist(a[:,1:],b)==0,axis=1)]

Or if you don't mind getting a bit voodoo-ish, here's np.einsum to replace np.any -


out = a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]

Sample run -


In [15]: from scipy.spatial.distance import cdist

In [16]: a
array([[  4. ,   2.2,   5. ],
       [  2. ,  -6.3,   0. ],
       [  3. ,   3.6,   8. ],
       [  5. ,  -9.8,  50. ]])

In [17]: b
array([[ 2.2,  5. ],
       [-6.3,  0. ],
       [ 3.6,  8. ]])

In [18]: a[np.any(cdist(a[:,1:],b)==0,axis=1)]
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])

In [19]: a[np.einsum('ij->i',cdist(a[:,1:],b)==0)]
array([[ 4. ,  2.2,  5. ],
       [ 2. , -6.3,  0. ],
       [ 3. ,  3.6,  8. ]])
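One caveat with the `== 0` test is that it needs the coordinates to be bit-for-bit identical in both arrays. If they were written to a file and read back, or computed along different paths, a tolerance may be safer. The following is a minimal sketch using broadcasting only; the helper name `match_rows_close` and the default `atol` are assumptions chosen for illustration, and like `cdist` it builds a quadratic-size intermediate array.

```python
import numpy as np

def match_rows_close(coords, targets, atol=1e-8):
    """Hypothetical helper: True for each row of `coords` that is within `atol` of some row of `targets`."""
    # close[i, j] is True when coords[i] agrees with targets[j] in every column.
    close = np.isclose(coords[:, None, :], targets[None, :, :],
                       rtol=0.0, atol=atol).all(axis=2)
    return close.any(axis=1)

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# Simulate tiny round-off in the second array.
b_noisy = b + 1e-12

exact = (a[:, None, 1:] == b_noisy[None, :, :]).all(axis=2).any(axis=1)
tolerant = match_rows_close(a[:, 1:], b_noisy)

print(exact)        # exact comparison no longer finds the rows
print(a[tolerant])  # tolerant comparison still recovers them
```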



This is just a rough loop-based sketch for your question (assuming array1 has the atom ID in column 0, followed by the coordinates):


import numpy as np
result = []
for coords in array2:
    for row in array1:
        # compare all the coordinates of the two elements
        if np.array_equal(row[1:], coords):
            # insert the atom ID at the beginning of the coordinate array
            result.append(np.insert(coords, 0, row[0]))
            break
result = np.array(result)



Using a list instead of an array for the values passed to np.insert did the trick.


import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print(a)
print(b)
x = []

for i in range(len(b)):
    for j in range(len(a)):
        if a[j,1]==b[i,0]:
            x.append(a[j,0])  # collect the ID of the matching row of a
print(np.insert(b,0,x,axis=1))

which would output:


[[ 4.   2.2  5. ]
 [ 2.  -6.3  0. ]
 [ 3.   3.6  8. ]]
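Note that the loop above compares only the first coordinate (`a[j,1]==b[i,0]`), so two points that share that value but differ elsewhere would both match. A possible alternative, sketched here under the assumption that the ID sits in column 0 of `a` and that every row of `b` occurs in `a`, is to key a dictionary on the full coordinate tuple. The name `id_by_coords` is illustrative only.

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

# Map each full coordinate tuple to its ID; the first occurrence wins on duplicates.
id_by_coords = {}
for row in a:
    id_by_coords.setdefault(tuple(row[1:]), row[0])

# Look up the ID for every row of b, in b's own order.
# A row of b that is missing from a raises KeyError here.
ids = [id_by_coords[tuple(coords)] for coords in b]

# Put the IDs in front of the coordinates.
result = np.column_stack([ids, b])
print(result)
```

This keeps the rows in the order of `b` and copes with repeated coordinates in `b`, at the cost of a Python-level loop and exact floating-point comparison.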


