I have two arrays, a1 and a2. Assume len(a2) >> len(a1), and that a1 is a subset of a2.
I would like a quick way to return the a2 indices of all elements in a1. The time-intensive way to do this is obviously:
from operator import indexOf
indices = []
for i in a1:
    indices.append(indexOf(a2, i))
This of course takes a long time when a2 is large. I could also use numpy.where() instead (although each entry in a1 will appear just once in a2), but I'm not convinced it will be quicker. I could also traverse the large array just once:
indices = []
for i in range(len(a2)):
    if a2[i] in a1:
        indices.append(i)
But I'm sure there is a faster, more 'numpy' way - I've looked through the numpy method list, but cannot find anything appropriate.
Many thanks in advance,
D
5 Answers
#1
8
How about
numpy.nonzero(numpy.in1d(a2, a1))[0]
This should be fast. From my basic testing, it's about 7 times faster than your second code snippet for len(a2) == 100, len(a1) == 10000, and only one common element at index 45. This assumes that both a1 and a2 have no repeating elements.
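To make the behaviour concrete, here is a small sketch with made-up arrays. It uses numpy.isin, which the NumPy documentation recommends over in1d for new code; for 1-D inputs it should produce the same boolean mask. The sample data is an assumption for illustration only.

```python
import numpy as np

a2 = np.array([50, 10, 40, 30, 20, 60])   # large array (made-up sample data)
a1 = np.array([30, 50, 20])               # subset of a2 (made-up sample data)

# Boolean mask over a2: True where the element also occurs in a1.
mask = np.isin(a2, a1)

# nonzero() returns a tuple with one array per dimension; [0] picks the only one.
indices = np.nonzero(mask)[0]

print(indices)
```

Note that the indices come back in a2 order (0, 3, 4 here), not in the order in which the values appear in a1.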
#2
2
How about:
wanted = set(a1)
indices = [idx for (idx, value) in enumerate(a2) if value in wanted]
This should be O(len(a1) + len(a2)) instead of O(len(a1) * len(a2)), because membership tests on a set take constant time on average.
NB: I don't know numpy, so there may be a more 'numpythonic' way to do it, but this is how I would do it in pure Python.
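A small worked sketch of the set-based idea, with made-up data. The second half is a variant that is not part of this answer: it builds a value-to-position dict so the result follows the order of a1 instead of a2. Both assume the elements are hashable and that a2 has no repeats (with repeats, the dict would keep only the last position).

```python
a2 = ["e", "a", "d", "c", "b", "f"]   # large list (made-up sample data)
a1 = ["c", "e", "b"]                  # subset of a2 (made-up sample data)

# Set-based approach: one pass over a2, indices come out in a2 order.
wanted = set(a1)
indices = [idx for idx, value in enumerate(a2) if value in wanted]

# Variant (assumption, not from the answer): map each value to its position
# once, then look up the elements of a1 so the result follows a1's order.
position = {value: idx for idx, value in enumerate(a2)}
indices_in_a1_order = [position[value] for value in a1]

print(indices)
print(indices_in_a1_order)
```

Both versions make a single pass over each input, so the cost stays at O(len(a1) + len(a2)).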
#3
1
from numpy import in1d
index = in1d(a2, a1)   # boolean mask over a2
result = a2[index]     # the matching values of a2, not their indices
#4
1
Very similar to @AlokSinghal's answer, but you get an already flattened version, so there is no need for the trailing [0].
numpy.flatnonzero(numpy.in1d(a2, a1))
#5
0
The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index; performance should be similar to the currently accepted answer, but as a bonus, it gives you explicit control over missing values as well, using the 'missing' kwarg.
import numpy_indexed as npi
indices = npi.indices(a2, a1, missing='raise')
It also works on multi-dimensional arrays, i.e. finding the indices of one set of rows in another.
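For anyone who would rather stay within plain NumPy, a vectorized list.index-style lookup can be sketched with argsort and searchsorted. This is only an illustration of the general idea and makes no claim about how numpy_indexed works internally; the sample arrays are made up, and the sketch assumes a2 has no repeated values.

```python
import numpy as np

a2 = np.array([50, 10, 40, 30, 20, 60])   # large array, unique values (made-up data)
a1 = np.array([30, 50, 20])               # values to look up (made-up data)

order = np.argsort(a2)                        # positions of a2's elements in sorted order
pos = np.searchsorted(a2, a1, sorter=order)   # slot of each a1 value within sorted a2
pos = np.clip(pos, 0, len(a2) - 1)            # keep out-of-range lookups indexable
indices = order[pos]                          # map back to positions in the original a2

# Explicit handling of missing values: raise if any lookup did not match.
if not np.array_equal(a2[indices], a1):
    raise KeyError("some elements of a1 are not present in a2")

print(indices)
```

Unlike the in1d-based answers, the result here follows the order of a1, which is closer to what the loop over indexOf in the question produces.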