I have a problem that I think should be easy, but I can't find a solution. I want to find the matching values in two arrays, then use the indices of those matches in one of them to look up values in a third array.
import numpy as np
a=np.random.randint(0,200,100)  # random integer array
b1=np.random.randint(0,100,50)
b2=b1**3
c=[]
for i in range(len(a)):
    for j in range(len(b1)):
        if b1[j]==a[i]:
            c.append(b2[j])  # keep the b2 value at each matching b1 index
c=np.asarray(c)
Clearly the above method works, but it's very slow. This is just an example; in my actual work a, b1 and b2 each have over 10,000 elements.
Any faster solutions?
2 solutions
#1
6
np.in1d(b1, a) returns a boolean array indicating whether each element of b1 is found in a.
If you want the values in b2 at the positions where b1 holds a value that also occurs in a, you can use the boolean array to index b2:
b2[np.in1d(b1, a)]
Using this function should be a lot faster, as the for loops are pushed down to the level of NumPy's internal routines.
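A minimal sketch of the mask approach on small hand-picked arrays (the values are made up for illustration, not taken from the question). It uses np.isin, the newer spelling of the same membership test, since recent NumPy releases deprecate np.in1d. One thing worth noticing: the mask yields one entry per element of b1, in b1 order, whereas the double loop in the question yields one entry per matching (a, b1) pair, in a order. If that exact output matters, a broadcast comparison is one way to get it, assuming the len(a) x len(b1) boolean matrix fits in memory.

```python
import numpy as np

# Small illustrative arrays (made-up values).
a = np.array([5, 3, 8, 3])
b1 = np.array([3, 7, 5, 3])
b2 = b1 ** 3  # [27, 343, 125, 27]

# Boolean mask over b1: True where the b1 value also occurs in a.
mask = np.isin(b1, a)
c_mask = b2[mask]  # one entry per matching b1 element, in b1 order

# Reproduce the double loop exactly: one entry per matching (i, j) pair.
# np.nonzero walks the matrix row by row, so i is the outer index, j the inner.
i, j = np.nonzero(a[:, None] == b1[None, :])
c_pairs = b2[j]

print(c_mask)   # [ 27 125  27]
print(c_pairs)  # [125  27  27  27  27]
```

For two arrays of 10,000 elements the broadcast matrix holds 100 million booleans, so the mask version is the lighter choice whenever the loop's duplicates and ordering are not needed.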
#2
1
You can use numpy.intersect1d to get the intersection of two 1-D arrays. Note that once you have the intersection itself, you don't need the indices to look the common values up again.
>>> a=np.random.randint(0,200,100)
>>> b1=np.random.randint(0,100,50)
>>>
>>> np.intersect1d(b1,a)
array([ 3, 9, 17, 19, 22, 23, 37, 53, 55, 58, 67, 85, 93, 94])
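The question also asks for the values of b2 at the matching positions, which the plain intersection does not give. A hedged sketch, assuming a NumPy version that supports the return_indices argument of np.intersect1d (the small arrays are made up for illustration): the returned indices point at the first occurrence of each common value, so repeated values in b1 contribute only once.

```python
import numpy as np

# Small illustrative arrays (made-up values).
a = np.array([5, 3, 8, 3])
b1 = np.array([3, 7, 5, 3])
b2 = b1 ** 3  # [27, 343, 125, 27]

# common: sorted unique values present in both arrays.
# idx_b1 / idx_a: index of the first occurrence of each common value.
common, idx_b1, idx_a = np.intersect1d(b1, a, return_indices=True)

# b2 values for each distinct common value.
c_unique = b2[idx_b1]

print(common)    # [3 5]
print(idx_b1)    # [0 2]
print(c_unique)  # [ 27 125]
```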
You may note that using the intersection is more efficient: with a[np.in1d(a, b1)], in addition to calling the in1d function, Python is forced to do an extra indexing step. For a better understanding, see the following benchmark:
from timeit import timeit
s1="""
import numpy as np
a=np.random.randint(0,200,100)
b1=np.random.randint(0,100,50)
np.intersect1d(b1,a)
"""
s2="""
import numpy as np
a=np.random.randint(0,200,100)
b1=np.random.randint(0,100,50)
a[np.in1d(a, b1)]
"""
print(' first: ', timeit(stmt=s1, number=100000))
print('second : ', timeit(stmt=s2, number=100000))
Result:
first: 3.69082999229
second : 7.77609300613
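The timed statements above also include the import and the random array creation, and the arrays are much smaller than the 10,000 elements mentioned in the question. A sketch of a variant that moves that work into timeit's setup argument follows; the array sizes, seed and repeat count are assumptions chosen for illustration, and np.isin stands in for np.in1d. Keep in mind that the two expressions do not return the same thing: intersect1d gives sorted unique values, while the mask keeps duplicates and the order of a. No timings are quoted here because they depend on the machine and the NumPy version.

```python
from timeit import timeit

# Setup runs once per timeit call and is excluded from the measurement.
setup = """
import numpy as np
rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
a = rng.integers(0, 20000, 10000)     # sizes are assumptions
b1 = rng.integers(0, 10000, 10000)
"""

t_intersect = timeit("np.intersect1d(b1, a)", setup=setup, number=200)
t_isin = timeit("a[np.isin(a, b1)]", setup=setup, number=200)

print("intersect1d :", t_intersect)
print("isin + index:", t_isin)
```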