I have two NumPy arrays:
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
I want a function that takes A and B as parameters and returns, for each value in A, its index in B, aligned with A, as efficiently as possible.
These are the outputs for the test case above:
indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.
Edit: Arrays are strings.
5 Answers
#1
0
I'm not sure how efficient this is but it works:
import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b=np.argmax(A[np.newaxis,:]==B[:,np.newaxis],axis=0)
print(idx_of_a_in_b)
from which I get:
[1 1 0 2 2 2 2 2 3 4 3 3 4]
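Two properties of the broadcast comparison above are worth spelling out: it builds a len(B) x len(A) boolean matrix, and argmax returns 0 for a column with no match, which is indistinguishable from a real match at index 0. The following is a minimal sketch that wraps the same idea in a function and marks absent values; the function name indices_by_broadcast and the -1 sentinel are illustrative choices, not part of the original answer.

```python
import numpy as np

def indices_by_broadcast(A, B):
    """Index in B of each element of A, or -1 where the element is absent."""
    A = np.asarray(A)
    B = np.asarray(B)
    # matches[i, j] is True when B[i] == A[j]; shape is (len(B), len(A)).
    matches = B[:, np.newaxis] == A[np.newaxis, :]
    idx = np.argmax(matches, axis=0)
    # argmax yields 0 for an all-False column, so flag those columns explicitly.
    idx[~matches.any(axis=0)] = -1
    return idx

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_by_broadcast(A, B))
```

Because of the intermediate matrix, this approach is probably best kept for cases where B is small.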
#2
3
You can also do:
>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])
According to the docs, you should be able to pass right=True and skip the minus-one step (right=False is the default). This does not work for me, likely due to a version issue, as I do not have NumPy 1.7.
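As a small illustration of the two digitize variants, here is a sketch that assumes the strings all represent integers (so they can be converted with astype(int)) and that B is sorted in increasing order once converted. The names A_num, B_num, idx_default and idx_right are made up for this example.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Assumption: every string is an integer literal, so a numeric conversion is safe.
A_num = A.astype(int)
B_num = B.astype(int)  # digitize needs monotonic bins

# Default (right=False): bins[i-1] <= x < bins[i], so an exact match lands one past its index.
idx_default = np.digitize(A_num, B_num) - 1
# right=True: bins[i-1] < x <= bins[i], so an exact match lands on its own index.
idx_right = np.digitize(A_num, B_num, right=True)

print(idx_default)
print(idx_right)
```

Note that digitize returns a bin number for every input, so the result is only a valid lookup when each value of A actually occurs in B.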
I'm not sure what you are doing with this, but a simple and very fast way to do this is:
>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])
>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
dtype='|S2')
The order will be different, because B now holds the sorted unique values of A, but this can be changed if needed.
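One possible way to change the order back to that of the original B is to remap the positions of the unique values. This is only a sketch, assuming every value of A occurs in B; position_in_B and remap are hypothetical names introduced here.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

uniques, inverse = np.unique(A, return_inverse=True)

# Position in the original B of each value (assumes all of A's values are in B).
position_in_B = {value: i for i, value in enumerate(B.tolist())}
# remap[k] is the index in B of the k-th sorted unique value.
remap = np.array([position_in_B[u] for u in uniques.tolist()])

indices = remap[inverse]
print(indices)
```

The Python-level loop runs only over the unique values, so it stays cheap when A is long but has few distinct entries.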
#3
1
For such things it is important to have lookups in B as fast as possible. A dictionary provides O(1) average lookup time. So, first of all, let us construct this dictionary:
>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}
And then just go through A and find the corresponding indices:
>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
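The two steps above can be combined into one function. This is a sketch rather than a tuned implementation: the name indices_by_dict and the missing=-1 default are assumptions for illustration, and .tolist() is used so that the dictionary keys are plain Python objects rather than NumPy scalars.

```python
import numpy as np

def indices_by_dict(A, B, missing=-1):
    """Index in B of each element of A; `missing` where the element is absent."""
    # If B contains duplicates, the last occurrence wins.
    lookup = {value: i for i, value in enumerate(np.asarray(B).tolist())}
    items = np.asarray(A).tolist()
    return np.array([lookup.get(item, missing) for item in items], dtype=np.intp)

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_by_dict(A, B))
```

Using lookup.get instead of lookup[item] avoids a KeyError when A contains a value that is not in B.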
#4
1
I think you can do it with np.searchsorted:
>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)
Note that B is scrambled from your version, thus the different output. If some of the items in A are not in B (which you could check with np.all(np.in1d(A, B))), then the returned indices for those values will be garbage, and you may even get an IndexError from the last line (if a value in A is larger than every value in B).
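To guard against the missing-value problem described above, one option is to clip the insertion positions and then verify each candidate match. The sketch below assumes B is non-empty; the function name indices_by_searchsorted and the missing=-1 sentinel are illustrative, not part of the original answer.

```python
import numpy as np

def indices_by_searchsorted(A, B, missing=-1):
    """Index in B of each element of A; `missing` where the element is absent."""
    A = np.asarray(A)
    B = np.asarray(B)
    sort_b = np.argsort(B)
    pos = np.searchsorted(B, A, sorter=sort_b)
    # pos == len(B) means the value exceeds everything in B; clip to avoid an IndexError.
    pos = np.clip(pos, 0, len(B) - 1)
    idx = sort_b[pos]
    # Keep an index only where B really holds the requested value.
    return np.where(B[idx] == A, idx, missing)

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_by_searchsorted(A, B))
```

The same function works for the numeric, scrambled B used above, since only the sort order of B matters and not its original arrangement.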
#5
1
The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's, but with a nice interface, tests, and a lot of related useful functionality:
import numpy_indexed as npi
print(npi.indices(B, A))