Getting the indices in NumPy array B of the values in array A (values present in both arrays), aligned with array A

Time: 2021-12-20 12:48:32

I have two NumPy arrays:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

I want a function that takes A and B as parameters and returns, for each value in A, its index in B (aligned with A), as efficiently as possible.

This is the expected output for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: The arrays contain strings.

5 solutions

#1

I'm not sure how efficient this is, but it works:

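import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
# Broadcast to a len(B) x len(A) boolean matrix: row i is True where A == B[i].
# argmax along axis 0 returns, for each column, the first row that is True.
idx_of_a_in_b = np.argmax(A[np.newaxis, :] == B[:, np.newaxis], axis=0)
print(idx_of_a_in_b)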

from which I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
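
The broadcast comparison builds a len(B) x len(A) boolean matrix, so memory grows with the product of the two sizes. One more pitfall: argmax returns 0 for a column with no True entry, so a value of A that is missing from B silently maps to index 0. A minimal sketch that wraps the same idea in a function and rejects missing values (the name indices_in_b_argmax and the choice to raise ValueError are assumptions for illustration):

```python
import numpy as np

def indices_in_b_argmax(A, B):
    """Index in B of each element of A (hypothetical helper name).

    Uses a broadcast equality matrix, so memory is O(len(A) * len(B)).
    """
    A = np.asarray(A)
    B = np.asarray(B)
    matches = A[np.newaxis, :] == B[:, np.newaxis]  # shape (len(B), len(A))
    # argmax would return 0 for an all-False column, so check for misses first.
    if not matches.any(axis=0).all():
        raise ValueError("some values of A are not in B")
    return np.argmax(matches, axis=0)

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_in_b_argmax(A, B))  # [1 1 0 2 2 2 2 2 3 4 3 3 4]
```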

#2

You can also do:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

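According to the docs, you should be able to specify right=True and skip the minus one. This does not work for me, likely due to a version issue, as I do not have NumPy 1.7. Either way, digitize treats B as bin edges, so this only gives the right indices when B is sorted in increasing order and every value of A occurs in B.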
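
To see why the minus one is needed by default, recall that digitize with right=False returns i such that bins[i-1] <= x < bins[i], so an exact match with B[j] yields j + 1; with right=True it uses bins[i-1] < x <= bins[i], which yields j directly. A small sketch, assuming the question's values are converted to integers so that B is sorted in increasing numeric order:

```python
import numpy as np

# Assumption: numeric version of the question's data; B must be sorted ascending.
A = np.array([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
B = np.array([2, 4, 8, 16, 32])

# Default right=False: an exact match with B[j] lands in bin j + 1.
idx_left = np.digitize(A, B) - 1
# right=True: an exact match with B[j] lands in bin j, no correction needed.
idx_right = np.digitize(A, B, right=True)

print(idx_left)                             # [1 1 0 2 2 2 2 2 3 4 3 3 4]
print(np.array_equal(idx_left, idx_right))  # True
```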

I'm not sure what you are using this for, but if B does not have to be given in advance, a simple and very fast way is to let np.unique build B from A:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

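The order will be different (np.unique returns the unique values sorted, and strings sort lexicographically, so '16' comes before '2'), but this can be changed if needed.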
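
One way to change the order is to re-sort the unique values and remap the inverse indices through a rank array. A sketch, assuming every string in A is an integer literal so the uniques can be sorted numerically; for the question's data this happens to reproduce the question's B and expected output exactly:

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])

# Unique values come back sorted as strings: ['16', '2', '32', '4', '8'].
U, inv = np.unique(A, return_inverse=True)

# Assumption: each string is an integer literal, so sort the uniques numerically.
order = np.argsort(U.astype(int))
B_numeric = U[order]

# rank[k] is the position of U[k] in the numeric order; remap the inverse indices.
rank = np.empty_like(order)
rank[order] = np.arange(len(order))
indices = rank[inv]

print(B_numeric.tolist())  # ['2', '4', '8', '16', '32']
print(indices.tolist())    # [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
```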

#3

For this kind of task, lookups in B should be as fast as possible. A dictionary provides O(1) average lookup time, so first let us build a dictionary that maps each value of B to its index:

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

And then just go through A and find corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
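
If you want the result as a NumPy array rather than a list, the same dictionary lookup can feed np.fromiter. A minimal sketch (indices_in_b_dict is a hypothetical name); a value of A that is missing from B raises KeyError, as in the list comprehension above:

```python
import numpy as np

def indices_in_b_dict(A, B):
    """Index in B of each element of A via a dict lookup (hypothetical helper name)."""
    # .tolist() turns NumPy scalars into plain Python str/int keys.
    lookup = {value: index for index, value in enumerate(np.asarray(B).tolist())}
    items = np.asarray(A).tolist()
    # One O(1) average-time lookup per element; KeyError if a value is not in B.
    return np.fromiter((lookup[item] for item in items), dtype=np.intp, count=len(items))

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_in_b_dict(A, B))  # [1 1 0 2 2 2 2 2 3 4 3 3 4]
```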

#4

I think you can do it with np.searchsorted:

>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

Note that B is scrambled relative to your version, hence the different output. If some items in A are not in B (which you can check with np.all(np.in1d(A, B)), or np.isin(A, B).all() in newer NumPy), the indices returned for those values will be meaningless, and you may even get an IndexError from the last line (if some value in A is larger than every value in B).
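
Both failure modes can be handled by clipping the sorted positions into range and then checking that B really holds each value at the returned index. A sketch of that idea (the helper name and raising ValueError on misses are assumptions); it also works directly on the question's string arrays, since argsort and searchsorted compare strings lexicographically and consistently:

```python
import numpy as np

def indices_in_b_searchsorted(A, B):
    """Index in B of each element of A via a sorted lookup (hypothetical helper name)."""
    A = np.asarray(A)
    B = np.asarray(B)
    sort_b = np.argsort(B)
    pos = np.searchsorted(B, A, sorter=sort_b)
    # Values larger than everything in B get pos == len(B); clip so indexing cannot fail.
    pos = np.clip(pos, 0, len(B) - 1)
    idx = sort_b[pos]
    # A position is a real match only if B holds exactly that value there.
    if not np.all(B[idx] == A):
        raise ValueError("some values of A are not in B")
    return idx

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
print(indices_in_b_searchsorted(A, B))  # [1 1 0 2 2 2 2 2 3 4 3 3 4]
```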

#5

The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's, but with a nice interface, tests, and a lot of related useful functionality:

import numpy_indexed as npi
print(npi.indices(B, A))
