I have an array/set of unique positive integers, e.g.
>>> unique = np.unique(np.random.choice(100, 4, replace=False))
And an array containing multiple elements sampled from that array, such as
>>> A = np.random.choice(unique, 100)
I want to map the values of the array A to the positions at which those values occur in unique.
So far the best solution I have found is through a mapping array:
>>> table = np.zeros(unique.max()+1, unique.dtype)
>>> table[unique] = np.arange(unique.size)
The above stores, for each value in unique, its index in that array, so the table can later be used to map A through advanced indexing:
>>> table[A]
array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0,
0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1,
3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2,
3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1,
3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)
This already gives me the correct result. However, if the numbers in unique are very sparse and large, this approach means creating a very large table array just to store a few numbers for later mapping.
Is there any better solution?
NOTE: both A and unique are sample arrays, not my real arrays. So the question is not how to generate the positional indexes, but how to efficiently map the elements of A to their indexes in unique. The pseudocode of what I'd like to speed up in numpy is as follows:
B = np.zeros_like(A)
for i in range(A.size):
    B[i] = unique.index(A[i])
(assuming unique is a list in the above pseudocode).
3 Answers
#1
4
The table approach described in your question is the best option when unique is pretty dense, but unique.searchsorted(A) should produce the same result and doesn't require unique to be dense. It does rely on unique being sorted, which np.unique guarantees. searchsorted works well with ints; if anyone is trying to do this kind of thing with floats, which have precision limitations, consider something like this.
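A minimal sketch of the searchsorted approach on sparse, large values. The sample numbers are made up for illustration, and the sketch assumes that unique is sorted (as np.unique returns it) and that every element of A actually occurs in unique:

```python
import numpy as np

# Hypothetical sparse, large values; np.unique returns them sorted.
unique = np.unique(np.array([7, 1_000_003, 42, 999_999_937]))

rng = np.random.default_rng(0)
A = rng.choice(unique, 100)

# Binary search of every element of A in the sorted array `unique`.
# No table of size unique.max() + 1 is allocated.
B = unique.searchsorted(A)

# Sanity check: indexing `unique` with the result reproduces A.
assert np.array_equal(unique[B], A)
```

Note that searchsorted returns an insertion position even for values that are not present in unique, so a check like the final assertion is worth keeping if A may contain other values.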
#2
2
You can use a standard Python dict with np.vectorize:
inds = {e:i for i, e in enumerate(unique)}
B = np.vectorize(inds.get)(A)
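A self-contained sketch of the same idea with small, made-up sample arrays. Passing otypes is an addition here, not part of the answer above; it fixes the output dtype up front.

```python
import numpy as np

# Hypothetical sample data for illustration.
unique = np.array([3, 17, 58, 91])
A = np.array([58, 3, 91, 91, 17, 3])

# Map each value to its position in `unique`.
inds = {int(e): i for i, e in enumerate(unique)}

# np.vectorize calls inds.get once per element, i.e. it is a
# Python-level loop behind a NumPy-style interface.
B = np.vectorize(inds.get, otypes=[np.intp])(A)

print(B)  # [2 0 3 3 1 0]
```

Memory use is proportional to the number of entries in unique rather than to its largest value, but the lookup still runs at Python speed. For a value missing from the dict, inds.get returns None, so this sketch assumes every element of A is a key.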
#3
2
The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index, which does not require memory proportional to the max element, but only proportional to the input itself:
import numpy_indexed as npi
npi.indices(unique, A)
Note that it also works for arbitrary dtypes and dimensions. Also, the array being queried does not need to be unique; the first index encountered will be returned, the same as for list.
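For readers who prefer to avoid an extra dependency, a similar first-match lookup can be sketched in plain NumPy for the 1-D case. This is not how numpy_indexed is implemented; indices_in is a hypothetical helper name, and the sketch assumes every queried value occurs in the reference array:

```python
import numpy as np

def indices_in(reference, values):
    """For each element of `values`, return its first position in `reference`.

    Hypothetical helper; assumes 1-D inputs and that every value is present.
    """
    reference = np.asarray(reference)
    values = np.asarray(values)
    # A stable sort keeps equal elements in their original order, so the
    # left-most match in sorted order is the first occurrence in `reference`.
    order = np.argsort(reference, kind="stable")
    pos = np.searchsorted(reference, values, sorter=order)
    # Translate positions in sorted order back to positions in `reference`.
    return order[pos]

ref = np.array([50, 10, 40, 20])       # does not need to be sorted
vals = np.array([20, 50, 10, 10, 40])
idx = indices_in(ref, vals)

print(idx)  # [3 0 1 1 2]
```

The extra memory is proportional to the inputs (the argsort permutation and the result), not to the largest element.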