I have an array that determines an ordering of elements:
order = [3, 1, 4, 2]
And then I want to sort another, larger array (containing only those elements):
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
such that the element(s) that come first in order come first in the result, and so on.
In plain Python, I would do this with a key function:
sorted(a, key=order.index)
[3, 3, 1, 1, 1, 4, 4, 2]
How can I do this (efficiently) with numpy? Is there a similar notion of "key function" for numpy arrays?
3 solutions
#1
5
Specific case: ints
For non-negative ints, we could use bincount -
np.repeat(order,np.bincount(a)[order])
Sample run -
In [146]: sorted(a, key=order.index)
Out[146]: [3, 3, 1, 1, 1, 4, 4, 2]
In [147]: np.repeat(order,np.bincount(a)[order])
Out[147]: array([3, 3, 1, 1, 1, 4, 4, 2])
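One caveat with the one-liner above: np.bincount(a) only has a.max() + 1 entries, so indexing it with order raises an IndexError when order contains a value larger than anything in a. A minimal sketch, assuming non-negative integer values, that pads the counts with bincount's minlength argument (the extra value 7 is an illustrative addition, not part of the original question):

```python
import numpy as np

order = np.array([3, 1, 4, 2, 7])   # 7 never occurs in `a` (illustrative addition)
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])

# Pad the count vector so every value in `order` is a valid index.
counts = np.bincount(a, minlength=order.max() + 1)
result = np.repeat(order, counts[order])
print(result)  # [3 3 1 1 1 4 4 2]
```

Values of order that never occur in a simply get a count of zero and drop out of the result.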
Generic case
Approach #1
Generalizing to all dtypes with bincount -
# https://*.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
    # Inverse of a permutation: same result as np.argsort(idx) when idx holds unique indices
    n = idx.size
    sidx = np.empty(n, dtype=int)
    sidx[idx] = np.arange(n)
    return sidx
sidx = np.argsort(order)
c = np.bincount(np.searchsorted(order,a,sorter=sidx))
out = np.repeat(order, c[argsort_unique(sidx)])
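To see that this approach is not tied to integers, here is a sketch that wraps the same steps in a function and runs them on strings. The name sort_by_order is hypothetical, and the code assumes that order has no duplicates and that every element of a occurs in order. The minlength argument is an addition that keeps the count vector full-length when some values of order never appear in a.

```python
import numpy as np

def sort_by_order(a, order):
    """Hypothetical helper: arrange the values of `a` in the sequence given by `order`.

    Assumes `order` has no duplicates and every element of `a` occurs in `order`.
    """
    a = np.asarray(a)
    order = np.asarray(order)
    sidx = np.argsort(order)                       # permutation that sorts `order`
    pos = np.searchsorted(order, a, sorter=sidx)   # rank of each element of `a` in sorted `order`
    # minlength keeps the count vector full-length even if some values never occur in `a`
    c = np.bincount(pos, minlength=order.size)
    inv = np.empty(order.size, dtype=int)          # inverse permutation of sidx
    inv[sidx] = np.arange(order.size)
    return np.repeat(order, c[inv])

order = np.array(["c", "a", "d", "b"])
a = np.array(["d", "b", "a", "a", "d", "c", "a", "c"])
print(sort_by_order(a, order))  # ['c' 'c' 'a' 'a' 'a' 'd' 'd' 'b']
```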
Approach #2-A
With np.unique and searchsorted, for the case when all elements from order are in a -
unq, count = np.unique(a, return_counts=True)
out = np.repeat(order, count[np.searchsorted(unq, order)])
Approach #2-B
To cover for all cases, we need one extra step -
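unq, count = np.unique(a, return_counts=True)
# clamp so that values of order larger than a.max() still give a valid index
sidx = np.minimum(np.searchsorted(unq, order), len(unq) - 1)
out = np.repeat(order, np.where(unq[sidx] == order, count[sidx], 0))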
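As a worked example of the extra step, the sketch below uses an order that contains two values missing from a: 0 (smaller than everything in a) and 9 (larger than everything in a). Both values are illustrative additions, and the code assumes a is non-empty. The np.minimum clamp is what keeps the lookup for 9 inside the bounds of unq.

```python
import numpy as np

order = np.array([3, 9, 1, 4, 0, 2])   # 9 and 0 do not occur in `a` (illustrative)
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])

unq, count = np.unique(a, return_counts=True)
# Clamp so that a value larger than everything in `a` still yields a valid index.
sidx = np.minimum(np.searchsorted(unq, order), len(unq) - 1)
# Zero out the count wherever the lookup did not land on an exact match.
reps = np.where(unq[sidx] == order, count[sidx], 0)
out = np.repeat(order, reps)
print(reps)  # [2 0 3 2 0 1]
print(out)   # [3 3 1 1 1 4 4 2]
```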
#2
1
Building on @Divakar's solution, you can count how many times each element occurs and then repeat the ordered elements that many times:
from collections import Counter

c = Counter(a)
np.repeat(order, [c[v] for v in order])
(You could vectorize the count lookup if you like). I like this because it's linear time, even if it's not pure numpy.
I guess a pure numpy equivalent would look like this:
count = np.unique(a, return_counts=True)[1]
np.repeat(order, count[np.argsort(np.argsort(order))])
But that's less direct, more code, and way too many sorts. :)
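To make the double argsort less opaque, here are the intermediate values for the example data. This sketch assumes that order is a permutation of the unique values of a; if order contains values missing from a, or the other way round, the counts no longer line up.

```python
import numpy as np

order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])

# argsort applied twice yields the rank of each element of `order`
rank = np.argsort(np.argsort(order))
print(rank)  # [2 0 3 1]

# counts come back aligned with the sorted unique values [1, 2, 3, 4]
count = np.unique(a, return_counts=True)[1]
print(count)        # [3 1 2 2]
print(count[rank])  # [2 3 2 1]
print(np.repeat(order, count[rank]))  # [3 3 1 1 1 4 4 2]
```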
#3
0
This is a fairly direct conversion of your pure-Python approach into numpy. The key idea is replacing the order.index function with a lookup in a sorted vector. I'm not sure whether this is any simpler or faster than the solution you came up with, but it may generalize to some other cases.
import numpy as np
order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
# create sorted lookup vectors
ord = np.argsort(order)
order_sorted = order[ord]
indices_sorted = np.arange(len(order))[ord]
# lookup the index in `order` for each value in the `a` vector
a_indices = np.interp(a, order_sorted, indices_sorted).astype(int)
# sort `a` using the retrieved index values
a_sorted = a[np.argsort(a_indices)]
a_sorted
# array([3, 3, 1, 1, 1, 4, 4, 2])
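A possible variant of the same idea, sketched here as an alternative rather than as part of the answer above: np.searchsorted with its sorter argument does the lookup exactly (no float interpolation), and a stable argsort keeps elements with equal keys in their original relative order, as Python's sorted does. It assumes every element of a occurs in order.

```python
import numpy as np

order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])

sorter = np.argsort(order)
# keys[i] is the position of a[i] within `order`, i.e. order.index(a[i])
keys = sorter[np.searchsorted(order, a, sorter=sorter)]
print(keys)  # [2 3 1 1 2 0 1 0]

# A stable sort on the keys mirrors sorted(a, key=order.index)
perm = np.argsort(keys, kind="stable")
print(perm)     # [5 7 2 3 6 0 4 1]
print(a[perm])  # [3 3 1 1 1 4 4 2]
```

Because perm is an ordinary index array, it can also be used to reorder any companion array that is aligned with a.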
This is a more direct way (based on this question), but it seems to be about 4 times slower than the np.interp approach:
lookup_dict = dict(zip(order, range(len(order))))
indices = np.vectorize(lookup_dict.__getitem__)(a)
a_sorted = a[np.argsort(indices)]
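If you want to check the relative speed on your own machine, a small timing harness along these lines may help. The function names, the array size, and the repeat count are arbitrary choices for illustration, and the timings will vary with hardware, array size, and NumPy version.

```python
import timeit
import numpy as np

order = np.array([3, 1, 4, 2])
rng = np.random.default_rng(0)
a = rng.choice(order, size=10_000)   # size chosen arbitrarily for illustration

def via_interp():
    # lookup through np.interp on the sorted `order`
    srt = np.argsort(order)
    idx = np.interp(a, order[srt], srt).astype(int)
    return a[np.argsort(idx, kind="stable")]

def via_dict():
    # lookup through a Python dict applied element by element
    lookup = {int(v): i for i, v in enumerate(order)}
    idx = np.vectorize(lookup.__getitem__)(a)
    return a[np.argsort(idx, kind="stable")]

# Both variants should agree before their timings are compared.
assert np.array_equal(via_interp(), via_dict())
for fn in (via_interp, via_dict):
    print(fn.__name__, timeit.timeit(fn, number=20))
```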