Ordering one numpy array by another

Time: 2022-12-01 12:33:27

I have an array that determines an ordering of elements:

order = [3, 1, 4, 2]

And then I want to sort another, larger array (containing only those elements):

import numpy as np
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])

such that the element(s) that come first in order come first in the results, etc.
In straight Python, I would do this with a key function:

sorted(a, key=order.index)
[3, 3, 1, 1, 1, 4, 4, 2]

How can I do this (efficiently) with numpy? Is there a similar notion of "key function" for numpy arrays?
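
NumPy's np.sort and np.argsort take no key= argument. A common substitute is to compute the key of every element as an integer array in one vectorized step, then argsort that array. The sketch below (the function name sort_by_key_array is hypothetical) finds each element's index in `order` with np.searchsorted. It assumes every element of `a` occurs in `order` and that `order` has no duplicates:

```python
import numpy as np


def sort_by_key_array(a, order):
    """Vectorized stand-in for sorted(a, key=list(order).index)."""
    a = np.asarray(a)
    order = np.asarray(order)
    sorter = np.argsort(order)  # indices that sort `order`
    # Position of each element of `a` in sorted(order), mapped back to its index in `order`
    keys = sorter[np.searchsorted(order, a, sorter=sorter)]
    # A stable sort keeps equal keys in their original relative order, like sorted()
    return a[np.argsort(keys, kind="stable")]


order = [3, 1, 4, 2]
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
print(sort_by_key_array(a, order))  # [3 3 1 1 1 4 4 2]
```

The key array plays the role of the key function: it is computed once for all elements instead of calling order.index per element in Python.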

3 solutions

#1


Specific case: ints

For non-negative ints, we could use bincount -

np.repeat(order,np.bincount(a)[order])

Sample run -

In [146]: sorted(a, key=order.index)
Out[146]: [3, 3, 1, 1, 1, 4, 4, 2]

In [147]: np.repeat(order,np.bincount(a)[order])
Out[147]: array([3, 3, 1, 1, 1, 4, 4, 2])
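
One caveat: np.bincount(a) only has a.max() + 1 bins, so np.bincount(a)[order] raises an IndexError if `order` contains a value larger than every element of `a`. Passing minlength avoids that. A minimal sketch (the helper name repeat_by_bincount is hypothetical):

```python
import numpy as np


def repeat_by_bincount(a, order):
    """Sort non-negative ints `a` by `order`: count each value, then repeat."""
    order = np.asarray(order)
    # minlength guarantees a bin (possibly holding 0) for every value in `order`
    counts = np.bincount(a, minlength=int(order.max()) + 1)
    return np.repeat(order, counts[order])


a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
print(repeat_by_bincount(a, [3, 1, 4, 2]))     # [3 3 1 1 1 4 4 2]
print(repeat_by_bincount(a, [3, 1, 5, 4, 2]))  # 5 is absent from a: [3 3 1 1 1 4 4 2]
```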

Generic case

Approach #1

Generalizing for all dtypes with bincount -

# https://*.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

sidx = np.argsort(order)
c = np.bincount(np.searchsorted(order,a,sorter=sidx), minlength=len(order))
out = np.repeat(order, c[argsort_unique(sidx)])
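
argsort_unique inverts a permutation in O(n): scattering 0..n-1 into the positions named by sidx gives the same array as a second np.argsort(sidx), without another sort. The result is the rank of each element of `order`, which is why c[argsort_unique(sidx)] lines the sorted-position counts back up with `order`. A quick check on the sample data:

```python
import numpy as np


def argsort_unique(idx):
    # Scatter 0..n-1 into the positions named by `idx`: the inverse permutation
    n = idx.size
    sidx = np.empty(n, dtype=int)
    sidx[idx] = np.arange(n)
    return sidx


sidx = np.argsort([3, 1, 4, 2])  # [1 3 0 2]
inv = argsort_unique(sidx)       # [2 0 3 1], the rank of each element of order
assert np.array_equal(inv, np.argsort(sidx))
```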

Approach #2-A

With np.unique and searchsorted for the case when all elements from order are in a -

unq, count = np.unique(a, return_counts=True)
out = np.repeat(order, count[np.searchsorted(unq, order)])
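
Why the restriction matters: if `order` contains a value that is not in `a`, np.searchsorted still returns an insertion point, and the count stored there belongs to a different value. In this sketch a hypothetical 0 is added to `order`; it ends up repeated three times, borrowing the count of 1. (A value larger than every element of `a` would instead index past the end of `count` and raise an IndexError.)

```python
import numpy as np

a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
order = [3, 1, 0, 4, 2]  # 0 does not occur in a

unq, count = np.unique(a, return_counts=True)  # unq=[1 2 3 4], count=[3 1 2 2]
out = np.repeat(order, count[np.searchsorted(unq, order)])
print(out)  # [3 3 1 1 1 0 0 0 4 4 2]
```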

Approach #2-B

To cover all cases, we need one extra step -

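unq, count = np.unique(a, return_counts=True)
# clip so elements of order larger than max(a) still index inside unq
sidx = np.minimum(np.searchsorted(unq, order), len(unq) - 1)
out = np.repeat(order, np.where(unq[sidx] == order, count[sidx], 0))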
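
A self-contained sketch of the #2-B idea (the function name repeat_by_unique is hypothetical), assuming `a` is non-empty. It clips the searchsorted result to the last valid index so that an element of `order` larger than every element of `a` cannot index past the end of `unq`; the equality mask then sets the count of any value missing from `a` to zero:

```python
import numpy as np


def repeat_by_unique(a, order):
    order = np.asarray(order)
    unq, count = np.unique(a, return_counts=True)
    # Clip so values beyond max(a) still yield a valid index into unq
    sidx = np.minimum(np.searchsorted(unq, order), len(unq) - 1)
    # Keep a count only where unq[sidx] really equals the order element
    return np.repeat(order, np.where(unq[sidx] == order, count[sidx], 0))


a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
print(repeat_by_unique(a, [3, 1, 0, 4, 5, 2]))  # [3 3 1 1 1 4 4 2]
```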

#2


Building on @Divakar's solution, you can count how many times each element occurs and then repeat the ordered elements that many times:

from collections import Counter
c = Counter(a)
np.repeat(order, [c[v] for v in order])

(You could vectorize the count lookup if you like). I like this because it's linear time, even if it's not pure numpy.
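
A runnable sketch of the Counter idea. Counter returns 0 for keys it has never seen, so elements of `order` that are absent from `a` are simply repeated zero times. This sketch counts a.tolist() so the keys are plain Python ints; the extra 0 in `order` is a hypothetical value chosen to show the missing-key behaviour:

```python
from collections import Counter

import numpy as np

a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
order = [3, 1, 0, 4, 2]  # 0 does not occur in a

c = Counter(a.tolist())          # one pass over a: value -> number of occurrences
counts = [c[v] for v in order]   # Counter returns 0 for missing keys
print(np.repeat(order, counts))  # [3 3 1 1 1 4 4 2]
```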

I guess a pure numpy equivalent would look like this:

count = np.unique(a, return_counts=True)[1]
np.repeat(order, count[np.argsort(np.argsort(order))])

But that's less direct, more code, and way too many sorts. :)

#3


This is a fairly direct conversion of your pure-Python approach into numpy. The key idea is replacing the order.index function with a lookup in a sorted vector. Not sure if this is any simpler or faster than the solution you came up with, but it may generalize to some other cases.

import numpy as np
order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])  

# create sorted lookup vectors
sort_idx = np.argsort(order)  # indices that sort `order`
order_sorted = order[sort_idx]
indices_sorted = np.arange(len(order))[sort_idx]

# lookup the index in `order` for each value in the `a` vector
a_indices = np.interp(a, order_sorted, indices_sorted).astype(int)

# sort `a` using the retrieved index values
a_sorted = a[np.argsort(a_indices)]
a_sorted

# array([3, 3, 1, 1, 1, 4, 4, 2])
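
Two observations on this version. Indexing np.arange(len(order)) with the argsort result just returns that argsort result, so the second lookup vector can be dropped. More importantly, np.interp never fails on a value that is not in `order`: it interpolates between neighbours (or clamps at the ends), so such a value silently receives another element's index. The sketch below (the function name sort_by_order_interp is hypothetical) adds an explicit membership check with np.isin:

```python
import numpy as np


def sort_by_order_interp(a, order):
    a = np.asarray(a)
    order = np.asarray(order)
    # np.interp would interpolate or clamp unknown values instead of failing
    if not np.isin(a, order).all():
        raise ValueError("a contains values that do not occur in order")
    sort_idx = np.argsort(order)  # also serves as the "indices_sorted" vector
    a_indices = np.interp(a, order[sort_idx], sort_idx).astype(int)
    return a[np.argsort(a_indices, kind="stable")]


print(sort_by_order_interp([4, 2, 1, 1, 4, 3, 1, 3], [3, 1, 4, 2]))  # [3 3 1 1 1 4 4 2]

try:
    sort_by_order_interp([4, 5], [3, 1, 4, 2])
except ValueError as err:
    print(err)  # a contains values that do not occur in order
```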

This is a more direct way (based on this question), but it seems to be about 4 times slower than the np.interp approach:

lookup_dict = dict(zip(order, range(len(order))))
indices = np.vectorize(lookup_dict.__getitem__)(a)
a_sorted = a[np.argsort(indices)]
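
The "about 4 times slower" figure is the answerer's own measurement. A minimal harness for checking it on your own data might look like the following; the input size and the random data are assumptions, and no particular timings are implied. NumPy's documentation notes that np.vectorize is provided for convenience rather than performance, since it still calls the Python function once per element.

```python
import timeit

import numpy as np

order = np.array([3, 1, 4, 2])
rng = np.random.default_rng(0)
a = rng.choice(order, size=50_000)  # hypothetical larger input drawn from order's values


def with_interp():
    sort_idx = np.argsort(order)
    keys = np.interp(a, order[sort_idx], sort_idx).astype(int)
    return a[np.argsort(keys, kind="stable")]


def with_vectorize():
    lookup = dict(zip(order.tolist(), range(len(order))))
    keys = np.vectorize(lookup.__getitem__)(a)
    return a[np.argsort(keys, kind="stable")]


for fn in (with_interp, with_vectorize):
    print(fn.__name__, timeit.timeit(fn, number=5))
```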
