NumPy provides a way to get the index of the maximum value of an array via np.argmax.


I would like a similar thing, but returning the indexes of the N maximum values.


For instance, if I have an array [1, 3, 2, 4, 5], then function(array, n=3) would return the indices [4, 3, 1], which correspond to the elements [5, 4, 3].


The simplest I've been able to come up with is:


In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven't been able to find one.


If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.




Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do


>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:


>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
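
The two steps above can be wrapped in a single helper. This is only a minimal sketch: the name top_k_indices is hypothetical (it is not a NumPy function), and it assumes a 1-D array with 1 <= k <= a.size.

```python
import numpy as np

def top_k_indices(a, k):
    """Indices of the k largest values of 1-D array a, largest first.

    Hypothetical helper (not part of NumPy); assumes 1 <= k <= a.size.
    """
    a = np.asarray(a)
    # O(n) selection: the last k slots hold the k largest values, unordered.
    part = np.argpartition(a, -k)[-k:]
    # O(k log k): order only those k candidates, descending by value.
    return part[np.argsort(a[part])[::-1]]

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
idx = top_k_indices(a, 4)
print(a[idx])  # [9 9 6 4]
```

Which of the tied indices is returned (for example, which of the three 4s) is not specified, so only the values are shown here.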




EDIT: Modified to include Ashwini Chaudhary's improvement.


>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:


>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.


Source: http://docs.python.org/3/library/heapq.html




Simpler yet:


idx = (-arr).argsort()[:n]

where n is the number of maximum values.
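
A quick check of this one-liner on the array from the question. One caveat, stated as an assumption about your data: negation only flips the order for signed or floating-point dtypes. For unsigned integers -arr wraps around instead of going negative, so the reversed form shown at the end is the safer choice there.

```python
import numpy as np

arr = np.array([1, 3, 2, 4, 5])
n = 3

# Negating flips the order, so an ascending argsort lists the largest first.
idx = (-arr).argsort()[:n]
print(idx)       # [4 3 1]
print(arr[idx])  # [5 4 3]

# Same selection without negation: sort ascending, then reverse.
idx_alt = arr.argsort()[::-1][:n]
print(idx_alt)   # [4 3 1]
```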




If you happen to be working with a multidimensional array then you'll need to flatten and unravel the indices:


def largest_indices(ary, n):
    """Returns the indices of the n largest values in a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:


>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])



If you don't care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort through argsort.


K = 4  # we want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a, -K)[-K:]
array([4, 1, 5, 6])

Credits to this question.


I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.
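
A comparison like this can be reproduced with timeit. The sketch below is not the original benchmark: the array size, K and repeat count are arbitrary assumptions, and the timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
a = rng.random(100_000)         # array size is an assumption; vary it
k = 100                         # so is k

def with_argsort():
    # Full sort, then keep the last k indices.
    return np.argsort(a)[-k:]

def with_argpartition():
    # Partial sort: only guarantees the last k slots hold the k largest.
    return np.argpartition(a, -k)[-k:]

# Both approaches must select the same k elements (order may differ).
same = set(with_argsort().tolist()) == set(with_argpartition().tolist())

t_sort = timeit.timeit(with_argsort, number=20)
t_part = timeit.timeit(with_argpartition, number=20)
print(f"argsort: {t_sort:.4f}s  argpartition: {t_part:.4f}s  same: {same}")
```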




This can be faster than a full sort, depending on the size of your original array and the size of your selection:


>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = -1  # any value smaller than A.min()
>>> B
array([0, 2, 3])

This, of course, modifies your original array. If that matters, work on a copy or restore the original values afterwards, whichever is cheaper for your use case.




bottleneck has a partial sort function, if the expense of sorting the entire array just to get the N largest values is too great.


I know nothing about this module; I just googled numpy partial sort.




For multidimensional arrays you can use the axis keyword to apply the partitioning along the desired axis.


# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:


x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won't return a sorted result. In that case you can use np.argsort() along the intended axis:


indices = np.argsort(arr, axis=1)[:, -N:]

# result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)
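
As an alternative to the repeat/ravel indexing, the gather step can be written with np.take_along_axis. This is a sketch under two assumptions: your NumPy is recent enough to have that function (1.15 or later), and the small array below is made up for illustration. It also keeps the cheaper argpartition and sorts only the N selected columns per row.

```python
import numpy as np

arr = np.array([[7, 11, 12, 0, 2],
                [16, 4, 3, 18, 5],
                [2, 9, 15, 12, 18]])
N = 2

# Unordered indices of the N largest entries in each row.
indices = np.argpartition(arr, -N, axis=1)[:, -N:]
# Gather the matching values row by row.
values = np.take_along_axis(arr, indices, axis=1)

# Sort within each row, largest first, and reorder the indices to match.
order = np.argsort(-values, axis=1)
top_idx = np.take_along_axis(indices, order, axis=1)
top_val = np.take_along_axis(values, order, axis=1)

print(top_idx)  # [[2 1] [3 0] [4 2]]
print(top_val)  # [[12 11] [18 16] [18 15]]
```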

Here is an example:


In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])



from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

Now result is a list of N (index, value) tuples holding the N largest values, in descending order of value.
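
For instance, with the list from the question (your_list and N are just the placeholder names used above):

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
N = 3

# enumerate yields (index, value) pairs; itemgetter(1) ranks them by value.
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
print(result)   # [(4, 5), (3, 4), (1, 3)]

# Keep only the indices if the values are not needed.
indices = [i for i, _ in result]
print(indices)  # [4, 3, 1]
```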




import numpy as np

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break  # every element has already been consumed
        idx = np.where(arr_ == max_element)  # all positions of the current maximum
        max_idxs.append(idx)
        arr_[idx] = -np.inf  # mask them out for the next iteration
    return max_idxs

This also works with 2D arrays. Tied values are returned together in one entry, so the list can contain fewer than k entries. E.g.


In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])



I found it most intuitive to use np.unique.


The idea is that np.unique with return_inverse=True returns, for each input value, the index of that value in the array of unique values. Comparing these indices with the index of the largest unique value recovers every position at which the maximum occurs.


multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
# [4 7]



np.argpartition only returns the indices of the k largest elements; it performs a partial sort, which is faster than np.argsort (a full sort) when the array is quite large. But the returned indices are NOT in ascending/descending order of value. Let's see this with an example:
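
A minimal sketch of that behavior, using an array made up for illustration. The argpartition result contains the right elements, but only the argsort result is guaranteed to be ordered.

```python
import numpy as np

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
k = 4

part = np.argpartition(a, -k)[-k:]  # partial sort: selection only
full = np.argsort(a)[-k:]           # full sort: ascending by value

print(a[part])  # the four largest values, in no guaranteed order
print(a[full])  # [4 6 9 9]
```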



So if you need the top k indices in strict ascending or descending order, np.argpartition alone won't return what you want.


Apart from sorting manually after np.argpartition, my solution is to use torch.topk from PyTorch, a tool for building neural networks that provides NumPy-like APIs with both CPU and GPU support. It's as fast as NumPy with MKL, and it offers a GPU boost if you need large matrix/vector calculations.


With torch.topk, the top k indices come back strictly ordered by value.
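
A minimal sketch of such a call, assuming PyTorch is installed; the array is made up for illustration. The largest and sorted keyword arguments of torch.topk control which end is selected and whether the result is ordered.

```python
import numpy as np
import torch

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
k = 4

t = torch.from_numpy(a)

# sorted=True (the default) orders the result by value, largest first.
values, indices = torch.topk(t, k, largest=True, sorted=True)
print(values.numpy())   # [9 9 6 4]

# largest=False selects the k smallest instead, in ascending order.
low_values, low_indices = torch.topk(t, k, largest=False, sorted=True)
print(low_values.numpy())  # [0 0 3 3]
```

Call .numpy() on the returned tensors, as above, to get back to NumPy arrays.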



Note that torch.topk accepts a torch tensor and returns both the top k values and the top k indices as torch.Tensor objects. Similar to NumPy's axis, torch.topk accepts a dim argument so that you can handle multi-dimensional arrays/tensors.




