Find the n smallest items in a numpy array

Date: 2021-03-10 21:40:03

There are plenty of questions on here where one wants to find the nth smallest element in a numpy array. However, what if you have an array of arrays? Like so:

>>> print(matrix)
[[ 1.          0.28958002  0.09972488 ...,  0.46999924  0.64723113
   0.60217694]
 [ 0.28958002  1.          0.58005657 ...,  0.37668355  0.48852272
   0.3860152 ]
 [ 0.09972488  0.58005657  1.         ...,  0.13151364  0.29539992
   0.03686381]
 ..., 
 [ 0.46999924  0.37668355  0.13151364 ...,  1.          0.50250212
   0.73128971]
 [ 0.64723113  0.48852272  0.29539992 ...,  0.50250212  1.          0.71249226]
 [ 0.60217694  0.3860152   0.03686381 ...,  0.73128971  0.71249226  1.        ]]

How can I get the n smallest items out of this array of arrays?

>>> print(type(matrix))
<class 'numpy.ndarray'>

This is how I have been doing it to find the coordinates of the smallest item:

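import numpy

min_coordinates = []
smallest = numpy.amin(matrix)  # compute the global minimum once, not per row
for row in matrix:
    cols = numpy.where(row == smallest)[0]
    if cols.size > 0:  # a size check also catches a match at column 0
        min_coordinates.append(int(cols[0]) + 1)  # 1-based column index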

Now I would like to find, for example, the 10 smallest items.

3 solutions

#1


5  

Flatten the matrix, sort and then select the first 10.

print(numpy.sort(matrix.flatten())[:10])
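
Since the question also asks about coordinates, here is a minimal sketch of the same flatten-and-sort idea that returns positions as well as values. The 3x3 `matrix`, `n`, and the variable names are made up for illustration; `np.argsort` with `axis=None` works on the flattened array, and `np.unravel_index` maps the flat indices back to (row, column) pairs.

```python
import numpy as np

# Hypothetical 3x3 stand-in for the asker's matrix.
matrix = np.array([[1.0, 0.29, 0.10],
                   [0.29, 1.0, 0.58],
                   [0.10, 0.58, 1.0]])
n = 3

# Values: sort the flattened array and take the first n.
smallest_values = np.sort(matrix, axis=None)[:n]

# Coordinates: argsort the flattened array, then convert the
# flat indices back to (row, col) pairs. A stable sort keeps
# ties in their original order.
flat_idx = np.argsort(matrix, axis=None, kind="stable")[:n]
rows, cols = np.unravel_index(flat_idx, matrix.shape)
smallest_coords = list(zip(rows.tolist(), cols.tolist()))

print(smallest_values)
print(smallest_coords)
```

The coordinates here are 0-based; add 1 to each if you want the 1-based positions used in the question's loop.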

#2


5  

If your array is not large, the accepted answer is fine. For large arrays, np.partition will accomplish this much more efficiently. Here's an example where the array has 10000 elements, and you want the 10 smallest values:

In [56]: np.random.seed(123)

In [57]: a = 10*np.random.rand(100, 100)

Use np.partition to get the 10 smallest values:

In [58]: np.partition(a, 10, axis=None)[:10]
Out[58]: 
array([ 0.00067838,  0.00081888,  0.00124711,  0.00120101,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Note that the values are not in increasing order. np.partition does not guarantee that the first 10 values will be sorted. If you need them in increasing order, you can sort the selected values afterwards. This will still be faster than sorting the entire array.
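
As a rough sketch of "partition first, then sort only the selected values", something like the following should work. The seed, `n`, and the variable names are my own choices, not from the answer. I use `kth=n - 1` here; the answer's `kth=10` also leaves the 10 smallest values in the first 10 positions. The second half uses `np.argpartition` the same way to recover coordinates, which is an addition beyond what the answer shows.

```python
import numpy as np

rng = np.random.default_rng(123)  # arbitrary seed, for repeatability only
a = 10 * rng.random((100, 100))
n = 10

# After partitioning at kth = n - 1, the first n slots of the
# flattened result hold the n smallest values in arbitrary order.
unordered = np.partition(a, n - 1, axis=None)[:n]

# Sorting n values is cheap compared to sorting all a.size values.
ordered = np.sort(unordered)

# Coordinates of the same values: argpartition returns flat indices.
flat_idx = np.argpartition(a, n - 1, axis=None)[:n]
flat_idx = flat_idx[np.argsort(a.ravel()[flat_idx])]  # order by value
rows, cols = np.unravel_index(flat_idx, a.shape)

print(ordered)
print(list(zip(rows.tolist(), cols.tolist())))
```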

Here's the result using np.sort:

In [59]: np.sort(a, axis=None)[:10]
Out[59]: 
array([ 0.00067838,  0.00081888,  0.00120101,  0.00124711,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Now compare the timing:

In [60]: %timeit np.partition(a, 10, axis=None)[:10]
10000 loops, best of 3: 75.1 µs per loop

In [61]: %timeit np.sort(a, axis=None)[:10]
1000 loops, best of 3: 465 µs per loop

In this case, using np.partition is more than six times faster.

#3


3  

You can use the heapq.nsmallest function to return the list of the 10 smallest elements.

In [84]: import heapq

In [85]: heapq.nsmallest(10, matrix.flatten())
Out[85]: 
[-1.7009047695355393,
 -1.4737632239971061,
 -1.1246243781838825,
 -0.7862983016935523,
 -0.5080863016259798,
 -0.43802651199959347,
 -0.22125698200832566,
 0.034938408281615596,
 0.13610084041121048,
 0.15876389111565958]
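
For completeness, a small self-contained sketch of the `heapq` approach that also recovers coordinates. The 2x3 `matrix` is invented for illustration, and the `.tolist()` call and the `np.ndenumerate` pairing are my additions rather than part of the answer. `np.ndenumerate` yields `((row, col), value)` pairs, so the `key` argument of `heapq.nsmallest` can rank them by value.

```python
import heapq
import numpy as np

# Hypothetical small matrix for illustration.
matrix = np.array([[0.5, -1.2, 0.3],
                   [2.0, -0.7, 0.0]])

# Values only; tolist() converts to plain Python floats first.
values = heapq.nsmallest(3, matrix.flatten().tolist())

# Values with coordinates: rank ((row, col), value) pairs by value.
pairs = heapq.nsmallest(3, np.ndenumerate(matrix), key=lambda item: item[1])
coords = [idx for idx, _ in pairs]

print(values)
print(coords)
```

Note that `heapq` iterates over the elements in Python, so for large arrays the NumPy-based approaches above are likely to be faster.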
