在Python中对2d列表的长列表进行高效排序和排序

时间:2021-07-22 11:07:18

I have a dataset with several hundred 2d value plots stored in list such as this simplified version (actual plots are up to 100 x 100):

我有一个数据集,其中有数百个2d值图存储在列表中,例如这个简化版本(实际图表高达100 x 100):

[['p01',               ['p0x',
 [[30, 40, 50]         [[42, 52, 72]
  [33, 43, 53]          [44, 63, 83]
  [36, 46, 56]]],...    [76, 96, 99]]

I have so far iterated through the plots to simply check if the current plot has the lowest value for a cell and then stored the lowest costs and lowest plots name into the respective result array, giving me two array like so: lowest values

到目前为止,我已经遍历了图,只是检查当前图是否具有单元格的最低值,然后将最低成本和最低图表名称存储到相应的结果数组中,给我两个数组如下:最低值

[[20, 40, 45]
 [26, 42, 50]
 [36, 44, 51]]]

lowest plot for each point

每个点的最低图

[[p02, p01, p03]
 [p02, p02, p02]
 [p01, p01, p04]]]

What I ideally like is say the lowest 10 of each, i.e. is there an efficient way to go through the plots without having to iterate through loop by loop i.e.: generate lowest, iterate again and see if 2nd lowest (is higher then lowest but lower than currently stored in 2nd lowest array) etc

我理想的是说每个最低的10个,即是否有一种有效的方式来遍历图而不必逐个循环迭代即:生成最低,再次迭代并查看是否第二低(高于最低但是更低)比目前存储在第二低的阵列中的那样)等

2 个解决方案

#1


1  

Well, here's a plain, brute-force Python solution. I don't think it's the very efficient (I believe it's O(n^3log(n)), but it works:

好吧,这是一个简单,蛮力的Python解决方案。我不认为这是非常有效的(我相信它是O(n ^ 3log(n)),但它的工作原理:

Suppose, we have some data like the following:

假设,我们有一些如下数据:

>>> from pprint import pprint
>>> pprint(data)
[['p1',
  [[48, 71, 36, 40, 80, 59],
   [44, 56, 87, 43, 78, 47],
   [45, 71, 86, 61, 45, 27],
   [40, 82, 72, 39, 47, 77],
   [46, 82, 66, 48, 78, 57],
   [49, 38, 65, 56, 75, 79]]],
 ['p2',
  [[82, 49, 72, 76, 48, 67],
   [78, 57, 62, 20, 43, 28],
   [71, 40, 23, 35, 88, 32],
   [51, 66, 73, 84, 68, 35],
   [44, 42, 44, 67, 20, 59],
   [62, 20, 39, 33, 63, 46]]],
 ['p3',
  [[70, 59, 86, 80, 70, 87],
   [88, 47, 38, 63, 56, 63],
   [84, 26, 46, 31, 52, 22],
   [51, 63, 63, 34, 58, 87],
   [75, 69, 39, 37, 88, 35],
   [42, 25, 76, 86, 59, 47]]],
 ['p4',
  [[44, 21, 39, 57, 61, 88],
   [31, 64, 36, 42, 79, 62],
   [41, 38, 21, 82, 71, 60],
   [37, 23, 46, 40, 77, 69],
   [27, 47, 64, 59, 51, 32],
   [23, 68, 76, 67, 39, 60]]],
 ['p5',
  [[33, 41, 41, 54, 25, 86],
   [64, 34, 76, 66, 78, 51],
   [85, 47, 85, 22, 40, 28],
   [20, 33, 30, 59, 86, 47],
   [36, 39, 32, 60, 41, 78],
   [57, 33, 35, 37, 86, 64]]],
 ['p6',
  [[58, 72, 82, 80, 80, 21],
   [41, 45, 57, 67, 74, 39],
   [70, 78, 51, 81, 85, 86],
   [81, 53, 49, 73, 60, 60],
   [26, 66, 60, 38, 87, 54],
   [31, 55, 44, 38, 28, 68]]],
 ['p7',
  [[43, 22, 57, 66, 53, 68],
   [65, 61, 52, 78, 59, 27],
   [66, 42, 58, 79, 75, 60],
   [83, 81, 67, 43, 34, 76],
   [53, 41, 36, 34, 32, 76],
   [68, 43, 53, 46, 54, 41]]],
 ['p8',
  [[74, 65, 37, 50, 51, 87],
   [72, 79, 65, 44, 46, 73],
   [42, 31, 80, 46, 63, 24],
   [83, 40, 28, 39, 86, 29],
   [29, 45, 86, 20, 26, 25],
   [52, 52, 34, 24, 44, 65]]],
 ['p9',
  [[63, 76, 54, 71, 64, 56],
   [24, 30, 67, 65, 49, 50],
   [38, 40, 55, 72, 78, 56],
   [74, 41, 34, 62, 53, 76],
   [30, 30, 36, 86, 69, 74],
   [40, 87, 29, 75, 50, 51]]]]

First, we sort each i, j value along the first axis:

首先,我们沿第一轴对每个i,j值进行排序:

>>> def matrix_idx(x,y, matrix):
...     return matrix[x][y]
...
>>> from operator import itemgetter
>>> sorts = [[sorted([(tag, matrix_idx(i, j, matrix)) for tag, matrix in data], key=itemgetter(1)) for j in range(6)] for i in range(6)]

So, now, each element of sorts:

所以,现在,各种元素:

>>> pprint(sorts[0], width=600)
[[('p5', 33), ('p7', 43), ('p4', 44), ('p1', 48), ('p6', 58), ('p9', 63), ('p3', 70), ('p8', 74), ('p2', 82)],
 [('p4', 21), ('p7', 22), ('p5', 41), ('p2', 49), ('p3', 59), ('p8', 65), ('p1', 71), ('p6', 72), ('p9', 76)],
 [('p1', 36), ('p8', 37), ('p4', 39), ('p5', 41), ('p9', 54), ('p7', 57), ('p2', 72), ('p6', 82), ('p3', 86)],
 [('p1', 40), ('p8', 50), ('p5', 54), ('p4', 57), ('p7', 66), ('p9', 71), ('p2', 76), ('p3', 80), ('p6', 80)],
 [('p5', 25), ('p2', 48), ('p8', 51), ('p7', 53), ('p4', 61), ('p9', 64), ('p3', 70), ('p1', 80), ('p6', 80)],
 [('p6', 21), ('p9', 56), ('p1', 59), ('p2', 67), ('p7', 68), ('p5', 86), ('p3', 87), ('p8', 87), ('p4', 88)]]
>>> len(sorts)
6
>>>
>>> len(sorts[0])
6

What does this correspond to? Consider again:

这对应于什么?再考虑一下:

[['p1',
  [[48, 71, **36**, 40, 80, 59],
   [44, 56, 87, 43, 78, 47],
   [45, 71, 86, 61, 45, 27],
   [40, 82, 72, 39, 47, 77],
   [46, 82, 66, 48, 78, 57],
   [49, 38, 65, 56, 75, 79]]],
...

['p4',
  [[44, **21**, 39, 57, 61, 88],
   [31, 64, 36, 42, 79, 62],
   [41, 38, 21, 82, 71, 60],
   [37, 23, 46, 40, 77, 69],
   [27, 47, 64, 59, 51, 32],
   [23, 68, 76, 67, 39, 60]]],
 ['p5',
  [[**33**, 41, 41, 54, 25, 86],
   [64, 34, 76, 66, 78, 51],
   [85, 47, 85, 22, 40, 28],
   [20, 33, 30, 59, 86, 47],
   [36, 39, 32, 60, 41, 78],
   [57, 33, 35, 37, 86, 64]]],

So each ith element of sorts contains a list, with the sorted values for the ith row, going down the first axis of the data. So, let's do one final transformation to get this into a more useful format, first, let's define a handy namedtuple:

因此,每个排序的第i个元素都包含一个列表,其中第i行的排序值沿着数据的第一轴向下。所以,让我们做一个最后的转换,让它变成一个更有用的格式,首先,让我们定义一个方便的命名元组:

>>> from collections import namedtuple
>>> Data = namedtuple('Data', 'origin value')

Now, finally:

>>> for r in range(len(data)):
...     val = []
...     orig = []
...     for i in range(6):
...         orig.append([sorts[i][j][r][0] for j in range(6)])
...         val.append([sorts[i][j][r][1] for j in range(6)])
...     ranked.append(Data(orig, val))
...

And now, check this:

现在,检查一下:

>>> pprint(ranked[0].value)
[[33, 21, 36, 40, 25, 21],
 [24, 30, 36, 20, 43, 27],
 [38, 26, 21, 22, 40, 22],
 [20, 23, 28, 34, 34, 29],
 [26, 30, 32, 20, 20, 25],
 [23, 20, 29, 24, 28, 41]]
>>> pprint(ranked[0].origin)
[['p5', 'p4', 'p1', 'p1', 'p5', 'p6'],
 ['p9', 'p9', 'p4', 'p2', 'p2', 'p7'],
 ['p9', 'p3', 'p4', 'p5', 'p5', 'p3'],
 ['p5', 'p4', 'p8', 'p3', 'p7', 'p8'],
 ['p6', 'p9', 'p5', 'p8', 'p2', 'p8'],
 ['p4', 'p2', 'p9', 'p8', 'p6', 'p7']]
>>> pprint(ranked[-1].value)
[[82, 76, 86, 80, 80, 88],
 [88, 79, 87, 78, 79, 73],
 [85, 78, 86, 82, 88, 86],
 [83, 82, 73, 84, 86, 87],
 [75, 82, 86, 86, 88, 78],
 [68, 87, 76, 86, 86, 79]]
>>> pprint(ranked[-1].origin)
[['p2', 'p9', 'p3', 'p6', 'p6', 'p4'],
 ['p3', 'p8', 'p1', 'p7', 'p4', 'p8'],
 ['p5', 'p6', 'p1', 'p4', 'p2', 'p6'],
 ['p8', 'p1', 'p2', 'p2', 'p8', 'p3'],
 ['p3', 'p1', 'p8', 'p9', 'p3', 'p5'],
 ['p7', 'p9', 'p4', 'p3', 'p5', 'p1']]

Whether it is performant enough for your use-case I don't know, but it sure was fun!

它是否足以满足您的用例我不知道,但它确实很有趣!

#2


1  

If you have NumPy (you tagged it with so I assume you already work with NumPy) you could use np.min and np.argmin to get the lowest and index of the lowest element:

如果你有NumPy(你用数组标记它,所以我假设你已经使用NumPy)你可以使用np.min和np.argmin来获得最低元素的最低和索引:

>>> import numpy as np

>>> p1 = [[1, 10], 
...       [1, 20]]
>>> p2 = [[2, 8],
...       [4, 6]]
>>> p3 = [[0, 4],
...       [17, 8]]

>>> np.min([p1, p2, p3], axis=0)      # lowest value
array([[0, 4],
       [1, 6]])

>>> np.argmin([p1, p2, p3], axis=0)   # lowest plot number
array([[2, 2],
       [0, 1]], dtype=int64)

In case you want to sort them you can use np.sort and np.argsort:

如果你想对它们进行排序,你可以使用np.sort和np.argsort:

>>> np.sort([p1, p2, p3], axis=0)
array([[[ 0,  4],          # lowest
        [ 1,  6]],

       [[ 1,  8],          # middle
        [ 4,  8]],

       [[ 2, 10],          # highest
        [17, 20]]])

>>> np.argsort([p1, p2, p3], axis=0)
array([[[2, 2],
        [0, 1]],

       [[0, 1],
        [1, 2]],

       [[1, 0],
        [2, 0]]], dtype=int64)

#1


1  

Well, here's a plain, brute-force Python solution. I don't think it's the very efficient (I believe it's O(n^3log(n)), but it works:

好吧,这是一个简单,蛮力的Python解决方案。我不认为这是非常有效的(我相信它是O(n ^ 3log(n)),但它的工作原理:

Suppose, we have some data like the following:

假设,我们有一些如下数据:

>>> from pprint import pprint
>>> pprint(data)
[['p1',
  [[48, 71, 36, 40, 80, 59],
   [44, 56, 87, 43, 78, 47],
   [45, 71, 86, 61, 45, 27],
   [40, 82, 72, 39, 47, 77],
   [46, 82, 66, 48, 78, 57],
   [49, 38, 65, 56, 75, 79]]],
 ['p2',
  [[82, 49, 72, 76, 48, 67],
   [78, 57, 62, 20, 43, 28],
   [71, 40, 23, 35, 88, 32],
   [51, 66, 73, 84, 68, 35],
   [44, 42, 44, 67, 20, 59],
   [62, 20, 39, 33, 63, 46]]],
 ['p3',
  [[70, 59, 86, 80, 70, 87],
   [88, 47, 38, 63, 56, 63],
   [84, 26, 46, 31, 52, 22],
   [51, 63, 63, 34, 58, 87],
   [75, 69, 39, 37, 88, 35],
   [42, 25, 76, 86, 59, 47]]],
 ['p4',
  [[44, 21, 39, 57, 61, 88],
   [31, 64, 36, 42, 79, 62],
   [41, 38, 21, 82, 71, 60],
   [37, 23, 46, 40, 77, 69],
   [27, 47, 64, 59, 51, 32],
   [23, 68, 76, 67, 39, 60]]],
 ['p5',
  [[33, 41, 41, 54, 25, 86],
   [64, 34, 76, 66, 78, 51],
   [85, 47, 85, 22, 40, 28],
   [20, 33, 30, 59, 86, 47],
   [36, 39, 32, 60, 41, 78],
   [57, 33, 35, 37, 86, 64]]],
 ['p6',
  [[58, 72, 82, 80, 80, 21],
   [41, 45, 57, 67, 74, 39],
   [70, 78, 51, 81, 85, 86],
   [81, 53, 49, 73, 60, 60],
   [26, 66, 60, 38, 87, 54],
   [31, 55, 44, 38, 28, 68]]],
 ['p7',
  [[43, 22, 57, 66, 53, 68],
   [65, 61, 52, 78, 59, 27],
   [66, 42, 58, 79, 75, 60],
   [83, 81, 67, 43, 34, 76],
   [53, 41, 36, 34, 32, 76],
   [68, 43, 53, 46, 54, 41]]],
 ['p8',
  [[74, 65, 37, 50, 51, 87],
   [72, 79, 65, 44, 46, 73],
   [42, 31, 80, 46, 63, 24],
   [83, 40, 28, 39, 86, 29],
   [29, 45, 86, 20, 26, 25],
   [52, 52, 34, 24, 44, 65]]],
 ['p9',
  [[63, 76, 54, 71, 64, 56],
   [24, 30, 67, 65, 49, 50],
   [38, 40, 55, 72, 78, 56],
   [74, 41, 34, 62, 53, 76],
   [30, 30, 36, 86, 69, 74],
   [40, 87, 29, 75, 50, 51]]]]

First, we sort each i, j value along the first axis:

首先,我们沿第一轴对每个i,j值进行排序:

>>> def matrix_idx(x,y, matrix):
...     return matrix[x][y]
...
>>> from operator import itemgetter
>>> sorts = [[sorted([(tag, matrix_idx(i, j, matrix)) for tag, matrix in data], key=itemgetter(1)) for j in range(6)] for i in range(6)]

So, now, each element of sorts:

所以,现在,各种元素:

>>> pprint(sorts[0], width=600)
[[('p5', 33), ('p7', 43), ('p4', 44), ('p1', 48), ('p6', 58), ('p9', 63), ('p3', 70), ('p8', 74), ('p2', 82)],
 [('p4', 21), ('p7', 22), ('p5', 41), ('p2', 49), ('p3', 59), ('p8', 65), ('p1', 71), ('p6', 72), ('p9', 76)],
 [('p1', 36), ('p8', 37), ('p4', 39), ('p5', 41), ('p9', 54), ('p7', 57), ('p2', 72), ('p6', 82), ('p3', 86)],
 [('p1', 40), ('p8', 50), ('p5', 54), ('p4', 57), ('p7', 66), ('p9', 71), ('p2', 76), ('p3', 80), ('p6', 80)],
 [('p5', 25), ('p2', 48), ('p8', 51), ('p7', 53), ('p4', 61), ('p9', 64), ('p3', 70), ('p1', 80), ('p6', 80)],
 [('p6', 21), ('p9', 56), ('p1', 59), ('p2', 67), ('p7', 68), ('p5', 86), ('p3', 87), ('p8', 87), ('p4', 88)]]
>>> len(sorts)
6
>>>
>>> len(sorts[0])
6

What does this correspond to? Consider again:

这对应于什么?再考虑一下:

[['p1',
  [[48, 71, **36**, 40, 80, 59],
   [44, 56, 87, 43, 78, 47],
   [45, 71, 86, 61, 45, 27],
   [40, 82, 72, 39, 47, 77],
   [46, 82, 66, 48, 78, 57],
   [49, 38, 65, 56, 75, 79]]],
...

['p4',
  [[44, **21**, 39, 57, 61, 88],
   [31, 64, 36, 42, 79, 62],
   [41, 38, 21, 82, 71, 60],
   [37, 23, 46, 40, 77, 69],
   [27, 47, 64, 59, 51, 32],
   [23, 68, 76, 67, 39, 60]]],
 ['p5',
  [[**33**, 41, 41, 54, 25, 86],
   [64, 34, 76, 66, 78, 51],
   [85, 47, 85, 22, 40, 28],
   [20, 33, 30, 59, 86, 47],
   [36, 39, 32, 60, 41, 78],
   [57, 33, 35, 37, 86, 64]]],

So each ith element of sorts contains a list, with the sorted values for the ith row, going down the first axis of the data. So, let's do one final transformation to get this into a more useful format, first, let's define a handy namedtuple:

因此,每个排序的第i个元素都包含一个列表,其中第i行的排序值沿着数据的第一轴向下。所以,让我们做一个最后的转换,让它变成一个更有用的格式,首先,让我们定义一个方便的命名元组:

>>> from collections import namedtuple
>>> Data = namedtuple('Data', 'origin value')

Now, finally:

>>> for r in range(len(data)):
...     val = []
...     orig = []
...     for i in range(6):
...         orig.append([sorts[i][j][r][0] for j in range(6)])
...         val.append([sorts[i][j][r][1] for j in range(6)])
...     ranked.append(Data(orig, val))
...

And now, check this:

现在,检查一下:

>>> pprint(ranked[0].value)
[[33, 21, 36, 40, 25, 21],
 [24, 30, 36, 20, 43, 27],
 [38, 26, 21, 22, 40, 22],
 [20, 23, 28, 34, 34, 29],
 [26, 30, 32, 20, 20, 25],
 [23, 20, 29, 24, 28, 41]]
>>> pprint(ranked[0].origin)
[['p5', 'p4', 'p1', 'p1', 'p5', 'p6'],
 ['p9', 'p9', 'p4', 'p2', 'p2', 'p7'],
 ['p9', 'p3', 'p4', 'p5', 'p5', 'p3'],
 ['p5', 'p4', 'p8', 'p3', 'p7', 'p8'],
 ['p6', 'p9', 'p5', 'p8', 'p2', 'p8'],
 ['p4', 'p2', 'p9', 'p8', 'p6', 'p7']]
>>> pprint(ranked[-1].value)
[[82, 76, 86, 80, 80, 88],
 [88, 79, 87, 78, 79, 73],
 [85, 78, 86, 82, 88, 86],
 [83, 82, 73, 84, 86, 87],
 [75, 82, 86, 86, 88, 78],
 [68, 87, 76, 86, 86, 79]]
>>> pprint(ranked[-1].origin)
[['p2', 'p9', 'p3', 'p6', 'p6', 'p4'],
 ['p3', 'p8', 'p1', 'p7', 'p4', 'p8'],
 ['p5', 'p6', 'p1', 'p4', 'p2', 'p6'],
 ['p8', 'p1', 'p2', 'p2', 'p8', 'p3'],
 ['p3', 'p1', 'p8', 'p9', 'p3', 'p5'],
 ['p7', 'p9', 'p4', 'p3', 'p5', 'p1']]

Whether it is performant enough for your use-case I don't know, but it sure was fun!

它是否足以满足您的用例我不知道,但它确实很有趣!

#2


1  

If you have NumPy (you tagged it with so I assume you already work with NumPy) you could use np.min and np.argmin to get the lowest and index of the lowest element:

如果你有NumPy(你用数组标记它,所以我假设你已经使用NumPy)你可以使用np.min和np.argmin来获得最低元素的最低和索引:

>>> import numpy as np

>>> p1 = [[1, 10], 
...       [1, 20]]
>>> p2 = [[2, 8],
...       [4, 6]]
>>> p3 = [[0, 4],
...       [17, 8]]

>>> np.min([p1, p2, p3], axis=0)      # lowest value
array([[0, 4],
       [1, 6]])

>>> np.argmin([p1, p2, p3], axis=0)   # lowest plot number
array([[2, 2],
       [0, 1]], dtype=int64)

In case you want to sort them you can use np.sort and np.argsort:

如果你想对它们进行排序,你可以使用np.sort和np.argsort:

>>> np.sort([p1, p2, p3], axis=0)
array([[[ 0,  4],          # lowest
        [ 1,  6]],

       [[ 1,  8],          # middle
        [ 4,  8]],

       [[ 2, 10],          # highest
        [17, 20]]])

>>> np.argsort([p1, p2, p3], axis=0)
array([[[2, 2],
        [0, 1]],

       [[0, 1],
        [1, 2]],

       [[1, 0],
        [2, 0]]], dtype=int64)