I have a dataset of several hundred 2D value plots stored in a list, such as this simplified version (actual plots are up to 100 x 100):
[['p01',
  [[30, 40, 50],
   [33, 43, 53],
   [36, 46, 56]]],
 ...
 ['p0x',
  [[42, 52, 72],
   [44, 63, 83],
   [76, 96, 99]]]]
So far I have iterated through the plots, checking whether the current plot has the lowest value for each cell, and stored the lowest cost and the name of the plot it came from in two result arrays, like so:
Lowest values:
[[20, 40, 45],
 [26, 42, 50],
 [36, 44, 51]]
Lowest plot for each point:
[[p02, p01, p03],
 [p02, p02, p02],
 [p01, p01, p04]]
What I would ideally like is, say, the lowest 10 of each. Is there an efficient way to do this without iterating pass by pass, i.e. generating the lowest, then iterating again to find the 2nd lowest (higher than the lowest but lower than what is currently stored in the 2nd-lowest array), and so on?
2 answers
#1
Well, here's a plain, brute-force Python solution. I don't think it's very efficient (I believe it's O(n^3 log(n)), taking the number of plots and both grid dimensions to be of size n), but it works:
Suppose we have some data like the following:
>>> from pprint import pprint
>>> pprint(data)
[['p1',
[[48, 71, 36, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
['p2',
[[82, 49, 72, 76, 48, 67],
[78, 57, 62, 20, 43, 28],
[71, 40, 23, 35, 88, 32],
[51, 66, 73, 84, 68, 35],
[44, 42, 44, 67, 20, 59],
[62, 20, 39, 33, 63, 46]]],
['p3',
[[70, 59, 86, 80, 70, 87],
[88, 47, 38, 63, 56, 63],
[84, 26, 46, 31, 52, 22],
[51, 63, 63, 34, 58, 87],
[75, 69, 39, 37, 88, 35],
[42, 25, 76, 86, 59, 47]]],
['p4',
[[44, 21, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[33, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
['p6',
[[58, 72, 82, 80, 80, 21],
[41, 45, 57, 67, 74, 39],
[70, 78, 51, 81, 85, 86],
[81, 53, 49, 73, 60, 60],
[26, 66, 60, 38, 87, 54],
[31, 55, 44, 38, 28, 68]]],
['p7',
[[43, 22, 57, 66, 53, 68],
[65, 61, 52, 78, 59, 27],
[66, 42, 58, 79, 75, 60],
[83, 81, 67, 43, 34, 76],
[53, 41, 36, 34, 32, 76],
[68, 43, 53, 46, 54, 41]]],
['p8',
[[74, 65, 37, 50, 51, 87],
[72, 79, 65, 44, 46, 73],
[42, 31, 80, 46, 63, 24],
[83, 40, 28, 39, 86, 29],
[29, 45, 86, 20, 26, 25],
[52, 52, 34, 24, 44, 65]]],
['p9',
[[63, 76, 54, 71, 64, 56],
[24, 30, 67, 65, 49, 50],
[38, 40, 55, 72, 78, 56],
[74, 41, 34, 62, 53, 76],
[30, 30, 36, 86, 69, 74],
[40, 87, 29, 75, 50, 51]]]]
First, we sort the values at each i, j position along the first axis (that is, across plots):
>>> def matrix_idx(x, y, matrix):
...     return matrix[x][y]
...
>>> from operator import itemgetter
>>> sorts = [[sorted([(tag, matrix_idx(i, j, matrix)) for tag, matrix in data], key=itemgetter(1)) for j in range(6)] for i in range(6)]
So now, looking at the first element of sorts:
>>> pprint(sorts[0], width=600)
[[('p5', 33), ('p7', 43), ('p4', 44), ('p1', 48), ('p6', 58), ('p9', 63), ('p3', 70), ('p8', 74), ('p2', 82)],
[('p4', 21), ('p7', 22), ('p5', 41), ('p2', 49), ('p3', 59), ('p8', 65), ('p1', 71), ('p6', 72), ('p9', 76)],
[('p1', 36), ('p8', 37), ('p4', 39), ('p5', 41), ('p9', 54), ('p7', 57), ('p2', 72), ('p6', 82), ('p3', 86)],
[('p1', 40), ('p8', 50), ('p5', 54), ('p4', 57), ('p7', 66), ('p9', 71), ('p2', 76), ('p3', 80), ('p6', 80)],
[('p5', 25), ('p2', 48), ('p8', 51), ('p7', 53), ('p4', 61), ('p9', 64), ('p3', 70), ('p1', 80), ('p6', 80)],
[('p6', 21), ('p9', 56), ('p1', 59), ('p2', 67), ('p7', 68), ('p5', 86), ('p3', 87), ('p8', 87), ('p4', 88)]]
>>> len(sorts)
6
>>>
>>> len(sorts[0])
6
What does this correspond to? Consider again:
[['p1',
[[48, 71, **36**, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
...
['p4',
[[44, **21**, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[**33**, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
So the ith element of sorts contains a list for the ith row: for each column j, it holds the (plot, value) pairs sorted along the first axis of the data (that is, across plots). Let's do one final transformation to get this into a more useful format. First, let's define a handy namedtuple:
>>> from collections import namedtuple
>>> Data = namedtuple('Data', 'origin value')
Now, finally:
>>> ranked = []
>>> for r in range(len(data)):
...     val = []
...     orig = []
...     for i in range(6):
...         orig.append([sorts[i][j][r][0] for j in range(6)])
...         val.append([sorts[i][j][r][1] for j in range(6)])
...     ranked.append(Data(orig, val))
...
And now, check this:
>>> pprint(ranked[0].value)
[[33, 21, 36, 40, 25, 21],
[24, 30, 36, 20, 43, 27],
[38, 26, 21, 22, 40, 22],
[20, 23, 28, 34, 34, 29],
[26, 30, 32, 20, 20, 25],
[23, 20, 29, 24, 28, 41]]
>>> pprint(ranked[0].origin)
[['p5', 'p4', 'p1', 'p1', 'p5', 'p6'],
['p9', 'p9', 'p4', 'p2', 'p2', 'p7'],
['p9', 'p3', 'p4', 'p5', 'p5', 'p3'],
['p5', 'p4', 'p8', 'p3', 'p7', 'p8'],
['p6', 'p9', 'p5', 'p8', 'p2', 'p8'],
['p4', 'p2', 'p9', 'p8', 'p6', 'p7']]
>>> pprint(ranked[-1].value)
[[82, 76, 86, 80, 80, 88],
[88, 79, 87, 78, 79, 73],
[85, 78, 86, 82, 88, 86],
[83, 82, 73, 84, 86, 87],
[75, 82, 86, 86, 88, 78],
[68, 87, 76, 86, 86, 79]]
>>> pprint(ranked[-1].origin)
[['p2', 'p9', 'p3', 'p6', 'p6', 'p4'],
['p3', 'p8', 'p1', 'p7', 'p4', 'p8'],
['p5', 'p6', 'p1', 'p4', 'p2', 'p6'],
['p8', 'p1', 'p2', 'p2', 'p8', 'p3'],
['p3', 'p1', 'p8', 'p9', 'p3', 'p5'],
['p7', 'p9', 'p4', 'p3', 'p5', 'p1']]
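Since the question only asks for the lowest 10 or so per cell, the full sort is not strictly needed. Here is a minimal sketch of the same idea using heapq.nsmallest from the standard library, which keeps just the k best entries per cell. The function name k_lowest_per_cell and the toy input are made up for illustration, and the sketch assumes every plot has the same shape:

```python
import heapq
from operator import itemgetter

def k_lowest_per_cell(data, k):
    """Return (values, origins): two lists of k grids, rank 0 being the lowest.

    `data` is assumed to be a list of [tag, matrix] pairs with equal shapes.
    """
    rows = len(data[0][1])
    cols = len(data[0][1][0])
    k = min(k, len(data))  # cannot rank more entries than there are plots
    values = [[[None] * cols for _ in range(rows)] for _ in range(k)]
    origins = [[[None] * cols for _ in range(rows)] for _ in range(k)]
    for i in range(rows):
        for j in range(cols):
            # (tag, value) pairs for this cell across all plots
            cell = ((tag, matrix[i][j]) for tag, matrix in data)
            best = heapq.nsmallest(k, cell, key=itemgetter(1))
            for r, (tag, value) in enumerate(best):
                values[r][i][j] = value
                origins[r][i][j] = tag
    return values, origins

# Toy input, invented for this example
toy = [
    ['p1', [[1, 10], [1, 20]]],
    ['p2', [[2, 8], [4, 6]]],
    ['p3', [[0, 4], [17, 8]]],
]
values, origins = k_lowest_per_cell(toy, 2)
print(values[0])   # grid of lowest values
print(origins[0])  # plot each lowest value came from
print(values[1])   # grid of second-lowest values
print(origins[1])
```

For a fixed small k this does roughly one pass over the plots per cell instead of a full sort, though it is still plain Python loops, so it may well be slow for several hundred 100 x 100 plots.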
Whether it is performant enough for your use-case I don't know, but it sure was fun!
#2
If you have NumPy (you tagged the question with arrays, so I assume you already work with NumPy), you could use np.min and np.argmin to get the lowest value in each cell and the index of the plot it comes from:
>>> import numpy as np
>>> p1 = [[1, 10],
... [1, 20]]
>>> p2 = [[2, 8],
... [4, 6]]
>>> p3 = [[0, 4],
... [17, 8]]
>>> np.min([p1, p2, p3], axis=0) # lowest value
array([[0, 4],
[1, 6]])
>>> np.argmin([p1, p2, p3], axis=0) # lowest plot number
array([[2, 2],
[0, 1]], dtype=int64)
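The indices returned by np.argmin are positions in the list of plots, not plot names. A small sketch of mapping them back to names with NumPy integer-array indexing follows; the names array and the variable names are assumptions for illustration, with the names listed in the same order as the stacked plots:

```python
import numpy as np

# Plot names, in the same order as the stacked plots (assumed for this example)
names = np.array(['p1', 'p2', 'p3'])
stack = np.array([
    [[1, 10], [1, 20]],   # p1
    [[2, 8], [4, 6]],     # p2
    [[0, 4], [17, 8]],    # p3
])

lowest_idx = np.argmin(stack, axis=0)  # index of the lowest plot per cell
lowest_names = names[lowest_idx]       # fancy indexing: same shape as lowest_idx
print(lowest_names)
```

Indexing a 1D array with a 2D integer array returns an array with the shape of the index array, so lowest_names has the same layout as a single plot.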
If you want all of them ranked, you can use np.sort and np.argsort:
>>> np.sort([p1, p2, p3], axis=0)
array([[[ 0, 4], # lowest
[ 1, 6]],
[[ 1, 8], # middle
[ 4, 8]],
[[ 2, 10], # highest
[17, 20]]])
>>> np.argsort([p1, p2, p3], axis=0)
array([[[2, 2],
[0, 1]],
[[0, 1],
[1, 2]],
[[1, 0],
[2, 0]]], dtype=int64)
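If only the lowest k layers are needed (the question mentions 10), one possible refinement is np.argpartition followed by a sort of just those k layers. The sketch below is written under some assumptions: the helper name k_lowest is made up, the input uses the [name, matrix] layout from the question, all plots share one shape, and np.take_along_axis is available (NumPy 1.15 or later). When values tie within a cell, which of the tied plots is reported is not guaranteed.

```python
import numpy as np

def k_lowest(stack, names, k):
    """Return (values, origins) for the k lowest layers per cell, lowest first."""
    stack = np.asarray(stack)
    names = np.asarray(names)
    k = min(k, stack.shape[0])
    # The first k layers of the partition hold the k smallest per cell, unordered
    part = np.argpartition(stack, k - 1, axis=0)[:k]
    part_vals = np.take_along_axis(stack, part, axis=0)
    # Sort only those k layers to put them in ascending order
    order = np.argsort(part_vals, axis=0, kind='stable')
    idx = np.take_along_axis(part, order, axis=0)
    vals = np.take_along_axis(stack, idx, axis=0)
    return vals, names[idx]

# Toy input in the question's [name, matrix] layout, invented for this example
data = [
    ['p1', [[1, 10], [1, 20]]],
    ['p2', [[2, 8], [4, 6]]],
    ['p3', [[0, 4], [17, 8]]],
]
names, matrices = zip(*data)
vals, origins = k_lowest(np.array(matrices), names, 2)
print(vals[0])     # lowest values
print(origins[0])  # plot names for the lowest values
print(vals[1])     # second-lowest values
print(origins[1])
```

Whether this is actually faster than a plain np.argsort followed by slicing off the first k layers depends on the sizes involved, so it is worth timing both on the real data.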