Apologies if this has been answered elsewhere; I've tried searching, but haven't found anything that answers my question (or perhaps I have, but didn't understand it)...
I'm fairly new to Python (v2.6.2) and have a list of lists containing floating point values which looks something like the following (except the full thing has 2+ million entries for each list):
cat = [[152.123, 150.456, 151.789, ...], [4.123, 3.456, 1.789, ...], [20.123, 22.456, 21.789, ...]]
Now what I would like to do is sort all 3 of the lists by ascending order of the elements of the 3rd list, such that I get:
cat_sorted = [[152.123, 151.789, 150.456, ...], [4.123, 1.789, 3.456, ...], [20.123, 21.789, 22.456, ...]]
I've tried a few things, but they don't give me what I'm looking for (or perhaps I'm using them incorrectly). Is there a way to do what I am looking for, and if so, what's the easiest and quickest (considering I have 3 x 2 million entries)? Is there a way of sorting one list using another?
6 answers
#1
8
This is going to be painful, but using default Python you have 2 options:
- Decorate the 1st and 2nd lists with enumerate(), then sort these using the index to refer to values from the 3rd list:
cat_sorted = [
    [e for i, e in sorted(enumerate(cat[0]), key=lambda p: cat[2][p[0]])],
    [e for i, e in sorted(enumerate(cat[1]), key=lambda p: cat[2][p[0]])],
    sorted(cat[2]),
]
It may help to sort cat[2] in place instead of using sorted() (after the other two have been reordered, since they look their keys up in cat[2]); you cannot get around using sorted() for the other two.
- zip() the three lists together, then sort this new list of tuples on the third element of each tuple, then zip() again to get back to the original structure:
from operator import itemgetter
cat_sorted = zip(*sorted(zip(*cat), key=itemgetter(2)))
Neither will be particularly fast, not with plain Python lists of millions of numbers.
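As a rough sketch of the in-place variant mentioned in the first option: the first two lists are reordered while cat[2] still holds its original order, and only then is cat[2] itself sorted. The function name sort_in_place_by_third is made up for illustration, and the code is written so that it should run on both Python 2 and Python 3.

```python
def sort_in_place_by_third(cat):
    """Reorder cat[0] and cat[1] by the values of cat[2], then sort cat[2]."""
    keys = cat[2]
    for row in (0, 1):
        # Pair each value with its position and sort by the key at that position.
        decorated = sorted(enumerate(cat[row]), key=lambda p: keys[p[0]])
        # Slice assignment replaces the contents of the existing list object.
        cat[row][:] = [value for _, value in decorated]
    # Only now is it safe to reorder the key list itself.
    keys.sort()
    return cat


cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]
sort_in_place_by_third(cat)
print(cat)
```

Sorting cat[2] first would break the lookups, because the keys would no longer line up with the positions in the other two lists.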
#2
4
If you're willing to use an additional library, I suggest pandas. It has a DataFrame object similar to R's data.frame, and its constructor accepts a list of lists. Each inner list becomes a row, so transpose the result to get a 3-column table. Then you can easily use the built-in pandas.DataFrame.sort function (sort_values in later pandas releases) to sort by the third column (ascending or descending).
There are many plain Python ways to do this, but given the size of your problem, using the optimized functions in pandas is a better approach. And if you need any kind of aggregated statistics from your sorted data, then pandas is a no-brainer.
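A minimal sketch of the pandas route, assuming pandas is installed and is a release that provides DataFrame.sort_values. The column names "a", "b" and "c" are placeholders chosen for this example, not anything from the question.

```python
import pandas as pd

cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]

# Hypothetical column names; each inner list becomes one column.
columns = ["a", "b", "c"]
df = pd.DataFrame(dict(zip(columns, cat)))

# Sort all rows by the third column, ascending.
df_sorted = df.sort_values(by="c")

# Convert back to the original list-of-lists layout.
cat_sorted = [df_sorted[name].tolist() for name in columns]
print(cat_sorted)
```

Building the frame from a dict of columns sidesteps the transpose that a plain list of lists would need, since each inner list here is one variable rather than one record.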
#3
2
The general approach I would take is to do a Schwartzian transform on the whole thing:
- Zip the three lists together into a list of tuples.
- Sort the tuples using the third element as key.
- Iterate over the newly sorted list of tuples and fill in the three lists again.
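The three steps above could look roughly like this in plain Python. The function name sort_by_third is invented for the example, and it returns new lists rather than modifying the input, which is one possible reading of "fill in the three lists again".

```python
def sort_by_third(cat):
    # Step 1: zip the three lists into one list of (a, b, c) tuples.
    rows = list(zip(*cat))
    # Step 2: sort the tuples, using the third element as the key.
    rows.sort(key=lambda row: row[2])
    # Step 3: walk the sorted tuples and fill three fresh lists.
    first, second, third = [], [], []
    for a, b, c in rows:
        first.append(a)
        second.append(b)
        third.append(c)
    return [first, second, third]


cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]
cat_sorted = sort_by_third(cat)
print(cat_sorted)
```

Because Python's sort is stable, entries with equal keys in the third list keep their original relative order.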
#4
1
For the sake of completeness, a solution using numpy:
import numpy as np
cat = [[152.123, 150.456, 151.789],
[4.123, 3.456, 1.789],
[20.123, 22.456, 21.789]]
cat = np.array(cat)
cat_sorted = cat[:, cat[2].argsort()]
print cat_sorted
[[ 152.123 151.789 150.456]
[ 4.123 1.789 3.456]
[ 20.123 21.789 22.456]]
#5
0
Here is another way to do it, based on the great answers by Martijn Pieters and pcalcao:
def sort_by_last(ll):
    """
    >>> sort_by_last([[10, 20, 30], [3, 2, 1]])
    [[30, 20, 10], [1, 2, 3]]
    >>> sort_by_last([[10, 20, 30], [40, 50, 60], [3, 2, 1]])
    [[30, 20, 10], [60, 50, 40], [1, 2, 3]]
    >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 1, 1]])
    [[10, 20, 30], [40, 50, 60], [1, 1, 1]]
    >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 3, 1]])
    [[10, 30, 20], [40, 60, 50], [1, 1, 3]]
    >>> sort_by_last([[152.123, 150.456, 151.789], [4.123, 3.456, 1.789], [20.123, 22.456, 21.789]])
    [[152.123, 151.789, 150.456], [4.123, 1.789, 3.456], [20.123, 21.789, 22.456]]
    """
    # Compute the sort order once from the last list, then apply it to every list.
    order = sorted(range(len(ll[-1])), key=ll[-1].__getitem__)
    return [[x[i] for i in order] for x in ll]
The big string there is a docstring with doctests. To test the function, copy it to a file and run it with python -m doctest -v <file>
#6
0
Here, keys is the list of indices into cat[2], ordered by the values they point to.
keys = sorted(range(len(cat[2])), key=cat[2].__getitem__)
cat_sorted = [[cat[i][k] for k in keys] for i in range(3)]