Python - Sorting the elements in a list of lists

Time: 2022-10-29 18:10:40

Apologies if this has been answered elsewhere; I've tried searching, but haven't found anything that answers my question (or perhaps I have, but didn't understand it)...


I'm fairly new to Python (v2.6.2) and have a list of lists containing floating point values which looks something like the following (except the full thing has 2+ million entries for each list):


cat = [[152.123, 150.456, 151.789, ...], [4.123, 3.456, 1.789, ...], [20.123, 22.456, 21.789, ...]]

Now what I would like to do is sort all 3 of the lists by ascending order of the elements of the 3rd list, such that I get:


cat_sorted = [[152.123, 151.789, 150.456, ...], [4.123, 1.789, 3.456, ...], [20.123, 21.789, 22.456, ...]]

I've tried a few things, but they don't give me what I'm looking for (or perhaps I'm using them incorrectly). Is there a way to do what I am looking for and if so, what's the easiest & quickest (considering I have 3 x 2million entries)? Is there a way of sorting one list using another?


6 solutions



This is going to be painful, but using default python you have 2 options:


  • decorate the 1st and 2nd lists with enumerate(), then sort these using the index to refer to values from the 3rd list:


    cat_sorted = [
        [e for i, e in sorted(enumerate(cat[0]), key=lambda p: cat[2][p[0]])],
        [e for i, e in sorted(enumerate(cat[1]), key=lambda p: cat[2][p[0]])],
        sorted(cat[2])
    ]

    although it may help to sort cat[2] in-place instead of using sorted() (only after the other two have been sorted, since they use cat[2] as their key); you cannot get around using sorted() for the other two.


  • zip() the three lists together, then sort on the third element of this new list of lists, then zip() again to get back to the original structure:


    from operator import itemgetter
    cat_sorted = zip(*sorted(zip(*cat), key=itemgetter(2)))

Neither will be particularly fast, not with plain Python lists of millions of numbers.
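For readers on Python 3, here is a minimal sketch of the two options above. The function names sort_with_enumerate and sort_with_zip are made up for this illustration. The main difference from the Python 2 code is that zip() returns an iterator of tuples, so the rows are converted back to lists explicitly.

```python
from operator import itemgetter


def sort_with_enumerate(cat):
    """Option 1: sort each row, looking up the sort key in the third row by index."""
    key_row = cat[2]
    return [
        [e for i, e in sorted(enumerate(row), key=lambda p: key_row[p[0]])]
        for row in cat
    ]


def sort_with_zip(cat):
    """Option 2: transpose, sort the triples on their third element, transpose back."""
    triples = sorted(zip(*cat), key=itemgetter(2))
    # zip(*triples) yields tuples; turn each one back into a list.
    return [list(row) for row in zip(*triples)]


cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]

print(sort_with_enumerate(cat))
print(sort_with_zip(cat))
```

Both functions return new lists and leave cat untouched.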




If you're willing to use an additional library, I suggest Python Pandas. It has a DataFrame object similar to R's data.frame and accepts a list of lists in the constructor. Note that a list of three lists is read as three rows, so transpose it (or pass each list as a column) to get a 3-column data array. Then you can easily sort by the third column (ascending or descending) with the built-in pandas.DataFrame.sort_values method (called pandas.DataFrame.sort in old pandas releases).


There are many plain Python ways to do this, but given the size of your problem, using the optimized functions in Pandas is a better approach. And if you need any kind of aggregated statistics from your sorted data, then Pandas is a no-brainer for this.
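A rough sketch of what the Pandas route could look like, assuming a reasonably recent pandas is installed. The column names "a", "b" and "c" are placeholders picked for this example, not anything from the question.

```python
import pandas as pd

cat = [[152.123, 150.456, 151.789],
       [4.123, 3.456, 1.789],
       [20.123, 22.456, 21.789]]

# Build one column per inner list; "a", "b", "c" are placeholder names.
df = pd.DataFrame({"a": cat[0], "b": cat[1], "c": cat[2]})

# sort_values returns a new DataFrame ordered by column "c" (ascending by default).
df_sorted = df.sort_values("c")

# Convert back to the original list-of-lists layout.
cat_sorted = [df_sorted[name].tolist() for name in ("a", "b", "c")]
print(cat_sorted)
```

Pass ascending=False to sort_values for descending order.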




The general approach I would take is to do a Schwartzian transform on the whole thing.


Zip the three lists together into a list of tuples.


Sort the tuples using the third element as key.


Iterate over the newly sorted list of tuples and fill in the three lists again.
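The three steps above might be spelled out like this in Python 3. This is only a sketch; the function name sort_three_by_third and its parameter names are invented for the example.

```python
def sort_three_by_third(first, second, third):
    """Reorder three equal-length lists by ascending values of the third one."""
    # Step 1: zip the three lists together into a list of tuples.
    triples = list(zip(first, second, third))

    # Step 2: sort the tuples using the third element as key.
    triples.sort(key=lambda t: t[2])

    # Step 3: walk the sorted tuples and fill in three new lists.
    out_first, out_second, out_third = [], [], []
    for a, b, c in triples:
        out_first.append(a)
        out_second.append(b)
        out_third.append(c)
    return out_first, out_second, out_third


result = sort_three_by_third([152.123, 150.456, 151.789],
                             [4.123, 3.456, 1.789],
                             [20.123, 22.456, 21.789])
print(result)
```

Because list.sort() is stable, entries with equal values in the third list keep their original relative order.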




For the sake of completeness, a solution using numpy:


import numpy as np

cat = [[152.123, 150.456, 151.789],
        [4.123, 3.456, 1.789],
        [20.123, 22.456, 21.789]]

cat = np.array(cat) 
cat_sorted = cat[:, cat[2].argsort()]

print cat_sorted
[[ 152.123  151.789  150.456]
 [   4.123    1.789    3.456]
 [  20.123   21.789   22.456]]



Here is another way to do it based on the great answers by Martijn Pieters and pcalcao


def sort_by_last(ll):
    """
        >>> sort_by_last([[10, 20, 30], [3, 2, 1]])
        [[30, 20, 10], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [3, 2, 1]])
        [[30, 20, 10], [60, 50, 40], [1, 2, 3]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 1, 1]])
        [[10, 20, 30], [40, 50, 60], [1, 1, 1]]

        >>> sort_by_last([[10, 20, 30], [40, 50, 60], [1, 3, 1]])
        [[10, 30, 20], [40, 60, 50], [1, 1, 3]]

        >>> sort_by_last([[152.123, 150.456, 151.789], [4.123, 3.456, 1.789], [20.123, 22.456, 21.789]])
        [[152.123, 151.789, 150.456], [4.123, 1.789, 3.456], [20.123, 21.789, 22.456]]
    """
    # Sort the indices by the values of the last list, then reorder every list with them.
    # (Looking positions up with x.index(y) is quadratic and misorders lists
    # that contain duplicate values, because index() always finds the first match.)
    order = sorted(range(len(ll[-1])), key=ll[-1].__getitem__)
    return [[x[i] for i in order] for x in ll]

The big string there is a docstring with doctests. To test the function, copy it to a file and run it with python -m doctest -v <file>




Here, keys is the list of indices into cat[2], ordered by the values they point to.


keys = sorted(range(len(cat[2])), key=cat[2].__getitem__)
cat_sorted = [[cat[i][k] for k in keys] for i in range(3)]


