Using the key parameter in x.sort and sorted
Introduction
In Python, lists come with their own sorting method, sort:
>>> l = [1, 3, 2]
>>> l.sort()
>>> l
[1, 2, 3]
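One detail worth spelling out is that list.sort() sorts in place and returns None, whereas the built-in sorted() leaves its input untouched and returns a new list. A minimal sketch, with variable names chosen only for illustration:

```python
# list.sort() mutates the list it is called on and returns None.
numbers = [1, 3, 2]
result = numbers.sort()
print(numbers)  # [1, 2, 3]
print(result)   # None

# sorted() builds a new list and leaves the original unchanged.
original = [1, 3, 2]
copy_sorted = sorted(original)
print(original)     # [1, 3, 2]
print(copy_sorted)  # [1, 2, 3]
```

A common slip is writing l = l.sort(), which rebinds l to None.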
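Other containers such as dicts, tuples, and sets have no sort method, but the built-in function sorted accepts any iterable. Note that it always returns a list, and that iterating over a dict yields its keys, so sorted(d) sorts the keys by default.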
>>> # tuple sort
>>> t = (1, 3, 2)
>>> sorted(t)
[1, 2, 3]
>>>
>>> # set sort
>>> s = {1, 3, 2}
>>> sorted(s)
[1, 2, 3]
>>>
>>> # dict sort
>>> d = {1:100, 3:200, 2: 0}
>>> sorted(d)
[1, 2, 3]
>>> sorted(d.values())
[0, 100, 200]
>>> sorted(d.items())
[(1, 100), (2, 0), (3, 200)]
>>>
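The dict examples above sort keys, values, or items in their natural order. Sorting a dict by its values is a frequent follow-up, and it previews the key parameter discussed next. The sketch below reuses the same d; the names keys_by_value, items_by_value, and ordered are illustrative only:

```python
d = {1: 100, 3: 200, 2: 0}

# Keys ordered by their associated value: d.get maps each key to its value.
keys_by_value = sorted(d, key=d.get)
print(keys_by_value)  # [2, 1, 3]

# (key, value) pairs ordered by the value in position 1.
items_by_value = sorted(d.items(), key=lambda kv: kv[1])
print(items_by_value)  # [(2, 0), (1, 100), (3, 200)]

# Rebuild a dict; insertion order is preserved in Python 3.7+.
ordered = dict(items_by_value)
print(ordered)  # {2: 0, 1: 100, 3: 200}
```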
Using the key parameter
First, look at the documentation for the sorted function:
>>> help(sorted)
Help on built-in function sorted in module builtins:
sorted(iterable, /, *, key=None, reverse=False)
Return a new list containing all items from the iterable in ascending order.
A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
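As a quick illustration of the two keyword arguments named in the help text, here is a small sketch using only built-in functions as keys. The word list is made up for the example:

```python
words = ["banana", "apple", "Cherry"]

# Default ordering compares strings by code point, so uppercase sorts first.
default_order = sorted(words)
print(default_order)  # ['Cherry', 'apple', 'banana']

# key=str.lower compares the lowercased form of each string instead.
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)  # ['apple', 'banana', 'Cherry']

# reverse=True asks for descending order; items with equal keys
# ("banana" and "Cherry" both have length 6) keep their original order.
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)  # ['banana', 'Cherry', 'apple']
```

The key function only decides what is compared; the returned list still contains the original items.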
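The key parameter takes a function of one argument and is used to customize the sort order. Consider a scenario where key is useful: a payroll list for a group of employees.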
Name | Salary | Age |
---|---|---|
Tom | 4000 | 20 |
Jerry | 4000 | 24 |
Bob | 5000 | 28 |
We want to sort by salary from highest to lowest and, when salaries are equal, by age from youngest to oldest.
>>> salary_list = [
... ("Tom", 4000, 20),
... ("Jerry", 4000, 24),
... ("Bob", 5000, 28),
... ]
>>> # Custom key function; it returns the tuple (-salary, age)
>>> def mycmp(salary_item: tuple) -> (int, int):
... return (-salary_item[1], salary_item[2])
...
>>> sorted(salary_list, key=mycmp)
[('Bob', 5000, 28), ('Tom', 4000, 20), ('Jerry', 4000, 24)]
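Negating the salary works because it is a number. When the descending field cannot be negated (a string, for example), one alternative is to sort in two passes and rely on the stability of Python's sort. The sketch below shows both forms on the same data; by_lambda, by_age, and two_pass are illustrative names:

```python
from operator import itemgetter

salary_list = [
    ("Tom", 4000, 20),
    ("Jerry", 4000, 24),
    ("Bob", 5000, 28),
]

# Same ordering as mycmp, written inline as a lambda.
by_lambda = sorted(salary_list, key=lambda item: (-item[1], item[2]))

# Two-pass alternative that relies on sort stability:
# sort by the secondary key (age, ascending) first,
# then by the primary key (salary, descending).
by_age = sorted(salary_list, key=itemgetter(2))
two_pass = sorted(by_age, key=itemgetter(1), reverse=True)

print(by_lambda)
print(two_pass)
```

Both approaches produce the same order here; the two-pass form trades an extra sort for not needing a negatable key.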
So why does this work?
Let's implement a class, CmpObject, with debugging output in some of its magic methods to see how sorted performs its comparisons.
class CmpObject:
    def __init__(self, name, val):
        self.name = name
        self.val = val

    def __neg__(self):
        print("called __neg__", self.name, self.val)
        return CmpObject(self.name, -self.val)

    def __eq__(self, other):
        print("called __eq__")
        return self.get_val() == other.get_val()

    def __lt__(self, other):
        print("called __lt__")
        return self.get_val() < other.get_val()

    def __gt__(self, other):
        print("called __gt__")
        return self.get_val() > other.get_val()

    def get_val(self):
        print("called get_val", self.name, self.val)
        return self.val

    def __repr__(self):
        return f"{self.val}"
>>> # Initialize the payroll list
>>> salary_list = [
... ("Tom", CmpObject("Tom", 4000), CmpObject("Tom", 20)),
... ("Jerry", CmpObject("Jerry", 4000), CmpObject("Jerry", 24)),
... ("Bob", CmpObject("Bob", 5000), CmpObject("Bob", 28)),
... ]
>>> salary_list
[('Tom', 4000, 20), ('Jerry', 4000, 24), ('Bob', 5000, 28)]
>>> # Reuse the same mycmp key function as before
>>> def mycmp(salary_item: tuple) -> (int, int):
... return (-salary_item[1], salary_item[2])
...
>>> # Run the sort
>>> sorted(salary_list, key=mycmp)
# Walk through the comparison order
# sorted first computes the key for every element and stores it for the comparisons
called __neg__ Tom 4000
called __neg__ Jerry 4000
called __neg__ Bob 5000
# First compare -4000 and -4000, i.e. Jerry's and Tom's salaries. Because salaries sort in descending order (highest first), the negated salaries are compared
called __eq__
called get_val Jerry -4000
called get_val Tom -4000
# Jerry and Tom have the same salary, so their ages are compared next
called __eq__
called get_val Jerry 24
called get_val Tom 20
# The ages differ, and Jerry's age (24) is not less than Tom's (20), so Jerry stays after Tom
called __lt__
called get_val Jerry 24
called get_val Tom 20
# Next compare Bob's and Jerry's salaries: Bob earns more than Jerry, so Jerry goes after Bob
called __eq__
called get_val Bob -5000
called get_val Jerry -4000
called __lt__
called get_val Bob -5000
called get_val Jerry -4000
# Bob and Jerry are compared again
called __eq__
called get_val Bob -5000
called get_val Jerry -4000
called __lt__
called get_val Bob -5000
called get_val Jerry -4000
# Finally compare Bob and Tom: Bob earns more than Tom, so Tom goes after Bob
called __eq__
called get_val Bob -5000
called get_val Tom -4000
called __lt__
called get_val Bob -5000
called get_val Tom -4000
# Final order: Bob, Tom, Jerry
[('Bob', 5000, 28), ('Tom', 4000, 20), ('Jerry', 4000, 24)]
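The trace shows __neg__ being called exactly three times, once per element, before any comparison happens. A small sketch that confirms the key function runs once per item, in list order; counting_key and calls are hypothetical names introduced only for this check:

```python
calls = []

def counting_key(item):
    # Record each invocation so the calls can be counted afterwards.
    calls.append(item[0])
    return (-item[1], item[2])

salary_list = [("Tom", 4000, 20), ("Jerry", 4000, 24), ("Bob", 5000, 28)]
result = sorted(salary_list, key=counting_key)

print(calls)   # ['Tom', 'Jerry', 'Bob']
print(result)  # [('Bob', 5000, 28), ('Tom', 4000, 20), ('Jerry', 4000, 24)]
```

Because the keys are computed up front, an expensive key function costs one call per element rather than one call per comparison.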
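Conclusions
1. When the key is a tuple, sorted compares it element by element, so priority decreases from left to right. With (-salary, age), age is compared only when -salary is equal.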
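2. Comparing two tuple keys uses __eq__ and __lt__. __eq__ is called on each pair of elements, from left to right, to find the first pair that differs; __lt__ is then called on that pair. If __lt__ reports that the later element is smaller, the two elements are reordered; otherwise their order is unchanged. __gt__ is never called.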
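The tuple behavior can be checked directly with plain integers, and a bare object used as a key shows that sorting itself only relies on the less-than comparison. In the sketch below, OnlyLt is a hypothetical class written for this demonstration:

```python
# Tuples compare element by element; the first unequal pair decides.
tie_on_salary = (-4000, 24) < (-4000, 20)
decided_by_salary = (-5000, 28) < (-4000, 24)
print(tie_on_salary)      # False: salaries tie, and 24 < 20 is False
print(decided_by_salary)  # True: -5000 < -4000 settles it

class OnlyLt:
    """Hypothetical wrapper that defines nothing but __lt__."""

    def __init__(self, val):
        self.val = val

    def __lt__(self, other):
        return self.val < other.val

    def __repr__(self):
        return f"OnlyLt({self.val})"

# sorted() works here because it only asks whether one item is less than another.
ordered = sorted([OnlyLt(3), OnlyLt(1), OnlyLt(2)])
print(ordered)  # [OnlyLt(1), OnlyLt(2), OnlyLt(3)]
```

The __eq__ calls in the trace therefore come from comparing the tuple keys, not from the sort routine itself.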