为什么max比sort更慢?

时间:2021-07-01 21:20:58

I've found that max is slower than the sort function in Python 2 and 3.

我发现max比Python 2和3中的sort函数慢。

Python 2

Python 2

$ python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'a.sort();a[-1]'
1000 loops, best of 3: 239 usec per loop
$ python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'max(a)'        
1000 loops, best of 3: 342 usec per loop

Python 3

Python 3

$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'a.sort();a[-1]'
1000 loops, best of 3: 252 usec per loop
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'max(a)'
1000 loops, best of 3: 371 usec per loop

Why is max (O(n)) slower than the sort function (O(nlogn))?

为什么max(O(n))比sort函数(O(nlogn))慢?

3 个解决方案

#1


122  

You have to be very careful when using the timeit module in Python.

在Python中使用timeit模块时必须非常小心。

python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'a.sort();a[-1]'

Here the initialisation code runs once to produce a randomised array a. Then the rest of the code is run several times. The first time it sorts the array, but every other time you are calling the sort method on an already sorted array. Only the fastest time is returned, so you are actually timing how long it takes Python to sort an already sorted array.

这里初始化代码运行一次以产生随机数组a。然后其余代码运行几次。它第一次对数组进行排序,但是每隔一段时间就在已排序的数组上调用sort方法。只返回最快的时间,因此您实际上计算了Python对已排序的数组进行排序所需的时间。

Part of Python's sort algorithm is to detect when the array is already partly or completely sorted. When completely sorted it simply has to scan once through the array to detect this and then it stops.

Python的排序算法的一部分是检测阵列何时已经部分或完全排序。当完全排序时,它只需要扫描一次通过数组来检测它然后停止。

If instead you tried:

相反,你试过:

python -m timeit -s 'import random;a=range(100000);random.shuffle(a)' 'sorted(a)[-1]'

then the sort happens on every timing loop and you can see that the time for sorting an array is indeed much longer than to just find the maximum value.

然后排序发生在每个定时循环上,你可以看到排序数组的时间确实比找到最大值要长得多。

Edit: @skyking's answer explains the part I left unexplained: a.sort() knows it is working on a list so can directly access the elements. max(a) works on any arbitrary iterable so has to use generic iteration.

编辑:@ skyking的答案解释了我不明原因的部分:a.sort()知道它正在列表中工作,因此可以直接访问元素。 max(a)适用于任意迭代,因此必须使用泛型迭代。

#2


88  

First off, note that max() uses the iterator protocol, while list.sort() uses ad-hoc code. Clearly, using an iterator is an important overhead, that's why you are observing that difference in timings.

首先,请注意max()使用迭代器协议,而list.sort()使用ad-hoc代码。显然,使用迭代器是一个重要的开销,这就是为什么你要观察时间差异的原因。

However, apart from that, your tests are not fair. You are running a.sort() on the same list more than once. The algorithm used by Python is specifically designed to be fast for already (partially) sorted data. Your tests are saying that the algorithm is doing its job well.

但是,除此之外,您的测试不公平。您在同一个列表上多次运行a.sort()。 Python使用的算法专门针对已经(部分)排序的数据快速设计。您的测试表明该算法正在很好地完成其工作。

These are fair tests:

这些是公平的测试:

$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'max(a[:])'
1000 loops, best of 3: 227 usec per loop
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'a[:].sort()'
100 loops, best of 3: 2.28 msec per loop

Here I'm creating a copy of the list every time. As you can see, the order of magnitude of the results are different: micro- vs milliseconds, as we would expect.

我每次都在创建列表的副本。如您所见,结果的数量级是不同的:微米与毫秒,正如我们所期望的那样。

And remember: big-Oh specifies an upper bound! The lower bound for Python's sorting algorithm is Ω(n). Being O(n log n) does not automatically imply that every run takes a time proportional to n log n. It does not even imply that it needs to be slower than a O(n) algorithm, but that's another story. What's important to understand is that in some favorable cases, an O(n log n) algorithm may run in O(n) time or less.

并记住:大哦指定上限! Python排序算法的下限是Ω(n)。作为O(n log n)并不会自动暗示每次运行都需要与n log n成比例的时间。它甚至不暗示它需要比O(n)算法慢,但这是另一个故事。重要的是要理解,在一些有利的情况下,O(n log n)算法可以在O(n)时间或更短的时间内运行。

#3


31  

This could be because l.sort is a member of list while max is a generic function. This means that l.sort can rely on the internal representation of list while max will have to go through generic iterator protocol.

这可能是因为l.sort是列表的成员,而max是通用函数。这意味着l.sort可以依赖于list的内部表示,而max则必须通过泛型迭代器协议。

This makes that each element fetch for l.sort is faster than each element fetch that max does.

这使得l.sort的每个元素获取都比max的每个元素获取快。

I assume that if you instead use sorted(a) you will get the result slower than max(a).

我假设如果你改为使用sorted(a),你会得到比max(a)慢的结果。

#1


122  

You have to be very careful when using the timeit module in Python.

在Python中使用timeit模块时必须非常小心。

python -m timeit -s 'import random;a=range(10000);random.shuffle(a)' 'a.sort();a[-1]'

Here the initialisation code runs once to produce a randomised array a. Then the rest of the code is run several times. The first time it sorts the array, but every other time you are calling the sort method on an already sorted array. Only the fastest time is returned, so you are actually timing how long it takes Python to sort an already sorted array.

这里初始化代码运行一次以产生随机数组a。然后其余代码运行几次。它第一次对数组进行排序,但是每隔一段时间就在已排序的数组上调用sort方法。只返回最快的时间,因此您实际上计算了Python对已排序的数组进行排序所需的时间。

Part of Python's sort algorithm is to detect when the array is already partly or completely sorted. When completely sorted it simply has to scan once through the array to detect this and then it stops.

Python的排序算法的一部分是检测阵列何时已经部分或完全排序。当完全排序时,它只需要扫描一次通过数组来检测它然后停止。

If instead you tried:

相反,你试过:

python -m timeit -s 'import random;a=range(100000);random.shuffle(a)' 'sorted(a)[-1]'

then the sort happens on every timing loop and you can see that the time for sorting an array is indeed much longer than to just find the maximum value.

然后排序发生在每个定时循环上,你可以看到排序数组的时间确实比找到最大值要长得多。

Edit: @skyking's answer explains the part I left unexplained: a.sort() knows it is working on a list so can directly access the elements. max(a) works on any arbitrary iterable so has to use generic iteration.

编辑:@ skyking的答案解释了我不明原因的部分:a.sort()知道它正在列表中工作,因此可以直接访问元素。 max(a)适用于任意迭代,因此必须使用泛型迭代。

#2


88  

First off, note that max() uses the iterator protocol, while list.sort() uses ad-hoc code. Clearly, using an iterator is an important overhead, that's why you are observing that difference in timings.

首先,请注意max()使用迭代器协议,而list.sort()使用ad-hoc代码。显然,使用迭代器是一个重要的开销,这就是为什么你要观察时间差异的原因。

However, apart from that, your tests are not fair. You are running a.sort() on the same list more than once. The algorithm used by Python is specifically designed to be fast for already (partially) sorted data. Your tests are saying that the algorithm is doing its job well.

但是,除此之外,您的测试不公平。您在同一个列表上多次运行a.sort()。 Python使用的算法专门针对已经(部分)排序的数据快速设计。您的测试表明该算法正在很好地完成其工作。

These are fair tests:

这些是公平的测试:

$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'max(a[:])'
1000 loops, best of 3: 227 usec per loop
$ python3 -m timeit -s 'import random;a=list(range(10000));random.shuffle(a)' 'a[:].sort()'
100 loops, best of 3: 2.28 msec per loop

Here I'm creating a copy of the list every time. As you can see, the order of magnitude of the results are different: micro- vs milliseconds, as we would expect.

我每次都在创建列表的副本。如您所见,结果的数量级是不同的:微米与毫秒,正如我们所期望的那样。

And remember: big-Oh specifies an upper bound! The lower bound for Python's sorting algorithm is Ω(n). Being O(n log n) does not automatically imply that every run takes a time proportional to n log n. It does not even imply that it needs to be slower than a O(n) algorithm, but that's another story. What's important to understand is that in some favorable cases, an O(n log n) algorithm may run in O(n) time or less.

并记住:大哦指定上限! Python排序算法的下限是Ω(n)。作为O(n log n)并不会自动暗示每次运行都需要与n log n成比例的时间。它甚至不暗示它需要比O(n)算法慢,但这是另一个故事。重要的是要理解,在一些有利的情况下,O(n log n)算法可以在O(n)时间或更短的时间内运行。

#3


31  

This could be because l.sort is a member of list while max is a generic function. This means that l.sort can rely on the internal representation of list while max will have to go through generic iterator protocol.

这可能是因为l.sort是列表的成员,而max是通用函数。这意味着l.sort可以依赖于list的内部表示,而max则必须通过泛型迭代器协议。

This makes that each element fetch for l.sort is faster than each element fetch that max does.

这使得l.sort的每个元素获取都比max的每个元素获取快。

I assume that if you instead use sorted(a) you will get the result slower than max(a).

我假设如果你改为使用sorted(a),你会得到比max(a)慢的结果。