声明
本文基于Python2.7语言,给出判断列表是否已排序的多种方法,并在作者的Windows XP主机(Pentium G630 2.7GHz主频2GB内存)上对比和分析其性能表现。
一. 问题提出
Haskell培训老师提出一个问题:如何判断列表是否已经排序?
排序与否实际只是相邻元素间的某种二元关系,即a->a->Bool。所以第一步可以把二元组列表找出来;第二步是把这个函数作用于每个元组,然后用and操作。老师给出的实现代码如下:
1
2
|
pair lst = zip lst ( tail lst )
sorted lst predict = and [ predict x y | (x,y) < - pair lst]
|
Haskell中,等号前面是函数的名称和参数,后面是函数的定义体。pair函数将列表lst错位一下(tail除去列表的第一个元素)后,和原列表在zip的作用下形成前后相邻元素二元组列表。predict函数接受两个数字,根据大小返回True或False。and对类型为[Bool]的列表中所有元素求与,其语义类似Python的all()函数。
随后,老师请大家思考性能问题。作者提出,若准确性要求不高,可生成三组随机数排序后作为下标,提取原列表相应的三组元素组成三个新的子列表("采样")。若判断三个子列表遵循同样的排序规则时,则认为原列表已排序。当列表很大且前段已排序时,选取适当数目的随机数,可在保障一定准确率的同时显著地降低运算规模。
此外,实际应用中还应考虑数据来源。例如,Python语言的os.listdir()方法在Windows系统中返回的列表条目满足字母序,在Linux系统中则返回乱序条目。因此,若判断系统平台(os.platform)为win32,则条目必然已经排序。
为对比验证随机采样方式的可行性,作者先在网上搜集判断列表排序的现有方法,主要参考Stack Overflow网站上"Pythonic way to check if a list is sorted or not"问题的答案,并在本文第二节给出相关的代码示例。注意,本文所述的"排序"不要求严格排序,即相邻元素允许相等。例如,[1,2,2,3]视为升序,[3,3,2,2]视为降序。
二. 代码实现
本节判断列表排序的函数名格式为IsListSorted_XXX()。为简洁起见,除代码片段及其输出外,一律以_XXX()指代。
2.1 guess
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
def IsListSorted_guess(lst):
listLen = len (lst)
if listLen < = 1 :
return True
#由首个元素和末尾元素猜测可能的排序规则
if lst[ 0 ] = = lst[ - 1 ]: #列表元素相同
for elem in lst:
if elem ! = lst[ 0 ]: return False
elif lst[ 0 ] < lst[ - 1 ]: #列表元素升序
for i, elem in enumerate (lst[ 1 :]):
if elem < lst[i]: return False
else : #列表元素降序
for i, elem in enumerate (lst[ 1 :]):
if elem > lst[i]: return False
return True
|
_guess()是最通用的实现,几乎与语言无关。值得注意的是,该函数内会猜测给定列表可能的排序规则,因此无需外部调用者指明排序规则。
2.2 sorted
1
2
|
def IsListSorted_sorted(lst):
return sorted (lst) = = lst or sorted (lst, reverse = True ) = = lst
|
_sorted()使用Python内置函数sorted()。由于sorted()会对未排序的列表排序,_sorted()函数主要适用于已排序列表。
若想判断列表未排序后再对其排序,不如直接调用列表的sort()方法,因为该方法内部会判断列表是否排序。对于已排序列表,该方法的时间复杂度为线性阶O(n)——判断为O(n)而排序为O(nlgn)。
2.3 for-loop
1
2
3
4
5
|
def IsListSorted_forloop(lst, key = lambda x, y: x < = y):
for i, elem in enumerate (lst[ 1 :]): #注意,enumerate默认迭代下标从0开始
if not key(lst[i], elem): #if elem > lst[i]更快,但通用性差
return False
return True
|
无论列表是否已排序,本函数的时间复杂度均为线性阶O(n)。注意,参数key表明缺省的排序规则为升序。
2.4 all
1
2
3
4
5
6
7
8
9
10
11
12
|
def IsListSorted_allenumk(lst, key = lambda x, y: x < = y):
return all (key(lst[i], elem) for i, elem in enumerate (lst[ 1 :]))
import operator
def IsListSorted_allenumo(lst, oCmp = operator.le):
return all (oCmp(lst[i], elem) for i, elem in enumerate (lst[ 1 :]))
def IsListSorted_allenumd(lst):
return all ((lst[i] < = elem) for i, elem in enumerate (lst[ 1 :]))
def IsListSorted_allxran(lst, key = lambda x,y: x < = y):
return all (key(lst[i],lst[i + 1 ]) for i in xrange ( len (lst) - 1 ))
def IsListSorted_allzip(lst, key = lambda x,y: x < = y):
from itertools import izip #Python 3中zip返回生成器(generator),而izip被废弃
return all (key(a, b) for (a, b) in izip(lst[: - 1 ],lst[ 1 :]))
|
lambda表达式与operator运算符速度相当,前者简单灵活,后者略为高效(实测并不一定)。但两者速度均不如列表元素直接比较(可能存在调用开销)。亦即,_allenumd()快于_allenumo()快于_allenumk()。
若使用lambda表达式指示排序规则,更改规则时只需要改变x和y之间的比较运算符;若使用operator模块指示排序规则,更改规则时需要改变对象比较方法。具体地,lt(x, y)等效于x < y,le(x, y)等效于x <= y,eq(x, y)等效于x == y,ne(x, y)等效于x != y,gt(x, y)等效于x > y,ge(x, y)等效于x >= y。例如,_allenumo()函数若要严格升序可设置oCmp=operator.lt。
此外,由all()函数的帮助信息可知,_allenumk()其实是_forloop()的等效形式。
2.5 numpy
1
2
3
4
5
6
7
8
|
def IsListSorted_numpy(arr, key = lambda dif: dif > = 0 ):
import numpy
try :
if arr.dtype.kind = = 'u' : #无符号整数数组执行np.diff时存在underflow风险
arr = numpy.int64(lst)
except AttributeError:
pass #无dtype属性,非数组
return (key(numpy.diff(arr))). all () #numpy.diff(x)返回相邻数组元素的差值构成的数组
|
NumPy是用于科学计算的Python基础包,可存储和处理大型矩阵。它包含一个强大的N维数组对象,比Python自身的嵌套列表结构(nested list structure)高效得多。第三节的实测数据表明,_numpy()处理大型列表时性能非常出色。
在Windows系统中可通过pip install numpy命令安装NumPy包,不建议登录官网下载文件自行安装。
2.6 reduce
1
2
3
|
def IsListSorted_reduce(iterable, key = lambda x, y: x < = y):
cmpFunc = lambda x, y: y if key(x, y) else float ( 'inf' )
return reduce (cmpFunc, iterable, . 0 ) < float ( 'inf' )
|
reduce实现是all实现的变体。累加器(accumulator)中仅存储最后一个检查的列表元素,或者Infinity(若任一元素小于前个元素值)。
前面2.1~2.5小节涉及下标操作的函数适用于列表等可迭代对象(Iterable)。对于通用迭代器(Iterator)对象,即可以作用于next()函数或方法的对象,可使用_reduce()及后面除_rand()外各小节的函数。迭代器的计算是惰性的,只有在需要返回下一个数据时才会计算,以避免不必要的计算。而且,迭代器方式无需像列表那样切片为两个迭代对象。
2.7 imap
1
2
3
4
5
|
def IsListSorted_itermap(iterable, key = lambda x, y: x < = y):
from itertools import imap, tee
a, b = tee(iterable) #为单个iterable创建两个独立的iterator
next (b, None )
return all (imap(key, a, b))
|
2.8 izip
1
2
3
4
5
6
7
8
9
10
11
12
|
def IsListSorted_iterzip(iterable, key = lambda x, y: x < = y):
from itertools import tee, izip
a, b = tee(iterable)
next (b, None )
return all (key(x, y) for x, y in izip(a, b))
def pairwise(iterable):
from itertools import tee, izip
a, b = tee(iterable)
next (b, None )
return izip(a, b) #"s -> (s0,s1), (s1,s2), (s2, s3), ..."
def IsListSorted_iterzipf(iterable, key = lambda x, y: x < = y):
return all (key(a, b) for a, b in pairwise(iterable))
|
第三节的实测数据表明,虽然存在外部函数调用,_iterzipf()却比_iterzip()略为高效。
2.9 fast
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
def IsListSorted_fastd(lst):
it = iter (lst)
try :
prev = it. next ()
except StopIteration:
return True
for cur in it:
if prev > cur:
return False
prev = cur
return True
def IsListSorted_fastk(lst, key = lambda x, y: x < = y):
it = iter (lst)
try :
prev = it. next ()
except StopIteration:
return True
for cur in it:
if not key(prev, cur):
return False
prev = cur
return True
|
_fastd()和_fastk()是Stack Overflow网站回答里据称执行最快的。实测数据表明,在列表未排序时,它们的性能表现确实优异。
2.10 random
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
|
import random
def IsListSorted_rand(lst, randNum = 3 , randLen = 100 ):
listLen = len (lst)
if listLen < = 1 :
return True
#由首个元素和末尾元素猜测可能的排序规则
if lst[ 0 ] < lst[ - 1 ]: #列表元素升序
key = lambda dif: dif > = 0
else : #列表元素降序或相等
key = lambda dif: dif < = 0
threshold, sortedFlag = 10000 , True
import numpy
if listLen < = threshold or listLen < = randLen * 2 or not randNum:
return (key(numpy.diff(numpy.array(lst)))). all ()
from random import sample
for i in range (randNum):
sortedRandList = sorted (sample( xrange (listLen), randLen))
flag = (key(numpy.diff(numpy.array([lst[x] for x in sortedRandList])))). all ()
sortedFlag = sortedFlag and flag
return sortedFlag
|
_rand()借助随机采样降低运算规模,并融入其他判断函数的优点。例如,猜测列表可能的排序规则,并在随机采样不适合时使用相对快速的判断方式,如NumPy。
通过line_profiler分析可知,第20行和第21行与randLen有关,但两者耗时接近。因此randLen应小于listLen的一半,以抵消sorted开销。除内部限制外,用户可以调节随机序列个数和长度,如定制单个但较长的序列。
注意,_rand()不适用于存在微量异常数据的长列表。因为这些数据很可能被随机采样遗漏,从而影响判断结果的准确性。
三. 性能分析
本节借助Python标准库random模块,生成各种随机列表,以对比和分析上节列表排序判断函数的性能。
可通过sample()、randint()等方法生成随机列表。例如:
1
2
3
4
5
6
7
8
9
10
11
|
>>> import random
>>> random.sample( range ( 10 ), 10 ); random.sample( range ( 10 ), 5 )
[ 9 , 1 , 6 , 3 , 0 , 8 , 4 , 2 , 7 , 5 ]
[ 1 , 2 , 5 , 6 , 0 ]
>>> rand = [random.randint( 1 , 10 ) for i in range ( 10 )]; rand
[ 7 , 3 , 7 , 5 , 7 , 2 , 4 , 4 , 9 , 8 ]
>>> random.sample(rand, 5 ); random.sample(rand, 5 )
[ 4 , 7 , 7 , 9 , 8 ]
[ 3 , 9 , 2 , 5 , 7 ]
>>> randGen = (random.randint( 1 , 10 ) for i in range ( 10 )); randGen
<generator object <genexpr> at 0x0192C878 >
|
sample()方法从列表中随机选择数字,结合range()函数可生产不含重复元素的随机列表;而randint()方法结合列表解析生成的随机列表可能包含重复元素。Python文档规定,首次导入random模块时使用当前系统时间作为种子初始化随机数生成器。因此,本文并未显式地调用seed()方法设置种子。
为度量性能表现,定义如下计时装饰器:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
|
from time import clock
TimeList = []
def FuncTimer(repeats = 1000 ):
def decorator(func):
def wrapper( * args, * * kwargs):
try :
startTime = clock()
for i in xrange (repeats):
ret = func( * args, * * kwargs)
except Exception, e:
print '%s() Error: %s!' % (func.__name__, e)
ret = None
finally : #当目标函数发生异常时,仍旧输出计时信息
endTime = clock()
timeElasped = (endTime - startTime) * 1000.0
msg = '%21s(): %s =>Time Elasped: %.3f msec, repeated %d time(s).' \
% (func.__name__, ret, timeElasped, repeats)
global TimeList; TimeList.append([timeElasped, msg])
return ret
return wrapper
return decorator
def ReportTimer():
global TimeList; TimeList.sort(key = lambda x:x[ 0 ])
for entry in TimeList:
print entry[ 1 ]
TimeList = [] #Flush
|
该装饰器允许对输出进行排序,以便更直观地观察性能。基于该装饰器,下文将分别测试不同的排序场景。注意,第二节各函数头部需添加FuncTimer()装饰。
3.1 列表前段乱序
测试代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
def TestHeadUnorderedList():
TEST_NAME = 'HeadUnorderedList' ; scale = int ( 1e5 )
List = random.sample( xrange (scale), scale) + range (scale)
print 'Test 1: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_guess( List )
IsListSorted_sorted( List )
IsListSorted_allenumk( List )
IsListSorted_allenumo( List )
IsListSorted_allenumd( List )
IsListSorted_allxran( List )
IsListSorted_allzip( List )
IsListSorted_forloop( List )
IsListSorted_itermap( List )
IsListSorted_iterzipf( List )
IsListSorted_iterzip( List )
IsListSorted_reduce( List )
IsListSorted_numpy(numpy.array( List )) #若不先转换为数组,则耗时骤增
IsListSorted_fastd( List )
IsListSorted_fastk( List )
ReportTimer()
|
运行输出如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
Test 1 : HeadUnorderedList, list len : 200
IsListSorted_fastd(): False = >Time Elasped: 0.757 msec, repeated 1000 time(s).
IsListSorted_fastk(): False = >Time Elasped: 1.091 msec, repeated 1000 time(s).
IsListSorted_forloop(): False = >Time Elasped: 2.080 msec, repeated 1000 time(s).
IsListSorted_guess(): False = >Time Elasped: 2.123 msec, repeated 1000 time(s).
IsListSorted_allxran(): False = >Time Elasped: 2.255 msec, repeated 1000 time(s).
IsListSorted_allenumd(): False = >Time Elasped: 2.672 msec, repeated 1000 time(s).
IsListSorted_allenumo(): False = >Time Elasped: 3.021 msec, repeated 1000 time(s).
IsListSorted_allenumk(): False = >Time Elasped: 3.207 msec, repeated 1000 time(s).
IsListSorted_itermap(): False = >Time Elasped: 5.845 msec, repeated 1000 time(s).
IsListSorted_allzip(): False = >Time Elasped: 7.793 msec, repeated 1000 time(s).
IsListSorted_iterzip(): False = >Time Elasped: 9.667 msec, repeated 1000 time(s).
IsListSorted_iterzipf(): False = >Time Elasped: 9.969 msec, repeated 1000 time(s).
IsListSorted_numpy(): False = >Time Elasped: 16.203 msec, repeated 1000 time(s).
IsListSorted_sorted(): False = >Time Elasped: 45.742 msec, repeated 1000 time(s).
IsListSorted_reduce(): False = >Time Elasped: 145.447 msec, repeated 1000 time(s).
Test 1 : HeadUnorderedList, list len : 200000
IsListSorted_fastd(): False = >Time Elasped: 0.717 msec, repeated 1000 time(s).
IsListSorted_fastk(): False = >Time Elasped: 0.876 msec, repeated 1000 time(s).
IsListSorted_allxran(): False = >Time Elasped: 2.104 msec, repeated 1000 time(s).
IsListSorted_itermap(): False = >Time Elasped: 6.062 msec, repeated 1000 time(s).
IsListSorted_iterzip(): False = >Time Elasped: 7.244 msec, repeated 1000 time(s).
IsListSorted_iterzipf(): False = >Time Elasped: 8.491 msec, repeated 1000 time(s).
IsListSorted_numpy(): False = >Time Elasped: 801.916 msec, repeated 1000 time(s).
IsListSorted_forloop(): False = >Time Elasped: 2924.755 msec, repeated 1000 time(s).
IsListSorted_guess(): False = >Time Elasped: 2991.756 msec, repeated 1000 time(s).
IsListSorted_allenumo(): False = >Time Elasped: 3025.864 msec, repeated 1000 time(s).
IsListSorted_allenumk(): False = >Time Elasped: 3062.792 msec, repeated 1000 time(s).
IsListSorted_allenumd(): False = >Time Elasped: 3190.896 msec, repeated 1000 time(s).
IsListSorted_allzip(): False = >Time Elasped: 6586.183 msec, repeated 1000 time(s).
IsListSorted_sorted(): False = >Time Elasped: 119974.955 msec, repeated 1000 time(s).
IsListSorted_reduce(): False = >Time Elasped: 154747.659 msec, repeated 1000 time(s).
|
可见,对于前段乱序的列表,无论其长短_fastd()和_fastk()的表现均为最佳。对于未排序列表,_sorted()需要进行排序,故性能非常差。然而,_reduce()性能最差。
实际上除_guess()和_sorted()外,其他函数都按升序检查列表。为安全起见,可仿照_guess()实现,先猜测排序方式,再进一步检查。
因为短列表耗时差异大多可以忽略,后续测试将不再包含短列表(但作者确实测试过),仅关注长列表。除非特别说明,列表长度为10万级,重复计时1000次。
3.2 列表中段乱序
测试代码及结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
def TestMiddUnorderedList():
TEST_NAME = 'MiddUnorderedList' ; scale = int ( 1e5 )
List = range (scale) + random.sample( xrange (scale), scale) + range (scale)
print 'Test 2: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_numpy(numpy.array( List )) # 1572.295 msec
IsListSorted_guess( List ) # 14540.637 msec
IsListSorted_itermap( List ) # 21013.096 msec
IsListSorted_fastk( List ) # 23840.582 msec
IsListSorted_allxran( List ) # 31014.215 msec
IsListSorted_iterzip( List ) # 33386.059 msec
IsListSorted_forloop( List ) # 34228.006 msec
IsListSorted_allenumk( List ) # 34416.802 msec
IsListSorted_allzip( List ) # 42370.528 msec
IsListSorted_sorted( List ) # 142592.756 msec
IsListSorted_reduce( List ) # 187514.967 msec
ReportTimer()
|
为节省篇幅,已根据运行输出调整函数的调用顺序。下文也作如此处理。
3.3 列表后段乱序
测试代码及结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
def TestTailUnorderedList():
TEST_NAME = 'TailUnorderedList' ; scale = int ( 1e5 )
List = range (scale, 0 , - 1 ) + random.sample( xrange (scale), scale)
print 'Test 3: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_numpy(numpy.array( List ), key = lambda dif: dif < = 0 ) # 980.789 msec
IsListSorted_guess( List ) # 13273.862 msec
IsListSorted_itermap( List , key = lambda x, y: x > = y) # 21603.315 msec
IsListSorted_fastk( List , key = lambda x, y: x > = y) # 24183.548 msec
IsListSorted_allxran( List , key = lambda x, y: x > = y) # 32850.254 msec
IsListSorted_forloop( List , key = lambda x, y: x > = y) # 33918.848 msec
IsListSorted_iterzip( List , key = lambda x, y: x > = y) # 34505.809 msec
IsListSorted_allenumk( List , key = lambda x, y: x > = y) # 35631.625 msec
IsListSorted_allzip( List , key = lambda x, y: x > = y) # 40076.330 msec
ReportTimer()
|
3.4 列表完全乱序
测试代码及结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
def TestUnorderedList():
TEST_NAME = 'UnorderedList' ; scale = int ( 1e5 )
List = random.sample( xrange (scale), scale)
print 'Test 4: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_fastk( List ) # 0.856 msec
IsListSorted_allxran( List ) # 3.438 msec
IsListSorted_iterzip( List ) # 7.233 msec
IsListSorted_itermap( List ) # 7.595 msec
IsListSorted_numpy(numpy.array( List )) # 207.222 msec
IsListSorted_allenumk( List ) # 953.423 msec
IsListSorted_guess( List ) # 1023.575 msec
IsListSorted_forloop( List ) # 1076.986 msec
IsListSorted_allzip( List ) # 2062.821 msec
ReportTimer()
|
3.5 列表元素相同
测试代码及结果如下:
1
|
```python def TestSameElemList(): TEST_NAME = 'SameElemList' ; scale = int ( 1e5 ) List = [ 5 ] * scale print 'Test 5: %s, list len: %d' % (TEST_NAME, len ( List )) IsListSorted_numpy(numpy.array( List )) # 209.324 msec IsListSorted_sorted(List) # 2760.139 msec IsListSorted_guess(List) # 5843.942 msec IsListSorted_itermap(List) # 20609.704 msec IsListSorted_fastk(List) # 23035.760 msec IsListSorted_forloop(List) # 29043.206 msec IsListSorted_allenumk(List) # 29553.716 msec IsListSorted_allxran(List) # 30348.549 msec IsListSorted_iterzip(List) # 32806.217 msec ReportTimer()
|
3.6 列表升序
测试代码及结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
def TestAscendingList():
TEST_NAME = 'AscendingList' ; scale = int ( 1e5 )
List = range (scale)
print 'Test 6: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_numpy(numpy.array( List )) # 209.217 msec
IsListSorted_sorted( List ) # 2845.166 msec
IsListSorted_fastd( List ) # 5977.520 msec
IsListSorted_guess( List ) # 10408.204 msec
IsListSorted_allenumd( List ) # 15812.754 msec
IsListSorted_itermap( List ) # 21244.476 msec
IsListSorted_fastk( List ) # 23900.196 msec
IsListSorted_allenumo( List ) # 28607.724 msec
IsListSorted_forloop( List ) # 30075.538 msec
IsListSorted_allenumk( List ) # 30274.017 msec
IsListSorted_allxran( List ) # 31126.404 msec
IsListSorted_reduce( List ) # 32940.108 msec
IsListSorted_iterzip( List ) # 34188.312 msec
IsListSorted_iterzipf( List ) # 34425.097 msec
IsListSorted_allzip( List ) # 37967.447 msec
ReportTimer()
|
可见,列表已排序时,_sorted()的性能较占优势。
3.7 列表降序
剔除不支持降序的函数,测试代码及结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
def TestDescendingList():
TEST_NAME = 'DescendingList' ; scale = int ( 1e2 )
List = range (scale, 0 , - 1 )
print 'Test 7: %s, list len: %d' % (TEST_NAME, len ( List ))
IsListSorted_numpy(numpy.array( List ), key = lambda dif: dif < = 0 ) # 209.318 msec
IsListSorted_sorted( List ) # 5707.067 msec
IsListSorted_guess( List ) # 10549.928 msec
IsListSorted_itermap( List , key = lambda x, y: x > = y) # 21529.547 msec
IsListSorted_fastk( List , key = lambda x, y: x > = y) # 24264.465 msec
import operator; IsListSorted_allenumo( List , oCmp = operator.ge) # 28093.035 msec
IsListSorted_forloop( List , key = lambda x, y: x > = y) # 30745.943 msec
IsListSorted_allenumk( List , key = lambda x, y: x > = y) # 32696.205 msec
IsListSorted_allxran( List , key = lambda x, y: x > = y) # 33431.473 msec
IsListSorted_allzip( List , key = lambda x, y: x > = y) # 34837.019 msec
IsListSorted_iterzip( List , key = lambda x, y: x > = y) # 35237.475 msec
IsListSorted_reduce( List , key = lambda x, y: x > = y) # 37035.270 msec
ReportTimer()
|
3.8 迭代器测试
参数为列表的函数,需要先将列表���过iter()函数转换为迭代器。注意,当iterable参数为iterator时,只能计时一次,因为该次执行将用尽迭代器。
测试代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
def TestIter():
TEST_NAME = 'Iter' ; scale = int ( 1e7 )
List = range (scale) #升序
# List = random.sample(xrange(scale), scale) #乱序
print 'Test 8: %s, list len: %d' % (TEST_NAME, len ( List ))
iterL = iter ( List ); IsListSorted_guess( list (iterL))
iterL = iter ( List ); IsListSorted_sorted(iterL)
iterL = iter ( List ); IsListSorted_itermap(iterL)
iterL = iter ( List ); IsListSorted_iterzipf(iterL)
iterL = iter ( List ); IsListSorted_iterzip(iterL)
iterL = iter ( List ); IsListSorted_reduce(iterL)
iterL = iter ( List ); IsListSorted_fastd(iterL)
iterL = iter ( List ); IsListSorted_fastk(iterL, key = lambda x, y: x > = y)
ReportTimer()
|
运行结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
Test 8 : Iter , list len : 10000000 - - - 升序
IsListSorted_fastd(): True = >Time Elasped: 611.028 msec, repeated 1 time(s).
IsListSorted_sorted(): False = >Time Elasped: 721.751 msec, repeated 1 time(s).
IsListSorted_guess(): True = >Time Elasped: 1142.065 msec, repeated 1 time(s).
IsListSorted_itermap(): True = >Time Elasped: 2097.696 msec, repeated 1 time(s).
IsListSorted_fastk(): True = >Time Elasped: 2337.233 msec, repeated 1 time(s).
IsListSorted_reduce(): True = >Time Elasped: 3307.361 msec, repeated 1 time(s).
IsListSorted_iterzipf(): True = >Time Elasped: 3354.034 msec, repeated 1 time(s).
IsListSorted_iterzip(): True = >Time Elasped: 3442.520 msec, repeated 1 time(s).
Test 8 : Iter , list len : 10000000 - - - 乱序
IsListSorted_fastk(): False = >Time Elasped: 0.004 msec, repeated 1 time(s).
IsListSorted_fastd(): False = >Time Elasped: 0.010 msec, repeated 1 time(s).
IsListSorted_iterzip(): False = >Time Elasped: 0.015 msec, repeated 1 time(s).
IsListSorted_itermap(): False = >Time Elasped: 0.055 msec, repeated 1 time(s).
IsListSorted_iterzipf(): False = >Time Elasped: 0.062 msec, repeated 1 time(s).
IsListSorted_guess(): False = >Time Elasped: 736.810 msec, repeated 1 time(s).
IsListSorted_reduce(): False = >Time Elasped: 8919.611 msec, repeated 1 time(s).
IsListSorted_sorted(): False = >Time Elasped: 12273.018 msec, repeated 1 time(s).
|
其中,_itermap()、_iterzip()、_iterzipf()、_reduce()、_fastd()、_fastk()可直接判断迭代器是否已排序。其他函数需将迭代器转换为列表后才能处理。_sorted()虽然接受迭代器参数,但sorted()返回列表,故无法正确判断迭代器顺序。
3.9 随机采样测试
综合以上测试,可知_fastk()和_numpy()性能较为突出,而且_rand()内置numpy方式。因此,以_fastk()和_numpy()为参照对象,测试代码如下(计时1次):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
|
def TestRandList():
scale = int ( 1e6 )
List = random.sample( xrange (scale), scale) + range (scale)
print 'Test 1: %s, list len: %d' % ( 'HeadUnorderedList' , len ( List ))
IsListSorted_fastk( List )
IsListSorted_numpy(numpy.array( List ))
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = range (scale) + random.sample( xrange (scale), scale) + range (scale)
print 'Test 2: %s, list len: %d' % ( 'MiddUnorderedList' , len ( List ))
IsListSorted_fastk( List )
IsListSorted_numpy(numpy.array( List ))
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = range (scale, 0 , - 1 ) + random.sample( xrange (scale), scale)
print 'Test 3: %s, list len: %d' % ( 'TailUnorderedList' , len ( List ))
IsListSorted_fastk( List , key = lambda x, y: x > = y)
IsListSorted_numpy(numpy.array( List ), key = lambda dif: dif < = 0 )
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = [random.randint( 1 ,scale) for i in xrange (scale)] #random.sample(xrange(scale), scale)
print 'Test 4: %s, list len: %d' % ( 'UnorderedList' , len ( List ))
IsListSorted_fastk( List )
IsListSorted_numpy(numpy.array( List ))
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = [ 5 ] * scale
print 'Test 5: %s, list len: %d' % ( 'SameElemList' , len ( List ))
IsListSorted_fastk( List )
IsListSorted_numpy(numpy.array( List ))
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = range (scale)
print 'Test 6: %s, list len: %d' % ( 'AscendingList' , len ( List ))
IsListSorted_fastk( List )
IsListSorted_numpy(numpy.array( List ))
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = range (scale, 0 , - 1 )
print 'Test 7: %s, list len: %d' % ( 'DescendingList' , len ( List ))
IsListSorted_fastk( List , key = lambda x, y: x > = y)
IsListSorted_numpy(numpy.array( List ), key = lambda dif: dif < = 0 )
IsListSorted_rand( List , randNum = 1 )
ReportTimer()
List = range (scale, 0 , - 1 ); List [scale / 2 ] = 0
print 'Test 8: %s, list len: %d' % ( 'MiddleNotchList' , len ( List ))
IsListSorted_fastk( List , key = lambda x, y: x > = y)
IsListSorted_numpy(numpy.array( List ), key = lambda dif: dif < = 0 )
IsListSorted_rand( List , randNum = 1 )
IsListSorted_rand( List , randNum = 1 , randLen = scale / 2 )
ReportTimer()
|
运行输出如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
|
Test 1 : HeadUnorderedList, list len : 2000000
IsListSorted_fastk(): False = >Time Elasped: 0.095 msec, repeated 1 time(s).
IsListSorted_rand(): False = >Time Elasped: 0.347 msec, repeated 1 time(s).
IsListSorted_numpy(): False = >Time Elasped: 7.893 msec, repeated 1 time(s).
Test 2 : MiddUnorderedList, list len : 3000000
IsListSorted_rand(): False = >Time Elasped: 0.427 msec, repeated 1 time(s).
IsListSorted_numpy(): False = >Time Elasped: 11.868 msec, repeated 1 time(s).
IsListSorted_fastk(): False = >Time Elasped: 210.842 msec, repeated 1 time(s).
Test 3 : TailUnorderedList, list len : 2000000
IsListSorted_rand(): False = >Time Elasped: 0.355 msec, repeated 1 time(s).
IsListSorted_numpy(): False = >Time Elasped: 7.548 msec, repeated 1 time(s).
IsListSorted_fastk(): False = >Time Elasped: 280.416 msec, repeated 1 time(s).
Test 4 : UnorderedList, list len : 1000000
IsListSorted_fastk(): False = >Time Elasped: 0.074 msec, repeated 1 time(s).
IsListSorted_rand(): False = >Time Elasped: 0.388 msec, repeated 1 time(s).
IsListSorted_numpy(): False = >Time Elasped: 3.757 msec, repeated 1 time(s).
Test 5 : SameElemList, list len : 1000000
IsListSorted_rand(): True = >Time Elasped: 0.304 msec, repeated 1 time(s).
IsListSorted_numpy(): True = >Time Elasped: 3.955 msec, repeated 1 time(s).
IsListSorted_fastk(): True = >Time Elasped: 210.977 msec, repeated 1 time(s).
Test 6 : AscendingList, list len : 1000000
IsListSorted_rand(): True = >Time Elasped: 0.299 msec, repeated 1 time(s).
IsListSorted_numpy(): True = >Time Elasped: 4.822 msec, repeated 1 time(s).
IsListSorted_fastk(): True = >Time Elasped: 214.171 msec, repeated 1 time(s).
Test 7 : DescendingList, list len : 1000000
IsListSorted_rand(): True = >Time Elasped: 0.336 msec, repeated 1 time(s).
IsListSorted_numpy(): True = >Time Elasped: 3.867 msec, repeated 1 time(s).
IsListSorted_fastk(): True = >Time Elasped: 279.322 msec, repeated 1 time(s).
Test 8 : MiddleNotchList, list len : 1000000
IsListSorted_rand(): True = >Time Elasped: 0.387 msec, repeated 1 time(s).
IsListSorted_numpy(): False = >Time Elasped: 4.792 msec, repeated 1 time(s).
IsListSorted_rand(): False = >Time Elasped: 78.903 msec, repeated 1 time(s).
IsListSorted_fastk(): False = >Time Elasped: 110.444 msec, repeated 1 time(s).
|
可见,在绝大部分测试场景中,_rand()性能均为最佳,且不失正确率。注意测试8,当降序列表中间某个元素被置0(开槽)时,随机采样很容易遗漏该元素,导致误判。然而,这种场景在实际使用中非常罕见。