
Time: 2022-02-28 09:51:32

Using Python 2.7, I found a strange difference in execution time:


data = dict( zip( a[0].split( ':' ), a[1].split( ':' ) ) )

data = { name: value for name, value in zip(a[0].split( ':' ), a[1].split( ':' )) }

Those two calls seem absolutely the same to me; however, I found that the dictionary comprehension version is about 4% faster. That is not much, but it is very consistent.


Is this true, and if so, why? Or is it just my imagination?


1 Answer


Your input sample is too small. For a handful of pairs, looking up the global name dict and calling it costs more than just running the dict comprehension (which requires no name lookup), but if you test against a large number of key-value pairs, dict() wins because its looping is done entirely in C.


Test the difference against a large number of key-value pairs, and reduce the test to just the dict() call or the dictionary comprehension (the zip() and str.split() calls are executed just once for both cases and can be ignored):


>>> from timeit import timeit
>>> import random
>>> from string import ascii_lowercase
>>> kv_pairs = [(''.join(random.sample(ascii_lowercase, random.randint(10, 20))), ''.join(random.sample(ascii_lowercase, random.randint(10, 20))))
...             for _ in xrange(10000)]
>>> len(dict(kv_pairs))  # the random keys happen to be all unique.
>>> timeit('{k: v for k, v in kv_pairs}', 'from __main__ import kv_pairs', number=1000)
>>> timeit('dict(kv_pairs)', 'from __main__ import kv_pairs', number=1000)
>>> timeit('{k: v for k, v in kv_pairs}', 'from __main__ import kv_pairs; kv_pairs = kv_pairs[:3]')
>>> timeit('dict(kv_pairs)', 'from __main__ import kv_pairs; kv_pairs = kv_pairs[:3]')

So for 10k key-value pairs (the first two timings), dict() is about twice as fast; for just 3 pairs (the last two timings), the dict comprehension wins.
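If you want to repeat this comparison outside the interactive prompt, something like the following minimal sketch should do. The helper names make_pairs and compare are made up for illustration, and the repeat counts are arbitrary. The lambdas add the same call overhead to both sides, and the absolute numbers (and even the crossover point) will depend on your interpreter version and machine.

```python
import random
import timeit
from string import ascii_lowercase


def make_pairs(n, seed=0):
    """Build n (key, value) pairs of random lowercase strings (hypothetical helper)."""
    rng = random.Random(seed)

    def word():
        # 10 to 20 distinct lowercase letters, as in the session above
        return ''.join(rng.sample(ascii_lowercase, rng.randint(10, 20)))

    return [(word(), word()) for _ in range(n)]


def compare(pairs, number):
    """Return (dict_call_seconds, comprehension_seconds) for `number` runs each."""
    t_call = timeit.timeit(lambda: dict(pairs), number=number)
    t_comp = timeit.timeit(lambda: {k: v for k, v in pairs}, number=number)
    return t_call, t_comp


if __name__ == '__main__':
    # Few pairs with many repeats, then many pairs with few repeats.
    for size, number in ((3, 200000), (10000, 200)):
        t_call, t_comp = compare(make_pairs(size), number)
        print('%6d pairs: dict() %.4fs, comprehension %.4fs'
              % (size, t_call, t_comp))
```

Run it several times and compare the ratio of the two numbers per row rather than the raw values.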


You can see why when you disassemble the bytecode; the dictionary comprehension uses a nested code object to implement the actual dictionary building:


>>> import dis
>>> dis.dis(compile('{k: v for k, v in kv_pairs}', '', 'exec'))
  1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x102ef69b0, file "", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (kv_pairs)
              9 GET_ITER            
             10 CALL_FUNCTION            1
             13 POP_TOP             
             14 LOAD_CONST               1 (None)
             17 RETURN_VALUE        
>>> dis.dis(compile('{k: v for k, v in kv_pairs}', '', 'exec').co_consts[0])
  1           0 BUILD_MAP                0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                21 (to 30)
              9 UNPACK_SEQUENCE          2
             12 STORE_FAST               1 (k)
             15 STORE_FAST               2 (v)
             18 LOAD_FAST                2 (v)
             21 LOAD_FAST                1 (k)
             24 MAP_ADD                  2
             27 JUMP_ABSOLUTE            6
        >>   30 RETURN_VALUE        
>>> dis.dis(compile('dict(kv_pairs)', '', 'exec'))
  1           0 LOAD_NAME                0 (dict)
              3 LOAD_NAME                1 (kv_pairs)
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

By using a very small sample, you gave the LOAD_NAME step for dict too much weight; the dict comprehension executes much more bytecode, and it does so on every iteration.
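The name-lookup difference can also be checked without reading a full disassembly. In the sketch below (the two function names are made up for illustration), co_names lists the global and builtin names that a function's bytecode refers to. Inside a function, the lookup of dict shows up as a global lookup rather than the LOAD_NAME seen in the module-level code above, but the point is the same: the call has to resolve a name at run time, and the comprehension does not.

```python
def build_with_call(pairs):
    # 'dict' is resolved at run time: module globals first, then builtins
    return dict(pairs)


def build_with_comprehension(pairs):
    # pure syntax: no global or builtin name is involved
    return {k: v for k, v in pairs}


# co_names holds the non-local names referenced by each function's bytecode
print(build_with_call.__code__.co_names)           # ('dict',)
print(build_with_comprehension.__code__.co_names)  # ()
```

That one lookup is a fixed cost per call, which is why it only matters when the input is tiny.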
