Python:为什么*和**比/和sqrt()快?

While optimising my code I realised the following:

在优化我的代码时，我意识到:

>>> from timeit import Timer as T
>>> T(lambda : 1234567890 / 4.0).repeat()
[0.22256922721862793, 0.20560789108276367, 0.20530295372009277]
>>> from __future__ import division
>>> T(lambda : 1234567890 / 4).repeat()
[0.14969301223754883, 0.14155197143554688, 0.14141488075256348]
>>> T(lambda : 1234567890 * 0.25).repeat()
[0.13619112968444824, 0.1281130313873291, 0.12830305099487305]

and also:

还有:

>>> from math import sqrt
>>> T(lambda : sqrt(1234567890)).repeat()
[0.2597470283508301, 0.2498021125793457, 0.24994492530822754]
>>> T(lambda : 1234567890 ** 0.5).repeat()
[0.15409398078918457, 0.14059877395629883, 0.14049601554870605]

I assume it has to do with the way python is implemented in C, but I wonder if anybody would care to explain why is so?

我认为这与python在C中的实现方式有关，但我想知道是否有人愿意解释为什么会这样?

1 个解决方案

#1

112

The (somewhat unexpected) reason for your results is that Python seems to fold constant expressions involving floating-point multiplication and exponentiation, but not division. math.sqrt() is a different beast altogether since there's no bytecode for it and it involves a function call.

您的结果(有些出乎意料)的原因是Python似乎可以折叠包含浮点乘法和指数运算的常量表达式，而不是除法。math.sqrt()是完全不同的东西，因为它没有字节码，并且包含一个函数调用。

On Python 2.6.5, the following code:

在Python 2.6.5中，代码如下:

x1 = 1234567890.0 / 4.0
x2 = 1234567890.0 * 0.25
x3 = 1234567890.0 ** 0.5
x4 = math.sqrt(1234567890.0)

compiles to the following bytecodes:

编译为以下字节码:

  # x1 = 1234567890.0 / 4.0
  4           0 LOAD_CONST               1 (1234567890.0)
              3 LOAD_CONST               2 (4.0)
              6 BINARY_DIVIDE       
              7 STORE_FAST               0 (x1)

  # x2 = 1234567890.0 * 0.25
  5          10 LOAD_CONST               5 (308641972.5)
             13 STORE_FAST               1 (x2)

  # x3 = 1234567890.0 ** 0.5
  6          16 LOAD_CONST               6 (35136.418286444619)
             19 STORE_FAST               2 (x3)

  # x4 = math.sqrt(1234567890.0)
  7          22 LOAD_GLOBAL              0 (math)
             25 LOAD_ATTR                1 (sqrt)
             28 LOAD_CONST               1 (1234567890.0)
             31 CALL_FUNCTION            1
             34 STORE_FAST               3 (x4)

As you can see, multiplication and exponentiation take no time at all since they're done when the code is compiled. Division takes longer since it happens at runtime. Square root is not only the most computationally expensive operation of the four, it also incurs various overheads that the others do not (attribute lookup, function call etc).

如您所见，乘法和指数运算根本不需要花费时间，因为它们是在编译代码时完成的。因为在运行时发生除法，所以要花更长的时间。平方根不仅是这四种方法中计算开销最大的一种，而且还会产生其他方法都没有的各种开销(属性查找、函数调用等)。

If you eliminate the effect of constant folding, there's little to separate multiplication and division:

如果你消除了常数折叠的影响，就没有什么可以区分乘法和除法了:

In [16]: x = 1234567890.0

In [17]: %timeit x / 4.0
10000000 loops, best of 3: 87.8 ns per loop

In [18]: %timeit x * 0.25
10000000 loops, best of 3: 91.6 ns per loop

math.sqrt(x) is actually a little bit faster than x ** 0.5, presumably because it's a special case of the latter and can therefore be done more efficiently, in spite of the overheads:

math.sqrt(x)实际上比x ** 0.5稍快一些，大概是因为它是后者的特殊情况，因此可以更有效地进行操作，尽管有一些开销:

In [19]: %timeit x ** 0.5
1000000 loops, best of 3: 211 ns per loop

In [20]: %timeit math.sqrt(x)
10000000 loops, best of 3: 181 ns per loop

edit 2011-11-16: Constant expression folding is done by Python's peephole optimizer. The source code (peephole.c) contains the following comment that explains why constant division isn't folded:

编辑2011-11-16:常量表达式折叠是由Python的窥视孔优化器完成的。源代码(peephole.c)包含以下注释，解释为什么不折叠常量分割:

    case BINARY_DIVIDE:
        /* Cannot fold this operation statically since
           the result can depend on the run-time presence
           of the -Qnew flag */
        return 0;

The -Qnew flag enables "true division" defined in PEP 238.

qnew标志允许在PEP 238中定义“真正的划分”。

#1

112