A general question for someone who knows function-definition internals better than I do.
In general, is there a performance trade-off to doing something like this:
def my_function():
    def other_function():
        pass
    # do some stuff
    other_function()
Versus:
def other_function():
    pass

def my_function():
    # do some stuff
    other_function()
I've seen developers inline functions before to keep a small, single-use function close to the code that actually uses it, but I've always wondered whether there's a memory (or compute) performance penalty for doing something like this.
Thoughts?
2 Answers
#1
7
Splitting larger functions into more readable, smaller functions is part of writing Pythonic code -- it should be obvious what you're trying to accomplish and smaller functions are easier to read, check for errors, maintain, and reuse.
As always, "which has better performance" questions should be settled by profiling the code, which is to say the answer often depends on the signatures of the methods involved and on what your code is doing.
e.g. if you're passing a large dictionary to a separate function instead of referencing a frame local, you'll end up with different performance characteristics than when calling a void function from another.
For example, here's some trivial behavior:
import profile
import dis

def callee():
    for x in range(10000):
        x += x
    print("let's have some tea now")

def caller():
    callee()

profile.run('caller()')
let's have some tea now
         26 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 :0(decode)
        2    0.000    0.000    0.000    0.000 :0(getpid)
        2    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        2    0.000    0.000    0.000    0.000 :0(time)
        2    0.000    0.000    0.000    0.000 :0(utf_8_decode)
        2    0.000    0.000    0.000    0.000 :0(write)
        1    0.002    0.002    0.002    0.002 <ipython-input-3-98c87a49b247>:4(callee)
        1    0.000    0.000    0.002    0.002 <ipython-input-3-98c87a49b247>:9(caller)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 iostream.py:196(write)
        2    0.000    0.000    0.000    0.000 iostream.py:86(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:95(_check_mp_mode)
        1    0.000    0.000    0.002    0.002 profile:0(caller())
        0    0.000             0.000          profile:0(profiler)
        2    0.000    0.000    0.000    0.000 utf_8.py:15(decode)
vs.
import profile
import dis

def all_in_one():
    def passer():
        pass
    passer()
    for x in range(10000):
        x += x
    print("let's have some tea now")

profile.run('all_in_one()')
let's have some tea now
         26 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 :0(decode)
        2    0.000    0.000    0.000    0.000 :0(getpid)
        2    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(range)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        2    0.000    0.000    0.000    0.000 :0(time)
        2    0.000    0.000    0.000    0.000 :0(utf_8_decode)
        2    0.000    0.000    0.000    0.000 :0(write)
        1    0.002    0.002    0.002    0.002 <ipython-input-3-98c87a49b247>:4(callee)
        1    0.000    0.000    0.002    0.002 <ipython-input-3-98c87a49b247>:9(caller)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 iostream.py:196(write)
        2    0.000    0.000    0.000    0.000 iostream.py:86(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:95(_check_mp_mode)
        1    0.000    0.000    0.002    0.002 profile:0(caller())
        0    0.000             0.000          profile:0(profiler)
        2    0.000    0.000    0.000    0.000 utf_8.py:15(decode)
The two runs make the same number of function calls and show no performance difference, which backs up my claim that you really do have to test in your specific circumstances.
You can see that I have an unused import of dis, the disassembly module. It's another helpful module that lets you see what your code is doing (try dis.dis(my_function)). I'd post a profile of the test code I generated, but it would only show you more details that aren't relevant to solving the problem or to learning what's actually happening in your code.
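For instance, here's a minimal sketch of that disassembly for the nested version from the question (exact opcodes vary between CPython versions):

import dis

def my_function():
    def other_function():
        pass
    other_function()

# On CPython, the disassembly of my_function typically includes MAKE_FUNCTION
# and STORE_FAST opcodes that run on every call -- the small per-call cost of
# the nested def -- before the ordinary call to other_function.
dis.dis(my_function)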
#2
5
Using timeit on my Mac seems to (slightly) favor defining the function at the module level, and obviously the results can vary from one machine to the next:
>>> import timeit
>>> def fun1():
...     def foo():
...         pass
...     foo()
...
>>> def bar():
...     pass
...
>>> def fun2():
...     bar()
...
>>> timeit.timeit('fun1()', 'from __main__ import fun1')
0.2706329822540283
>>> timeit.timeit('fun2()', 'from __main__ import fun2')
0.23086285591125488
Note that this difference is small (roughly 15-20% here), so it really won't make a major difference in your program's runtime unless it's in a really tight loop.
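If you want to isolate where that gap comes from, here's a rough sketch (hypothetical names; the numbers will vary by machine and interpreter) that times a function whose only work is executing the inner def against an empty one:

import timeit

def make_only():
    # the only work here is executing the inner `def`, i.e. building a
    # fresh function object from an already-compiled code object
    def foo():
        pass

def empty():
    pass

# The gap between these two is roughly the per-call cost of the nested def.
print(timeit.timeit('make_only()', 'from __main__ import make_only'))
print(timeit.timeit('empty()', 'from __main__ import empty'))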
The most common reason to define a function inside another one is to pick up the outer function's local variables in a closure. If you don't need a closure, pick whichever variant is easiest to read. (My preference is almost always to put the function at the module level.)
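For illustration, here's a minimal sketch of that closure case (the names are made up for this example):

def make_counter(start=0):
    count = start              # lives in make_counter's frame

    def increment():
        nonlocal count         # the nested function closes over `count`
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2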