I've been exploring the internal implementation of threads in Python this week. It's amazing how every day I'm surprised by how much I didn't know; not knowing what I want to know is what makes me itch.
I noticed something strange in a piece of code that I ran under Python 2.7 as a multithreaded application. We all know that Python 2.7 switches between threads after 100 virtual instructions by default. Calling a function is one virtual instruction, for example:
>>> from __future__ import print_function
>>> import dis
>>> def x(): print('a')
...
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
As you can see, after loading the global print and the constant 'a', the function gets called. Calling a function is therefore atomic, as it's done with a single instruction. Hence, in a multithreaded program, either the function (print here) runs or the running thread gets interrupted before the function gets the chance to run. That is, if a context switch occurs between LOAD_GLOBAL and LOAD_CONST, the instruction CALL_FUNCTION won't run until the thread is scheduled again.
Keep in mind that in the above code I'm using from __future__ import print_function, so I'm really calling a builtin function now, not the print statement. Let's take a look at the bytecode of function x, but this time with the print statement:
>>> def x(): print "a" # print stmt
...
>>> dis.dis(x)
  1           0 LOAD_CONST               1 ('a')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
It's quite possible in this case that a thread context switch may occur between PRINT_ITEM and PRINT_NEWLINE, delaying the PRINT_NEWLINE instruction until the thread runs again. So if you have a multithreaded program like this (borrowed from Programming Python, 4th edition, and slightly modified):
import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print ('[%s] => %s' % (myId, i))  # print (stmt) 2.X

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)  # don't quit early so other threads don't die
The output may or may not look like this depending on how threads were switched:
[0] => 0
[3] => 0[1] => 0
[4] => 0
[2] => 0
...many more...
This is all okay with the print statement.
What happens if we replace the print statement with the builtin print function? Let's see:
from __future__ import print_function
import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print('[%s] => %s' % (myId, i))  # print builtin (func)

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)
If you run this script enough times, you'll see something like this:
[4] => 0
[3] => 0[1] => 0
[2] => 0
[0] => 0
...many more...
Given all the above explanation, how can this be? print is a function now, so how come it prints the passed-in string but not the newline? print writes the value of end after the printed string, and end defaults to \n. Essentially, a call to a function is atomic, so how on earth did it get interrupted?
Let's blow our minds:
def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        #sys.stdout.write('[%s] => %s\n' % (myId, i))
        print('[%s] => %s\n' % (myId, i), end='')

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)
Now the new line is always printed, no jumbled output anymore:
[1] => 0
[2] => 0
[0] => 0
[4] => 0
...many more...
Adding \n to the string now obviously proves that the print function is not atomic (even though it's a function) and that it essentially behaves like the print statement. dis.dis, however, informs us, incoherently or stupidly, that it's a simple function call and thus an atomic operation?!
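For comparison, a common way to keep each line intact without folding the newline into the string is to serialize the print calls with a lock. The following is only a sketch under assumptions: it uses the higher-level threading module instead of thread, and the names print_lock, LineCollector, and run are hypothetical helpers introduced here so the result can be inspected rather than read off a terminal.

```python
from __future__ import print_function
import threading

print_lock = threading.Lock()  # hypothetical name; shared by every thread


class LineCollector(object):
    """Hypothetical file-like object that records the text it is given."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def counter(my_id, count, stream):
    for i in range(count):
        # The text and the trailing newline are both written while the
        # lock is held, so no other thread can write in between them.
        with print_lock:
            print('[%s] => %s' % (my_id, i), file=stream)


def run(n_threads=5, count=3):
    """Start the counters, wait for them, and return the recorded lines."""
    stream = LineCollector()
    threads = [threading.Thread(target=counter, args=(n, count, stream))
               for n in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ''.join(stream.chunks).splitlines()
```

The order of the lines still depends on scheduling; the lock only guarantees that each line arrives whole.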
Note: I never rely on the order or timing of threads for applications to work properly. This is for testing purposes only and, frankly, for geeks like me.
1 answer
#1
2
Your question is based on the central premise

    Calling a function therefore is atomic as it's done with a single instruction.

which is thoroughly wrong.
First, executing the CALL_FUNCTION opcode can involve executing additional bytecode. The most obvious case of this is when the executed function is written in Python, but even built-in functions can freely call other code that may be written in Python. For example, print calls __str__ and write methods.
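To make this concrete, here is a small sketch that records what a single call to print does. RecordingFile, Item, and trace_print are hypothetical names introduced for illustration; the sketch assumes CPython, where print writes the text and the end string through separate write calls.

```python
from __future__ import print_function


class RecordingFile(object):
    """Hypothetical file-like object whose write() is plain Python code."""

    def __init__(self):
        self.writes = []

    def write(self, text):
        self.writes.append(text)


class Item(object):
    """Hypothetical object whose __str__ is plain Python code."""

    def __init__(self, log):
        self.log = log

    def __str__(self):
        self.log.append('__str__ called')
        return 'a'


def trace_print():
    """Call print once and report the Python-level calls it triggered."""
    log = []
    out = RecordingFile()
    print(Item(log), file=out)
    return log, out.writes
```

On CPython, trace_print() returns (['__str__ called'], ['a', '\n']): one print call runs the Python-level __str__ and then calls write twice, once for the text and once for the newline. A thread switch between those two write calls is enough to produce the jumbled output in the question.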
Second, Python is free to release the GIL even in the middle of C code. It commonly does this for I/O and other operations that might take a while without needing to perform Python API calls. There are 23 uses of the FILE_BEGIN_ALLOW_THREADS and Py_BEGIN_ALLOW_THREADS macros in the Python 2.7 file object implementation alone, including one in the implementation of file.write, which print relies on.
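The effect of releasing the GIL inside C code can be observed without reading the C source. The sketch below uses time.sleep as a stand-in for a blocking operation (an assumption chosen for illustration, since it is another C-level call that releases the GIL while it waits); timed_sleepers is a hypothetical helper name.

```python
import threading
import time


def timed_sleepers(n_threads=4, seconds=0.2):
    """Return the wall-clock time for n_threads to each sleep `seconds`."""
    threads = [threading.Thread(target=time.sleep, args=(seconds,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

If the GIL were held for the whole duration of each call, four threads sleeping 0.2 seconds each would need about 0.8 seconds in total. Because the GIL is released while each thread waits, the elapsed time should be close to 0.2 seconds, which means other threads really do run in the middle of a single C-level call.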