Is the `print` built-in function atomic in Python 2.X?

I've been exploring the internal implementation of threads in Python this week. It's amazing how every day I discover how much I didn't know; not knowing something I want to know is what makes me itch.

I noticed something strange in a piece of code that I ran under Python 2.7 as a multithreaded application. We all know that, by default, Python 2.7 considers switching between threads every 100 virtual instructions (bytecodes). Calling a function is one virtual instruction, for example:

>>> from __future__ import print_function
>>> import dis
>>> def x(): print('a')
... 
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

As you can see, the function gets called after the global print and the constant 'a' have been loaded. Calling a function is therefore atomic, as it's done with a single instruction. Hence, in a multithreaded program, either the function (print here) runs, or the running thread gets interrupted before the function gets the chance to run. That is, if a context switch occurs between LOAD_GLOBAL and LOAD_CONST, the CALL_FUNCTION instruction won't run until the thread is scheduled again.

Keep in mind that the code above uses from __future__ import print_function, so I'm really calling a builtin function now, not the print statement. Let's take a look at the bytecode of function x, but this time with the print statement:

>>> def x(): print "a"          # print stmt
... 
>>> dis.dis(x)
  1           0 LOAD_CONST               1 ('a')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE

In this case it's quite possible for a thread context switch to occur between any two of these instructions, for example between PRINT_ITEM and PRINT_NEWLINE, so that the newline is written only after another thread has printed its own text. So if you have a multithreaded program like this (borrowed from Programming Python, 4th edition, and slightly modified):

import thread  # Python 2 low-level threading module
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print ('[%s] => %s' % (myId, i))  # print statement (2.X)

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)  # don't quit early so other threads don't die

The output may or may not look like this depending on how threads were switched:

[0] => 0
[3] => 0[1] => 0
[4] => 0
[2] => 0
...many more...

This is all okay with the print statement.

What happens if we replace the print statement with the builtin print function? Let's see:

from __future__ import print_function
import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print('[%s] => %s' % (myId, i))  # print builtin (function)

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)

If you run this script enough times, you'll see something like this:

[4] => 0
[3] => 0[1] => 0
[2] => 0
[0] => 0
...many more...

Given the explanation above, how can this be? print is a function now, so how come it prints the passed-in string but not the newline? print writes the value of end after the printed string, and end defaults to \n. Essentially, a function call is atomic, so how on earth did it get interrupted?

Let's blow our minds:

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        #sys.stdout.write('[%s] => %s\n' % (myId, i))
        print('[%s] => %s\n' % (myId, i), end='')

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)

Now the new line is always printed, no jumbled output anymore:

[1] => 0
[2] => 0
[0] => 0
[4] => 0
...many more...

Adding \n to the string shows that the print function is not atomic (even though it's a function); essentially it acts just like the print statement. Yet dis.dis tells us, incoherently or stupidly, that it's a simple function call and thus an atomic operation?!

Note: I never rely on the order or timing of threads for applications to work properly. This is for testing purposes only and, frankly, for geeks like me.

1 Answer

#1

Your question is based on the central premise

"Calling a function therefore is atomic as it's done with a single instruction."

which is thoroughly wrong.

First, executing the CALL_FUNCTION opcode can involve executing additional bytecode. The most obvious case of this is when the executed function is written in Python, but even built-in functions can freely call other code that may be written in Python. For example, print calls __str__ and write methods.
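
To make that concrete, here is a minimal sketch (not code from the question) that logs what a single print call does. RecordingStream and Label are hypothetical helper classes introduced only for this illustration; the sketch assumes print writes to whatever object is passed as file by calling its write method.

```python
from __future__ import print_function  # has no effect on Python 3


class RecordingStream(object):
    """Hypothetical file-like object that logs every write() it receives."""

    def __init__(self):
        self.calls = []

    def write(self, text):
        self.calls.append(text)


class Label(object):
    """Hypothetical object whose __str__ is ordinary Python code."""

    str_calls = 0

    def __str__(self):
        Label.str_calls += 1
        return '[1] => 0'


stream = RecordingStream()
print(Label(), file=stream)

# The text and the trailing newline arrive as two separate write() calls,
# and producing the text required running Label.__str__ (Python bytecode).
print(stream.calls)     # ['[1] => 0', '\n']
print(Label.str_calls)  # 1
```

A single CALL_FUNCTION therefore covers at least one Python-level method call and two separate writes, and another thread can be scheduled between them.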

Second, Python is free to release the GIL even in the middle of C code. It commonly does this for I/O and other operations that might take a while without needing to perform Python API calls. There are 23 uses of the FILE_BEGIN_ALLOW_THREADS and Py_BEGIN_ALLOW_THREADS macros in the Python 2.7 file object implementation alone, including one in the implementation of file.write, which print relies on.
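
If intact lines matter, one common approach is to serialize the whole print call with a lock. The following is only a sketch under stated assumptions: print_lock, locked_print, and RecordingStream are hypothetical names, the stream stands in for sys.stdout so the result can be inspected, and the lock only helps when every thread prints through the same helper.

```python
from __future__ import print_function  # has no effect on Python 3

import threading

print_lock = threading.Lock()  # hypothetical lock shared by all printing threads


def locked_print(*args, **kwargs):
    """Hold print_lock for the whole call so text and newline stay together."""
    with print_lock:
        print(*args, **kwargs)


class RecordingStream(object):
    """Hypothetical stand-in for sys.stdout that logs each write()."""

    def __init__(self):
        self.calls = []

    def write(self, text):
        self.calls.append(text)


def counter(my_id, count, stream):
    for i in range(count):
        locked_print('[%s] => %s' % (my_id, i), file=stream)


stream = RecordingStream()
threads = [threading.Thread(target=counter, args=(n, 50, stream))
           for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every message should be followed directly by its own newline.
pairs = list(zip(stream.calls[0::2], stream.calls[1::2]))
intact = all(text.startswith('[') and newline == '\n'
             for text, newline in pairs)
print(len(pairs), intact)  # 250 True
```

The order in which threads print is still unspecified; the lock only guarantees that each message and its newline are written back to back.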
