Is the builtin `print` function atomic in Python 2.X?

Time: 2022-06-26 18:05:37

I've been exploring the internal implementation of threads in Python this week. Every day I'm amazed by how much I didn't know; not knowing what I want to know is what makes me itch.

I noticed something strange in a piece of code that I ran under Python 2.7 as a multithreaded application. We all know that, by default, Python 2.7 gives other threads a chance to run every 100 virtual (bytecode) instructions (the "check interval", see sys.getcheckinterval()). Calling a function is one virtual instruction, for example:

>>> from __future__ import print_function
>>> import dis
>>> def x(): print('a')
... 
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

As you can see, after loading the global print and the constant 'a', the function gets called. Calling a function therefore is atomic as it's done with a single instruction. Hence, in a multithreaded program either the function (print here) runs, or the running thread gets interrupted before the function gets the chance to run. That is, if a context switch occurs between LOAD_GLOBAL and LOAD_CONST, the CALL_FUNCTION instruction won't run until the thread is resumed.
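
On Python 3 the same observation can be checked programmatically with dis.get_instructions (added in 3.4). This is a minimal sketch assuming a Python 3 interpreter: opcode names differ from the 2.7 listing above (CALL_FUNCTION up to 3.10, PRECALL plus CALL in 3.11, CALL from 3.12), so it only counts opcodes whose names start with CALL. It confirms that the call is one instruction; it says nothing about how much work happens inside that instruction.

```python
import dis

def x():
    print('a')

# Opcode names for x's body; the exact names vary by Python version.
opnames = [instr.opname for instr in dis.get_instructions(x)]

# Keep only the instruction(s) that actually perform the call
# (3.11's PRECALL does not start with "CALL", so it is excluded).
call_ops = [name for name in opnames if name.startswith('CALL')]

print(opnames)
print(call_ops)
```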

Keep in mind that in the above code I'm using from __future__ import print_function, so I'm really calling a builtin function now, not the print statement. Let's take a look at the bytecode of function x, but this time with the print statement:

>>> import dis
>>> def x(): print "a"          # print stmt
... 
>>> dis.dis(x)
  1           0 LOAD_CONST               1 ('a')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE 

In this case it's quite possible that a thread context switch occurs between PRINT_ITEM and PRINT_NEWLINE, so the string is printed but the PRINT_NEWLINE instruction doesn't execute until the thread runs again. So if you have a multithreaded program like this (borrowed from Programming Python, 4th edition, and slightly modified):

import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print ('[%s] => %s' % (myId, i))  # print statement (2.X); the parentheses only group the expression

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)  # don't quit early so other threads don't die

The output may or may not look like this depending on how threads were switched:

[0] => 0
[3] => 0[1] => 0
[4] => 0
[2] => 0
...many more...

This is expected with the print statement, since PRINT_ITEM and PRINT_NEWLINE are separate instructions.

What happens if we replace the print statement with the builtin print function? Let's see:

from __future__ import print_function
import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print('[%s] => %s' % (myId, i))  # builtin print function

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6) 

If you run this script several times, you'll eventually see something like this:

[4] => 0
[3] => 0[1] => 0
[2] => 0
[0] => 0
...many more...
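
To repeat the experiment on Python 3, where the thread module was renamed _thread, a sketch along these lines uses threading instead. Assumptions, none of them from the original script: the sleep is shortened to 0.01 s, the threads are joined rather than relying on time.sleep(6), and output goes to an io.StringIO passed in through an extra out parameter so it can be inspected afterwards. Whether lines actually interleave depends on scheduling; the script does not force it.

```python
import io
import threading
import time

def counter(my_id, count, out):
    for i in range(count):
        time.sleep(0.01)  # shortened from the original 1 second
        print('[%s] => %s' % (my_id, i), file=out)

out = io.StringIO()  # collects everything the threads print
threads = [threading.Thread(target=counter, args=(i, 5, out)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all threads instead of sleeping a fixed time

print(out.getvalue())
```

Each fragment such as [3] => 0 is written by a single write() call, so it always appears intact; only the placement of the newlines can vary.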

Given all the above explanation, how can this be? print is a function now, so how come it prints the passed-in string but not the newline? print writes the value of end after the printed string, and end defaults to \n. A function call is supposedly atomic, so how on earth did it get interrupted?

Let's blow our minds:

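from __future__ import print_function
import thread
import time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        #sys.stdout.write('[%s] => %s\n' % (myId, i))
        print('[%s] => %s\n' % (myId, i), end='')  # newline is now part of the string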

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6) 

Now the newline is always printed, and there's no more jumbled output:

[1] => 0
[2] => 0
[0] => 0
[4] => 0
...many more...

The fact that moving \n into the string fixes the output seems to show that the print function is not atomic (even though it's a function) and that it essentially behaves like the print statement. Yet dis.dis tells us, incoherently or stupidly, that it's a simple function call and thus an atomic operation?!
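
One way to see why moving \n into the string changes the result is to record the individual write() calls the print function makes. This is a minimal sketch assuming Python 3's print(), which matches the print_function import used here, and a hypothetical WriteRecorder class. The default call issues one write for the text and a separate write for end, while the workaround puts the whole line into a single non-empty write (print may still issue an empty write for end='', which is filtered out below).

```python
class WriteRecorder:
    """Hypothetical file-like object that records every write() call."""

    def __init__(self):
        self.calls = []

    def write(self, s):
        self.calls.append(s)
        return len(s)

default = WriteRecorder()
print('[0] => 0', file=default)               # newline comes from end='\n'

workaround = WriteRecorder()
print('[0] => 0\n', end='', file=workaround)  # newline is part of the text

print(default.calls)                          # ['[0] => 0', '\n']
print([s for s in workaround.calls if s])     # ['[0] => 0\n']
```

A thread switch between the two writes of the default call is enough to produce a line like [3] => 0[1] => 0, whereas with the workaround there is no gap between a line's text and its newline.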

Note: I never rely on the order or timing of threads for applications to work properly. This is purely for testing purposes and, frankly, for geeks like me.

1 answer

#1

Your question is based on the central premise

Calling a function therefore is atomic as it's done with a single instruction.

which is thoroughly wrong.

First, executing the CALL_FUNCTION opcode can involve executing additional bytecode. The most obvious case of this is when the executed function is written in Python, but even built-in functions can freely call other code that may be written in Python. For example, print calls __str__ and write methods.
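
The point that print calls Python-level code can be made concrete. This is a contrived sketch assuming Python 3 and a hypothetical InterleavingSink class: its write() is ordinary Python code that, on its first call, starts a second printing thread and waits for it to finish. That hand-off stands in for a thread switch that real scheduling may or may not perform. A single print() call, one call instruction according to dis, gets suspended between writing its text and writing its newline, which reproduces the jumbled line from the question.

```python
import threading

class InterleavingSink:
    """Hypothetical file-like object with a Python-level write().

    On its first write it runs another printing thread to completion,
    simulating a thread switch in the middle of a single print() call.
    """

    def __init__(self):
        self.parts = []
        self._switched = False

    def write(self, s):
        self.parts.append(s)
        if not self._switched:
            self._switched = True
            other = threading.Thread(target=print, args=('[1] => 0',),
                                     kwargs={'file': self})
            other.start()
            other.join()  # the first thread waits here, mid-print()
        return len(s)

sink = InterleavingSink()
first = threading.Thread(target=print, args=('[0] => 0',), kwargs={'file': sink})
first.start()
first.join()

print(repr(''.join(sink.parts)))  # '[0] => 0[1] => 0\n\n'
```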

Second, Python is free to release the GIL even in the middle of C code. It commonly does this for I/O and other operations that might take a while without needing to perform Python API calls. There are 23 uses of the FILE_BEGIN_ALLOW_THREADS and Py_BEGIN_ALLOW_THREADS macros in the Python 2.7 file object implementation alone, including one in the implementation of file.write, which print relies on.
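
The second point, that C code can drop the GIL in the middle of a single call, can be illustrated loosely with time.sleep, which CPython also runs with the GIL released. This is a rough sketch assuming CPython 3, and time.sleep only stands in for file.write by analogy: a worker thread keeps counting while the main thread sits inside one time.sleep() call.

```python
import threading
import time

progress = 0
stop = threading.Event()

def worker():
    global progress
    while not stop.is_set():
        progress += 1

t = threading.Thread(target=worker)
t.start()

before = progress
time.sleep(0.1)  # a single call; CPython releases the GIL while sleeping
after = progress

stop.set()
t.join()
print('worker advanced by', after - before, 'during the sleep')
```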
