Python multiprocessing doesn't play nicely with threading.local

I have two processes (see sample code) that each attempt to access a threading.local object. I would expect the code below to print "a" and "b" (in either order). Instead, I get "a" and "a". How can I elegantly and robustly reset the threading.local object when I start up whole new processes?

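import threading
import multiprocessing

l = threading.local()
l.x = 'a'  # set in the parent's main thread

def f():
    # Falls back to 'b' only if l.x is not set for the calling thread.
    print(getattr(l, 'x', 'b'))

multiprocessing.Process(target=f).start()  # expected "b", but prints "a"
f()                                        # prints "a"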

edit: For reference, when I use threading.Thread instead of multiprocessing.Process, it works as expected.

3 answers

#1

Both operating systems you mentioned are Unix/Linux based and therefore implement the same fork() API. A fork() duplicates the calling process, including its memory, loaded code and open file descriptors; of its threads, only the one that called fork() continues to exist in the child. Moreover, the child usually shares the parent's memory pages (copy-on-write) until one of the two first writes to a page. This means that the in-memory data structures are carried over into the new process, including the thread-local variables of the forking thread. Thus, you still have the same data structures and l.x is still defined in the child.
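A minimal sketch of that behaviour, assuming a POSIX system where os.fork() is available. The helper name value_seen_in_forked_child is made up for illustration; the child simply reports what it finds in l.x back through a pipe:

```python
import os
import threading

l = threading.local()
l.x = 'a'  # set in the parent's main thread before forking


def value_seen_in_forked_child():
    """Fork, and return the value of l.x as observed inside the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the thread-local data was copied along with the rest of memory.
        os.close(read_fd)
        os.write(write_fd, getattr(l, 'x', 'b').encode())
        os._exit(0)
    # Parent: collect the child's answer and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode()
```

Calling value_seen_in_forked_child() should return 'a' rather than 'b', which matches what the question observes with multiprocessing.Process on a fork-based platform.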

To reset the data structures for the new process, I'd recommend that the function the process starts in first call some clearing method. You could, for example, store the parent process PID with process_id = os.getpid() and use

if process_id != os.getpid():
    clear_local_data()

in the child process's main function.

在子进程中起主要作用。

#2

That's because threading.local isolates data per thread, not per process, as its documentation clearly describes:

The instance’s values will be different for separate threads.

Nothing about processes.

And a quote from the multiprocessing docs:

Note

multiprocessing contains no analogues of threading.active_count(), threading.enumerate(), threading.settrace(), threading.setprofile(), threading.Timer, or threading.local.
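The per-thread guarantee quoted from the threading documentation can be checked with a short sketch. The helper names here are illustrative only:

```python
import threading

l = threading.local()
l.x = 'a'  # visible only to the thread that set it (the main thread here)


def _read_x(results):
    # Runs in the new thread, where l.x has never been assigned.
    results.append(getattr(l, 'x', 'b'))


def value_seen_in_new_thread():
    """Return the value of l.x as observed from a freshly started thread."""
    results = []
    worker = threading.Thread(target=_read_x, args=(results,))
    worker.start()
    worker.join()
    return results[0]
```

Here value_seen_in_new_thread() returns 'b' while the main thread still sees 'a'. That isolation is keyed on threads only, so a forked child's main thread inherits the parent's values.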

#3

There is now a multiprocessing-utils library on PyPI (source on GitHub) with a multiprocessing-safe version of threading.local(); it can be installed with pip.

It works by wrapping a standard threading.local() and checking that the PID has not changed since it was last used (as per the answer here from @immortal).
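To illustrate the idea of wrapping a threading.local() behind a PID check, here is a rough sketch. It is not the library's actual implementation; the class name PidCheckedLocal and its internals are assumptions made for this example:

```python
import os
import threading


class PidCheckedLocal(object):
    """Hypothetical sketch: thread-local storage that starts fresh in a new process."""

    def __init__(self):
        # Bypass our own __setattr__ so these land on the wrapper itself.
        object.__setattr__(self, '_pid', os.getpid())
        object.__setattr__(self, '_store', threading.local())

    def _current_store(self):
        # A changed PID means we are in a forked child: swap in an empty local.
        if self._pid != os.getpid():
            object.__setattr__(self, '_pid', os.getpid())
            object.__setattr__(self, '_store', threading.local())
        return self._store

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself.
        return getattr(self._current_store(), name)

    def __setattr__(self, name, value):
        setattr(self._current_store(), name, value)

    def __delattr__(self, name):
        delattr(self._current_store(), name)
```

The sketch does not lock around the swap, so two threads touching the object for the first time right after a fork could race; a real implementation would need to handle that.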

Use it exactly like threading.local():

import threading
import multiprocessing
import multiprocessing_utils

l = multiprocessing_utils.local()
l.x = 'a'
def f():
    print(getattr(l, 'x', 'b'))
f()                                        # prints "a"
threading.Thread(target=f).start()         # prints "b"
multiprocessing.Process(target=f).start()  # prints "b"

Full disclosure: I just created this module
