How do threads work in Python, and what are common Python-threading-specific pitfalls?

Time: 2021-04-09 21:01:40

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.

From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?

Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

7 Answers

#1


46  

Yes, because of the Global Interpreter Lock (GIL), only one thread can run at a time. Here are some links with some insights about this:

An interesting quote from the last link:

Let me explain what all that means. Threads run inside the same virtual machine, and hence run on the same physical machine. Processes can run on the same physical machine or in another physical machine. If you architect your application around threads, you’ve done nothing to access multiple machines. So, you can scale to as many cores as are on the single machine (which will be quite a few over time), but to really reach web scales, you’ll need to solve the multiple machine problem anyway.

If you want to use multiple cores, pyprocessing defines a process-based API to do real parallelization; it later entered the standard library as the multiprocessing module. The PEP also includes some interesting benchmarks.
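As a rough illustration of that process-based approach, here is a minimal sketch using today's standard-library multiprocessing module (the worker function and inputs are invented for the example):

from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound work; each call runs in its own process,
    # so it is not limited by the parent interpreter's GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required where new processes are started by spawning
    with Pool(processes=4) as pool:
        results = pool.map(cpu_heavy, [2_000_000] * 4)  # spread across up to 4 cores
    print(results)

Each worker is a full interpreter process, so arguments and results are pickled between processes rather than shared directly.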

#2


35  

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely find yourself using a lock statement in Python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
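One caveat worth flagging: even with the GIL, compound operations on shared state are not atomic, so a lock can still be needed when several threads update the same data. A minimal sketch (the counter and thread count are invented for illustration):

import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:        # counter += 1 is a read-modify-write, not a single atomic step;
            counter += 1  # without the lock, updates from different threads can be lost

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; without it, the total can come up short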

Thus, you need to reconsider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.

If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:

#3


19  

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.

import threading

class Foo(threading.Thread):
    def __init__(self, x):
        threading.Thread.__init__(self)  # initialize the Thread machinery first
        self.__x = x

    def run(self):
        # run() contains the code executed in the new thread
        print(self.__x)

for x in range(20):
    Foo(x).start()  # start() launches the thread, which then calls run()

As you have hinted at, Python threads are implemented through time-slicing. This is how they get the "parallel" effect.

In my example, my Foo class extends Thread; I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...

Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.
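For message passing between threads, the standard-library queue.Queue is usually the first tool to reach for, since it handles the locking for you. A minimal producer/consumer sketch (the item count and sentinel value are made up for the example):

import queue
import threading

q = queue.Queue()
SENTINEL = None  # made-up marker that tells the consumer to stop

def producer():
    for i in range(5):
        q.put(i)        # Queue does its own locking internally
    q.put(SENTINEL)

def consumer():
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        print("got", item)

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()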

#4


10  

Use threads in Python if the individual workers are doing I/O-bound operations. If you are trying to scale across multiple cores on a machine, either find a good IPC framework for Python or pick a different language.
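For the I/O-bound case, the standard-library concurrent.futures.ThreadPoolExecutor keeps the pattern short; a minimal sketch (the URLs are just examples):

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

urls = [
    "https://www.python.org/",  # example URLs only
    "https://example.com/",
]

def fetch(url):
    # The GIL is released while the socket read blocks, so several downloads
    # overlap even though only one thread runs Python bytecode at a time.
    with urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

with ThreadPoolExecutor(max_workers=4) as pool:
    for url, size in pool.map(fetch, urls):
        print(url, size)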

#5


3  

One easy solution to the GIL is the multiprocessing module. It can be used as a drop-in replacement for the threading module but uses multiple interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things, but it gives you the advantage of real parallelization if you need it. It also easily scales to multiple physical machines.
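The "drop-in" part mostly means that multiprocessing.Process mirrors the threading.Thread API; a minimal sketch (the worker function is invented for the example):

from multiprocessing import Process
# from threading import Thread  # the thread-based version differs only in the class used

def work(n):
    print("sum of squares below", n, "is", sum(i * i for i in range(n)))

if __name__ == "__main__":  # needed when processes are started by spawning
    procs = [Process(target=work, args=(10,)) for _ in range(3)]
    for p in procs:
        p.start()  # same start()/join() protocol as threading.Thread
    for p in procs:
        p.join()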

If you need truly large-scale parallelization, then I would look further, but if you just want to scale to all the cores of one computer, or a few different ones, without all the work that would go into implementing a more comprehensive framework, then this is for you.

#6


3  

Note: wherever I mention threads, I mean specifically threads in Python unless explicitly stated otherwise.

Threads work a little differently in Python if you are coming from a C/C++ background. In Python, only one thread can be in the running state at a given time. This means threads in Python cannot truly leverage the power of multiple processing cores, since by design it is not possible for threads to run in parallel on multiple cores.

As the memory management in Python is not thread-safe, each thread requires exclusive access to the data structures in the Python interpreter. This exclusive access is acquired by a mechanism called the GIL (global interpreter lock).

Why does Python use a GIL?

To prevent multiple threads from accessing interpreter state simultaneously and corrupting it.

The idea is that whenever a thread is being executed (even if it is the main thread), the GIL is acquired, and after some predefined interval of time the GIL is released by the current thread and reacquired by some other thread (if any).

Why not simply remove the GIL?

It is not that it's impossible to remove the GIL; it's just that in the process of doing so we end up putting multiple locks inside the interpreter in order to serialize access, which makes even a single-threaded application less performant.

So the cost of removing the GIL is paid in the reduced performance of a single-threaded application, which is never desired.

So when does thread switching occur in Python?

A thread switch occurs when the GIL is released. So when is the GIL released? There are two scenarios to take into consideration.

If a thread is doing CPU-bound operations (e.g. image processing):

In older versions of Python, thread switching used to occur after a fixed number of Python instructions, set by default to 100. It turned out that this is not a very good policy for deciding when switching should occur, since the time spent executing a single instruction can vary wildly, from a millisecond to even a second. Therefore, releasing the GIL after every 100 instructions, regardless of the time they take to execute, is a poor policy.

In newer versions, instead of using an instruction count as the metric for switching threads, a configurable time interval is used. The default switch interval is 5 milliseconds. You can get the current switch interval using sys.getswitchinterval(), and it can be altered using sys.setswitchinterval().
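For example (note that both functions work in seconds, so the 5 ms default shows up as 0.005):

import sys

print(sys.getswitchinterval())  # 0.005 by default, i.e. 5 ms
sys.setswitchinterval(0.001)    # ask the interpreter to consider switching roughly every 1 ms
print(sys.getswitchinterval())  # 0.001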

If a thread is doing some I/O-bound operations (e.g. filesystem access or network I/O):

The GIL is released whenever the thread is waiting for an I/O operation to complete.
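A quick way to see the difference between the two scenarios is to time CPU-bound versus I/O-bound work running in threads; a minimal sketch, where time.sleep stands in for blocking I/O and the workload sizes are made up:

import threading
import time

def cpu_bound():
    sum(i * i for i in range(5_000_000))  # holds the GIL while computing

def io_bound():
    time.sleep(1)  # releases the GIL while blocked

def timed(target, n=4):
    threads = [threading.Thread(target=target) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print("cpu-bound:", timed(cpu_bound))  # roughly the sum of the individual run times
print("io-bound:", timed(io_bound))    # roughly 1 second, because the sleeps overlap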

Which thread to switch to next?

The interpreter doesn’t have its own scheduler. Which thread gets scheduled at the end of the interval is the operating system’s decision.

#7


2  

Try to remember that the GIL switches every so often in order to show the appearance of multiple tasks. This setting can be fine-tuned, but I would suggest that there should be real work for the threads to do, or the many context switches are going to cause problems.

I would go so far as to suggest running multiple parent processes across processors and trying to keep like jobs on the same core(s).
