How many threads is too many?

Time: 2022-12-13 20:59:12

I am writing a server, and I branch each action off into a thread when a request comes in. I do this because almost every request involves a database query. I am using a thread-pool library to cut down on construction/destruction of threads.

My question, though, is: what is a good cutoff point for I/O-bound threads like these? I know it would only be a rough estimate, but are we talking hundreds? Thousands?


EDIT:

Thank you all for your responses; it seems I am just going to have to test things to find my thread-count ceiling. The question, though, is: how do I know I've hit that ceiling? What exactly should I measure?

12 Answers

#1


163  

Some people would say that two threads is too many - I'm not quite in that camp :-)

Here's my advice: measure, don't guess. One suggestion is to make it configurable and initially set it to 100, then release your software to the wild and monitor what happens.

If your thread usage peaks at 3, then 100 is too much. If it remains at 100 for most of the day, bump it up to 200 and see what happens.

You could actually have your code itself monitor usage and adjust the configuration for the next time it starts but that's probably overkill.


For clarification and elaboration:

I'm not advocating rolling your own thread pooling subsystem, by all means use the one you have. But, since you were asking about a good cut-off point for threads, I assume your thread pool implementation has the ability to limit the maximum number of threads created (which is a good thing).

I've written thread and database connection pooling code and they have the following features (which I believe are essential for performance):

  • a minimum number of active threads.
  • a maximum number of threads.
  • shutting down threads that haven't been used for a while.

The first sets a baseline for minimum performance in terms of the thread pool client (this number of threads is always available for use). The second sets a restriction on resource usage by active threads. The third returns you to the baseline in quiet times so as to minimise resource use.

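A minimal Python sketch of a pool with these three features (illustrative only; the class name, defaults, and growth policy are assumptions of this sketch, not the answerer's actual code):

```python
import queue
import threading

class BoundedThreadPool:
    """Sketch of a pool with a configurable minimum, a hard maximum,
    and shutdown of threads that have been idle for a while."""

    def __init__(self, min_threads=5, max_threads=100, idle_timeout=60.0):
        self.min_threads = min_threads
        self.max_threads = max_threads
        self.idle_timeout = idle_timeout
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.num_threads = 0
        self.idle = 0
        for _ in range(min_threads):          # baseline: always available
            with self.lock:
                self.num_threads += 1
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            with self.lock:
                self.idle += 1
            try:
                func, args = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                # Idle too long: exit, but never drop below the minimum.
                with self.lock:
                    self.idle -= 1
                    if self.num_threads > self.min_threads:
                        self.num_threads -= 1
                        return
                continue
            with self.lock:
                self.idle -= 1
            try:
                func(*args)
            except Exception:
                pass                          # a real pool would log this

    def submit(self, func, *args):
        # Grow only when nobody is idle and we are under the cap.
        with self.lock:
            if self.idle == 0 and self.num_threads < self.max_threads:
                self.num_threads += 1
                threading.Thread(target=self._worker, daemon=True).start()
        self.tasks.put((func, args))
```

The key design point is in submit(): the pool grows only while every existing thread is busy, and _worker() shrinks it back toward the minimum after a quiet idle_timeout, which is the "return to baseline" behaviour described above.
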
You need to balance the resource usage of having unused threads (A) against the resource usage of not having enough threads to do the work (B).

(A) is generally memory usage (stacks and so on) since a thread doing no work will not be using much of the CPU. (B) will generally be a delay in the processing of requests as they arrive as you need to wait for a thread to become available.

That's why you measure. As you state, the vast majority of your threads will be waiting for a response from the database so they won't be running. There are two factors that affect how many threads you should allow for.

The first is the number of DB connections available. This may be a hard limit unless you can increase it at the DBMS - I'm going to assume your DBMS can take an unlimited number of connections in this case (although you should ideally be measuring that as well).

Then, the number of threads you should have depend on your historical use. The minimum you should have running is the minimum number that you've ever had running + A%, with an absolute minimum of (for example, and make it configurable just like A) 5.

The maximum number of threads should be your historical maximum + B%.

You should also be monitoring for behaviour changes. If, for some reason, your usage goes to 100% of available for a significant time (so that it would affect the performance of clients), you should bump up the maximum allowed until it's once again B% higher.


In response to the "what exactly should I measure?" question:

What you should measure specifically is the maximum number of threads in concurrent use (e.g., waiting on a return from the DB call) under load. Then add a safety factor of, for example, 10% (emphasised, since other posters seem to take my examples as fixed recommendations).

In addition, this should be done in the production environment for tuning. It's okay to get an estimate beforehand but you never know what production will throw your way (which is why all these things should be configurable at runtime). This is to catch a situation such as unexpected doubling of the client calls coming in.

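One way to capture that high-water mark is a small counter around the blocking section. This is a hypothetical sketch (the class and function names are mine, not from the answer):

```python
import threading

class HighWaterMark:
    """Record the peak number of threads concurrently inside a
    section (e.g. blocked on a DB call) -- the number to measure."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._current += 1
            if self._current > self.peak:
                self.peak = self._current
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._current -= 1

in_db_call = HighWaterMark()

def handle_request():
    with in_db_call:   # wrap the blocking section you care about
        pass           # the real DB query would run here

# After a load test, size the pool to roughly in_db_call.peak plus 10%.
```
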
#2


23  

This question has been discussed quite thoroughly and I didn't get a chance to read all the responses. But here are a few things to take into consideration when looking at the upper limit on the number of simultaneous threads that can co-exist peacefully in a given system.

  1. Thread stack size: in Linux the default thread stack size is 8 MB (you can use ulimit -a to find it out).
  2. The max virtual memory that a given OS variant supports. The Linux 2.4 kernel supports a user address space of 2 GB; with kernel 2.6 it is a bit bigger (3 GB).
  3. [1] shows the calculation of the max number of threads for a given max VM supported. For 2.4 it turns out to be about 255 threads; for 2.6 the number is a bit larger.
  4. What kind of kernel scheduler you have. Comparing the Linux 2.4 scheduler with 2.6, the latter gives you O(1) scheduling, with no dependence on the number of tasks in the system, while the former is more like O(n). The SMP capabilities of the kernel scheduler also play a role in the max number of sustainable threads in a system.

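A back-of-the-envelope version of the address-space arithmetic in the list above, using the 3 GB and 8 MB figures the answer cites:

```python
# Each thread reserves a stack, so usable address space divided by the
# stack size bounds the thread count before any other memory use.
address_space = 3 * 1024**3          # 3 GB user space (32-bit, kernel 2.6)
stack_size = 8 * 1024**2             # 8 MB default stack (see ulimit -a)
max_threads_estimate = address_space // stack_size
print(max_threads_estimate)          # 384, before heap/code/libraries
print((2 * 1024**3) // stack_size)   # 256: matches the ~255 figure for 2.4
```
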
Now you can tune your stack size to accommodate more threads, but then you have to take into account the overheads of thread management (creation/destruction and scheduling). You can enforce CPU affinity for a given process or a given thread, tying them down to specific CPUs to avoid thread-migration overhead between CPUs and cold-cache issues.

Note that one can create thousands of threads if one wishes, but when Linux runs out of VM it just starts randomly killing processes (and thus threads). This is to keep the utility profile from being maxed out. (The utility function describes system-wide utility for a given amount of resources. With constant resources, in this case CPU cycles and memory, the utility curve flattens out as the number of tasks grows.)

I am sure the Windows kernel scheduler also does something of this sort to deal with over-utilization of resources.

[1] http://adywicaksono.wordpress.com/2007/07/10/i-can-not-create-more-than-255-threads-on-linux-what-is-the-solutions/

#3


13  

If your threads are performing any kind of resource-intensive work (CPU/Disk) then you'll rarely see benefits beyond one or two, and too many will kill performance very quickly.

The 'best-case' is that your later threads will stall while the first ones complete, or some will have low-overhead blocks on resources with low contention. Worst-case is that you start thrashing the cache/disk/network and your overall throughput drops through the floor.

A good solution is to place requests in a pool that are then dispatched to worker threads from a thread-pool (and yes, avoiding continuous thread creation/destruction is a great first step).

The number of active threads in this pool can then be tweaked and scaled based on the findings of your profiling, the hardware you are running on, and other things that may be occurring on the machine.

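With Python's standard library, for instance, the described setup is only a few lines; the handler below is a placeholder for real request handling:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    # Placeholder for the real request handler (e.g. run a DB query).
    return request * 2

# Requests queue up inside the executor and are dispatched to a fixed
# set of worker threads; max_workers is the knob tuned from profiling.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(handle, r) for r in range(5)]
    results = [f.result() for f in futures]
print(results)  # [0, 2, 4, 6, 8]
```
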
#4


8  

One thing you should keep in mind is that Python (at least the C-based implementation) uses what's called a global interpreter lock (GIL), which can have a huge impact on performance on multi-core machines.

If you really need to get the most out of multithreaded Python, you might want to consider using Jython or something similar.

#5


6  

As Pax rightly said: measure, don't guess. That's what I did for DNSwitness, and the results were surprising: the ideal number of threads was much higher than I thought, something like 15,000 threads to get the fastest results.

Of course, it depends on many things, that's why you must measure yourself.

Complete measures (in French only) in Combien de fils d'exécution ?.

#6


4  

I've written a number of heavily multi-threaded apps. I generally allow the number of potential threads to be specified by a configuration file. When I've tuned for specific customers, I've set the number high enough that my utilization of all the CPU cores was pretty high, but not so high that I ran into memory problems (these were 32-bit operating systems at the time).

Put differently, once you reach some bottleneck be it CPU, database throughput, disk throughput, etc, adding more threads won't increase the overall performance. But until you hit that point, add more threads!

Note that this assumes the system(s) in question are dedicated to your app, and you don't have to play nicely (avoid starving) other apps.

#7


3  

I think this is a bit of a dodge to your question, but why not fork them into processes? My understanding of networking (from the hazy days of yore, I don't really code networks at all) was that each incoming connection can be handled as a separate process, because then if someone does something nasty in your process, it doesn't nuke the entire program.

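As a concrete (Unix-only) sketch of this idea in Python, socketserver.ForkingTCPServer forks a child for every incoming connection, so a crash in one handler cannot take down the rest of the server. The echo handler is just a stand-in for real request handling:

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)        # echo each line back

# Port 0 lets the OS pick a free port; each connection gets its own process.
server = socketserver.ForkingTCPServer(("127.0.0.1", 0), EchoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(b"ping\n")
    reply = conn.makefile().readline()    # "ping\n" comes back

server.shutdown()
server.server_close()
```

The trade-off versus threads is heavier per-connection cost (fork, no shared memory) in exchange for the isolation the answer describes.
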
#8


2  

The "big iron" answer is generally one thread per limited resource -- processor (CPU bound), arm (I/O bound), etc -- but that only works if you can route the work to the correct thread for the resource to be accessed.

Where that's not possible, consider that you have fungible resources (CPUs) and non-fungible resources (arms). For CPUs it's not critical to assign each thread to a specific CPU (though it helps with cache management), but for arms, if you can't assign a thread to the arm, you get into queuing theory and the question of the optimal number to keep arms busy. Generally I'm thinking that if you can't route requests based on the arm used, then having 2-3 threads per arm is going to be about right.

A complication comes about when the unit of work passed to the thread doesn't execute a reasonably atomic unit of work. Eg, you may have the thread at one point access the disk, at another point wait on a network. This increases the number of "cracks" where additional threads can get in and do useful work, but it also increases the opportunity for additional threads to pollute each other's caches, etc, and bog the system down.

Of course, you must weigh all this against the "weight" of a thread. Unfortunately, most systems have very heavyweight threads (and what they call "lightweight threads" often aren't threads at all), so it's better to err on the low side.

What I've seen in practice is that very subtle differences can make an enormous difference in how many threads are optimal. In particular, cache issues and lock conflicts can greatly limit the amount of practical concurrency.

#9


1  

One thing to consider is how many cores exist on the machine that will be executing the code. That represents a hard limit on how many threads can be proceeding at any given time. However, if, as in your case, threads are expected to be frequently waiting for a database to execute a query, you will probably want to tune your threads based on how many concurrent queries the database can process.

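The core count is easy to query as a starting point; the multiplier below is purely illustrative, not a rule from the answer:

```python
import os

# The hard ceiling this answer mentions: only `cores` threads can run
# at once. For I/O-bound pools a multiple of that is common, but the
# DB's concurrent-query capacity is the real limit to measure against.
cores = os.cpu_count() or 1
io_pool_guess = cores * 4            # illustrative multiplier only
```
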
#10


0  

ryeguy, I am currently developing a similar application and my thread count is set to 15. Unfortunately, if I increase it to 20, it crashes. So yes, I think the best way to handle this is to measure whether or not your current configuration allows more or fewer than some number X of threads.

#11


-6  

In most cases you should allow the thread pool to handle this. If you post some code or give more details it might be easier to see if there is some reason the default behavior of the thread pool would not be best.

You can find more information on how this should work here: http://en.wikipedia.org/wiki/Thread_pool_pattern

#12


-10  

As many threads as there are CPU cores is what I've heard very often.
