How many concurrent threads in an application is a lot?

Time: 2022-10-06 21:02:58

5, 100, 1000?

I guess, "it depends", but on what?

What is common in applications that run as server daemons / services?

What are hard limits?

Given that the machine can handle the overall workload, how do I determine at how many threads the overhead starts to have an impact on performance?

What are the important differences between operating systems?

What else should be considered?

I'm asking because I would like to use threads to organize subcomponents of my application that do not share data and are designed to do their work in parallel. As the application would also use thread pools for parallelizing some tasks, I was wondering at what point I should start to think about the total number of threads that are going to run.

I know the n+1 rule as a guideline for determining the number of threads that simultaneously work on the same task to gain performance. However, I want to use threads the way one might use processes in a larger scope, i.e. to organize independent tasks that should not interfere with each other.

In this related question, some people advise minimising the number of threads because of the added complexity. To me it seems that threads can also help to keep things more orderly and actually reduce interference. Isn't that correct?

4 Answers

#1


6  

I can't answer your question about "how much is many" but I agree that you should not use threads for every task possible.

The optimal number of threads for application performance is (n+1), where n is the number of processors/cores your computer/cluster has.

The further your actual thread count is from n+1, the less optimal it gets, and the more system resources are wasted on managing threads rather than doing work.

So usually you use 1 thread for the UI, 1 thread for some generic tasks, and (n+1) threads for computation-heavy tasks.
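As a minimal sketch of the n+1 guideline, a pool could be sized like this in Python. The names `cpu_bound_pool_size`, `sum_of_squares` and `run_compute_tasks` are made up for illustration, and `os.cpu_count()` reports logical CPUs, which may differ from physical cores. Also note that in standard CPython builds the GIL keeps threads from executing Python bytecode in parallel, so this only shows the sizing, not a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_pool_size() -> int:
    """Return n + 1, where n is the number of logical CPUs the OS reports."""
    n = os.cpu_count() or 1  # os.cpu_count() can return None
    return n + 1


def sum_of_squares(limit: int) -> int:
    """Toy compute-only task (hypothetical workload, does no I/O)."""
    return sum(i * i for i in range(limit))


def run_compute_tasks(limits: list) -> list:
    """Run the toy tasks on a pool sized by the n + 1 guideline."""
    with ThreadPoolExecutor(max_workers=cpu_bound_pool_size()) as pool:
        # pool.map preserves the order of the inputs in its results
        return list(pool.map(sum_of_squares, limits))
```

The UI thread and the generic-task thread mentioned above would live outside this pool; only the heavy computation is sized by n+1.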

#2


1  

Actually, Ajmastrean is a little out of date. Quoting from his own link:

The thread pool has a default size of 250 worker threads per available processor, and 1000 I/O completion threads. The number of threads in the thread pool can be changed by using the SetMaxThreads method.

But generally I think 25 is really where the law of diminishing returns (and programmers' ability to keep track of what is going on) starts to take effect. Max is right that n+1 is the optimal number as long as all of the threads are performing non-blocking calculations, but in the real world most of the threading tasks I perform involve some kind of I/O.
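To illustrate why blocking work tolerates more threads than cores, here is a rough sketch in Python. `fake_io_task` is a hypothetical stand-in that just sleeps; real I/O behaves far less uniformly, so treat any timings as illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay: float) -> float:
    """Stand-in for a blocking I/O call; a sleeping thread uses no CPU."""
    time.sleep(delay)
    return delay


def run_with_workers(workers: int, tasks: int = 16, delay: float = 0.05) -> float:
    """Run `tasks` blocking tasks on `workers` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every task has finished before timing stops
        list(pool.map(fake_io_task, [delay] * tasks))
    return time.perf_counter() - start
```

Comparing `run_with_workers(1)` against `run_with_workers(16)` should show the wider pool finishing much sooner, even on a single core, because the threads spend their time waiting rather than computing.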

#3


1  

It also depends on your architecture. E.g. with NVIDIA's GPGPU library CUDA you can run 512 threads simultaneously on an 8-thread multiprocessor. You may ask: why assign 64 threads to each of the scalar processors? The answer is easy: if the computation is not compute-bound but memory-I/O-bound, you can hide the memory latencies by executing other threads. The same applies to normal CPUs. I remember a recommendation for make's parallel option "-j" to use approximately 1.5 times the number of cores you have. Many compile tasks are I/O-heavy, and if a task has to wait for the hard disk, memory, or whatever, the CPU can work on a different thread.
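A commonly cited rule of thumb for this latency-hiding idea (it is not stated in this thread) is threads ≈ cores × (1 + wait time / compute time). A small sketch, where the function name and the sample ratios are assumptions for illustration:

```python
import math


def estimate_thread_count(cores: int, wait_time: float, compute_time: float) -> int:
    """Estimate a thread count as cores * (1 + wait/compute), rounded up.

    wait_time and compute_time are average per-task figures in the same unit;
    only their ratio matters.
    """
    if cores < 1 or wait_time < 0 or compute_time <= 0:
        raise ValueError("need cores >= 1, wait_time >= 0, compute_time > 0")
    return math.ceil(cores * (1 + wait_time / compute_time))
```

With a wait/compute ratio of 0.5 this gives 1.5 times the core count, e.g. `estimate_thread_count(8, 0.5, 1.0)` returns 12, which lines up with the make "-j" recommendation. With no waiting at all it falls back to one thread per core.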

Next you have to consider how expensive a task/thread switch is. E.g. on the GPU it comes for free, while a CPU has to perform some work for a context switch. So in general you have to estimate whether the penalty for two task switches is longer than the time the thread would block (which depends heavily on your application).
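The break-even estimate described here is simple arithmetic. A sketch, assuming you have measured (or guessed) both figures in microseconds; the function name and the example numbers are hypothetical:

```python
def worth_switching(block_time_us: float, switch_cost_us: float) -> bool:
    """Return True if the blocked time exceeds the cost of two task switches.

    Two switches are counted: one away from the blocked thread and one back.
    """
    if block_time_us < 0 or switch_cost_us < 0:
        raise ValueError("times must be non-negative")
    return block_time_us > 2 * switch_cost_us
```

For instance, with an assumed switch cost of 5 µs, a 100 µs wait is worth handing the CPU to another thread, while an 8 µs wait is not.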

#4


0  

Microsoft's ThreadPool class limits you to 25 threads per processor. The limit is based on context switching between threads and the memory consumed by each thread. So, that's a good guideline if you're on the Windows platform.
