I have a networking Linux application which receives RTP streams from multiple sources, does some very simple packet modification, and then forwards the streams to their final destination.
How do I decide how many threads I should have to process the data? I suppose I cannot open a thread for each RTP stream, as there could be thousands. Should I take the number of CPU cores into account? What else matters? Thanks.
7 answers
#1
22
It is important to understand the purpose of using multiple threads on a server: many threads in a server serve to decrease latency rather than to increase speed. Having more threads doesn't make the CPU any faster, but it makes it more likely that a thread will be available within a given period to handle a request.
Having a bunch of threads that just move data in parallel is a rather inefficient shotgun approach (and creating one thread per request naturally fails completely). Using the thread pool pattern can be a more effective, focused way to decrease latency.
Now, in the thread pool, you want at least as many threads as you have CPUs/cores. You can have more than this, but the extra threads will again only decrease latency, not increase speed.
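A minimal Python sketch of that sizing rule, assuming Linux and using the standard-library `concurrent.futures.ThreadPoolExecutor`. The helpers `usable_cores`, `make_pool` and `rewrite_ssrc` are hypothetical; `rewrite_ssrc` stands in for the "very simple packet modification" from the question by overwriting the SSRC field, which sits at byte offset 8 of the 12-byte RTP fixed header. In CPython the GIL keeps pure-Python packet rewriting from running truly in parallel, so treat this as an illustration of the sizing logic rather than a production forwarder.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

RTP_FIXED_HEADER_LEN = 12  # V/P/X/CC, M/PT, seq (2), timestamp (4), SSRC (4)


def usable_cores() -> int:
    """Number of CPUs this process is allowed to run on."""
    try:
        # Honors the affinity mask (e.g. taskset or a cpuset); Linux-specific.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


def make_pool(oversubscribe: int = 1) -> ThreadPoolExecutor:
    """At least one thread per usable core; oversubscribe > 1 trades CPU for latency."""
    return ThreadPoolExecutor(max_workers=usable_cores() * max(1, oversubscribe))


def rewrite_ssrc(packet: bytes, new_ssrc: int) -> bytes:
    """Hypothetical per-packet work: replace the 32-bit big-endian SSRC field."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    return packet[:8] + struct.pack("!I", new_ssrc) + packet[12:]
```

Usage: `with make_pool() as pool: out = list(pool.map(lambda p: rewrite_ssrc(p, 0x1234), packets))`. Raising `oversubscribe` above 1 is the "more than this" case: it can lower latency but not raise throughput.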
Think of organizing server threads as akin to organizing checkout lines in a supermarket. Would you rather have many cashiers who work more slowly, or one cashier who works super fast? The problem with the fast cashier isn't speed but that one customer with a lot of groceries can still take up a lot of their time. The need for many threads comes from the possibility that a few requests will take a lot of time and block all your threads. By this reasoning, whether you benefit from many slower cashiers depends on whether your customers carry similar numbers of groceries or wildly different numbers. Getting back to the basic model, this means you have to experiment with the number of threads to find what is optimal for the particular characteristics of your traffic, looking at the time taken to process each request.
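One way to experiment with the thread count is to replay a representative batch of work at several pool sizes and time each run. In this Python sketch the names `time_pool` and `sweep` are hypothetical, and `time.sleep` stands in for request handling so that a couple of long tasks play the customer with a full cart.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool(n_threads, tasks, handler):
    """Wall-clock seconds to push every task through a pool of n_threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(handler, tasks))  # drain results so exceptions surface
    return time.perf_counter() - start


def sweep(candidates, tasks, handler):
    """Map each candidate thread count to its measured run time."""
    return {n: time_pool(n, tasks, handler) for n in candidates}
```

For example, `sweep([1, 2, 4, 8], [0.001] * 20 + [0.1] * 2, time.sleep)` shows how two slow "customers" dominate a single-threaded run. With real traffic, `handler` would be your actual packet processing and `tasks` a captured workload; pick the smallest count beyond which run time stops improving.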
#2
10
Classically, the reasonable number of threads depends on the number of execution units, the ratio of IO to computation, and the available memory.
Number of Execution Units (XU)
This counts how many threads can be active at the same time. Depending on your computations, that may or may not include hyperthreads; mixed instruction workloads benefit more from them.
Ratio of IO to Computation (%IO)
If the threads never wait for IO but always compute (%IO = 0), using more threads than XUs only increases memory pressure and context-switching overhead. If the threads always wait for IO and never compute (%IO = 1), then using a variant of poll() or select() might be a good idea.
For all other situations, XU / (1 - %IO) gives an approximation of how many threads are needed to fully use the available XUs.
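Reading %IO as the fraction of its time a thread spends blocked on IO, each thread keeps an execution unit busy only (1 - %IO) of the time, which gives the estimate below. It agrees with both limiting cases above: %IO = 0 yields exactly XU threads, and as %IO approaches 1 the count grows without bound, which is where poll()/select() take over. The example numbers are illustrative only.

```latex
N_{\text{threads}} \approx \frac{XU}{1 - \%\mathrm{IO}},
\qquad \text{e.g. } XU = 8,\ \%\mathrm{IO} = 0.75
\;\Rightarrow\; N_{\text{threads}} \approx \frac{8}{0.25} = 32
```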
Available Memory (Mem)
This is more of an upper limit. Each thread uses a certain amount of system resources (MemUse). Mem / MemUse gives you an approximation of how many threads the system can support.
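Putting the execution-unit estimate and the memory cap together, a hedged Python helper might take the smaller of the two. The function and parameter names are hypothetical; `mem_per_thread` should come from measurement, although the 8 MiB default thread stack reservation found on many Linux systems is a plausible first guess for virtual memory.

```python
import math


def estimate_threads(xu, io_fraction, mem_bytes, mem_per_thread):
    """Rough thread count: enough to keep XU busy, capped by memory.

    xu             -- execution units (cores, possibly hyperthreads)
    io_fraction    -- share of time a thread waits on IO, in [0, 1)
    mem_bytes      -- memory you are willing to spend on threads
    mem_per_thread -- per-thread cost (stack plus per-thread buffers)
    """
    if not 0.0 <= io_fraction < 1.0:
        raise ValueError("io_fraction must be in [0, 1); at 1.0 use poll/epoll instead")
    cpu_bound = math.ceil(xu / (1.0 - io_fraction))  # XU / (1 - %IO)
    mem_bound = mem_bytes // mem_per_thread           # Mem / MemUse
    return max(1, min(cpu_bound, mem_bound))
```

For instance, `estimate_threads(8, 0.75, 2048 * 1024**2, 8 * 1024**2)` is limited by the CPU estimate (32), while a heavily IO-bound workload on a small memory budget ends up limited by Mem / MemUse instead.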
Other Factors
The performance of the whole system can still be constrained by other factors, even if you can guess or (better) measure the numbers above. For example, another service might be running on the system, using some of the XUs and memory. Another constraint is the generally available IO bandwidth (IOCap). If you need fewer computing resources per transferred byte than your XUs provide, you will obviously need to care less about using them completely and more about increasing IO throughput.
For more about this latter problem, see this Google Talk about the Roofline Model.
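A roofline-style back-of-the-envelope check for the RTP forwarder: the attainable packet rate is the lower of what the cores can process and what the link can carry. The function name and all figures below are hypothetical placeholders to be replaced with measurements.

```python
def max_packet_rate(cores, pkts_per_core_s, io_cap_bytes_s, bytes_per_pkt):
    """Return (packets/s ceiling, which resource sets it)."""
    compute_ceiling = cores * pkts_per_core_s    # XU-limited rate
    io_ceiling = io_cap_bytes_s / bytes_per_pkt  # IOCap-limited rate
    if io_ceiling < compute_ceiling:
        return io_ceiling, "io"
    return compute_ceiling, "compute"
```

With 8 cores at a hypothetical 200,000 packets/s each and a 1 Gbit/s link (125,000,000 bytes/s) carrying roughly 200-byte packets, `max_packet_rate(8, 200_000, 125_000_000, 200)` returns `(625000.0, "io")`: the link is the ceiling, so adding threads would not help.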
#3
4
I'd say try using just ONE thread; it makes programming much easier. Although you'll need something like libevent to multiplex the connections, you won't have any unexpected synchronisation issues.
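libevent is a C library; as a stand-in, here is a minimal single-threaded sketch using Python's standard `selectors` module, which picks epoll on Linux. It multiplexes any number of UDP sockets in one thread and forwards each datagram to one destination. `run_forwarder`, its `handle` hook and the `max_packets` stop condition (there only to make the sketch testable) are hypothetical.

```python
import selectors
import socket


def run_forwarder(listen_socks, dest, handle=lambda pkt: pkt, max_packets=None):
    """Forward datagrams from every socket in listen_socks to dest, in one thread.

    Returns the number of datagrams forwarded once max_packets is reached
    (runs forever when max_packets is None).
    """
    sel = selectors.DefaultSelector()  # epoll on Linux
    for s in listen_socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    forwarded = 0
    try:
        while max_packets is None or forwarded < max_packets:
            for key, _events in sel.select(timeout=1.0):
                sock = key.fileobj
                # Drain everything queued on this socket before calling select() again.
                while True:
                    try:
                        data, _src = sock.recvfrom(2048)
                    except BlockingIOError:
                        break
                    out.sendto(handle(data), dest)
                    forwarded += 1
    finally:
        sel.close()  # unregisters, but leaves listen_socks open for the caller
        out.close()
    return forwarded
```

Because only one thread ever touches the sockets and any per-stream state, no locking is needed, which is the point the answer makes.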
Once you've got a working single-threaded implementation, you can do performance testing and make a decision on whether a multi-threaded one is necessary.
Even if a multi-threaded implementation is necessary, it may be easier to split it into several processes instead of threads (i.e. not sharing an address space: either fork() or exec multiple copies of the process from a parent), provided they don't share much data.
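A minimal POSIX-only sketch of the multi-process variant using `os.fork()`. Each child runs a caller-supplied `worker(index)` and exits with its return value; the parent reaps them all. In a real forwarder each child would bind its own sockets, or on Linux all children could bind the same UDP port with `SO_REUSEPORT` and let the kernel spread incoming flows between them. `spawn_workers` is a hypothetical name.

```python
import os


def spawn_workers(n, worker):
    """Fork n children; child i runs worker(i). Returns their exit codes in order."""
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:  # child: must never fall back into the parent's code path
            code = 1  # exit code used if worker raises
            try:
                code = int(worker(i) or 0)
            finally:
                os._exit(code)  # skips atexit handlers and buffered-IO flushes
        pids.append(pid)
    # Parent: wait for each child; a negative code means "killed by that signal".
    return [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
```

Since the children share no memory with each other, there is nothing to synchronise; anything they do need to share has to go through pipes, sockets or explicit shared memory.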
You could also consider using something like Python's "Twisted" to make implementation easier (this is what it's designed for).
Really, there's probably not a good case for using threads over processes, but maybe there is in your case; it's difficult to say. It depends on how much data you need to share between threads.
#4
2
I would look into a thread pool for this application.
http://threadpool.sourceforge.net/
Allow the thread pool to manage your threads and the queue.
Later, you can tweak the maximum and minimum number of threads based on performance profiling.
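The linked library is C++, and its API is not reproduced here. To show the idea, here is a small hypothetical Python pool that owns its task queue, starts with `min_threads` workers, and adds workers up to `max_threads` only while tasks are backing up; those two knobs are what you would tune from profiling.

```python
import queue
import threading


class BoundedPool:
    """Thread pool that grows from min_threads toward max_threads under backlog."""

    def __init__(self, min_threads, max_threads):
        if not 1 <= min_threads <= max_threads:
            raise ValueError("need 1 <= min_threads <= max_threads")
        self.max_threads = max_threads
        self.tasks = queue.Queue()
        self.threads = []
        self._lock = threading.Lock()
        for _ in range(min_threads):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._worker, daemon=True)
        t.start()
        self.threads.append(t)

    def _worker(self):
        while True:
            item = self.tasks.get()
            try:
                if item is None:  # shutdown sentinel: one per worker
                    return
                fn, args = item
                try:
                    fn(*args)
                except Exception:
                    pass  # a real pool would log this; keep the worker alive
            finally:
                self.tasks.task_done()

    def submit(self, fn, *args):
        with self._lock:
            # Grow only if work is already waiting and we are under the cap.
            if self.tasks.qsize() > 0 and len(self.threads) < self.max_threads:
                self._spawn()
        self.tasks.put((fn, args))

    def shutdown(self):
        with self._lock:
            workers = list(self.threads)
        for _ in workers:
            self.tasks.put(None)
        for t in workers:
            t.join()
```

The pool never shrinks in this sketch; a fuller version would also retire idle workers down to `min_threads`.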
#5
2
Listen to the people advising you to use libevent (or OS-specific facilities such as epoll/kqueue). With many connections this is an absolute must because, as you said, creating a thread per connection would be an enormous performance hit, and select() doesn't quite cut it either.
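To see why epoll scales where select() does not: select() rescans every descriptor on each call and cannot watch descriptors numbered at or above FD_SETSIZE (1024 with glibc), whereas epoll returns only the descriptors that are ready. A small Linux-only sketch using Python's standard `select.epoll`; `open_listeners` and `poll_ready` are hypothetical helper names.

```python
import select
import socket


def open_listeners(count):
    """Bind `count` UDP sockets on loopback and register them all with one epoll."""
    ep = select.epoll()
    socks = {}
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.setblocking(False)
        ep.register(s.fileno(), select.EPOLLIN)
        socks[s.fileno()] = s
    return ep, socks


def poll_ready(ep, socks, timeout=1.0):
    """Return only the sockets that currently have a datagram waiting."""
    return [socks[fd] for fd, events in ep.poll(timeout) if events & select.EPOLLIN]
```

With 100 registered sockets and traffic on only 3 of them, `poll_ready` hands back just those 3, so the per-wakeup cost tracks active streams rather than total streams.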
#6
1
Let your program decide. Add code to it that measures throughput and increases/decreases the number of threads dynamically to maximize it.
This way, your application will always perform well, regardless of the number of execution cores and other factors.
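A minimal hill-climbing sketch of that feedback loop: after each measurement interval, feed in the observed throughput, keep moving the thread count in the same direction while throughput improves, and turn around when it drops. The class name and its parameters are hypothetical; real measurements are noisy, so in practice you would average over several intervals or require a margin before reversing.

```python
class HillClimbTuner:
    """Nudges a thread count toward the throughput peak, one step at a time."""

    def __init__(self, start, lo, hi, step=1):
        self.n, self.lo, self.hi, self.step = start, lo, hi, step
        self.direction = +1
        self.last = None

    def update(self, throughput):
        """Report throughput measured at self.n; returns the next thread count."""
        if self.last is not None and throughput < self.last:
            self.direction = -self.direction  # last move hurt: reverse
        self.last = throughput
        self.n = max(self.lo, min(self.hi, self.n + self.direction * self.step))
        return self.n
```

In a forwarder loop this might look like `new_size = tuner.update(packets_forwarded / interval_s)`, followed by resizing the pool to `new_size`. Near the optimum the count oscillates by one step around the peak, which is the expected behaviour of plain hill climbing.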
#7
0
It is a good idea to avoid trying to create one (or even N) threads per client request. This approach is classically non-scalable, and you will definitely run into problems with memory usage or context switching. Instead, you should use a thread pool and treat incoming requests as tasks for any thread in the pool to handle. The scalability of this approach is then limited by the ideal number of threads in the pool, which is usually related to the number of CPU cores. You want each thread to use exactly 100% of the CPU on a single core, so in the ideal case you would have one thread per core, which reduces context switching to (nearly) zero. Depending on the nature of the tasks, this might not be possible: maybe the threads have to wait for external data, or read from disk, or whatever, so you may find that the number of threads needs to be increased by some scaling factor.
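The one-thread-per-core ideal can be pushed further by pinning each worker to its own core, so the scheduler does not migrate it. A Linux-only Python sketch follows; on Linux, `os.sched_setaffinity(0, ...)` applies to the calling thread. `run_pinned` is a hypothetical helper, and in CPython the GIL still serializes pure-Python work, so the payoff comes in a C implementation or with one process per core.

```python
import os
import threading


def run_pinned(worker):
    """Start one thread per allowed CPU, pin it there, and collect worker(cpu)."""
    cpus = sorted(os.sched_getaffinity(0))
    results = {}
    lock = threading.Lock()

    def body(cpu):
        os.sched_setaffinity(0, {cpu})  # pid 0 = this thread only, on Linux
        r = worker(cpu)
        with lock:
            results[cpu] = r

    threads = [threading.Thread(target=body, args=(c,)) for c in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The creating thread's own affinity is untouched, so only the workers are confined to single cores.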