How many threads should I use in my Java program?

Time: 2022-04-27 00:29:55

I recently inherited a small Java program that takes information from a large database, does some processing, and produces a detailed image from that information. The original author wrote the code to use a single thread, then later modified it to allow multiple threads.

In the code he defines a constant:

// number of threads
public static final int THREADS = Runtime.getRuntime().availableProcessors();

This constant then sets the number of threads used to create the image.

I understand his reasoning: the number of threads cannot be greater than the number of available processors, so he set it to that amount to get the full potential out of the processor(s). Is this correct? Or is there a better way to utilize the full potential of the processor(s)?

EDIT: To clarify further, the specific algorithm being threaded scales with the resolution of the image being created (one thread per pixel). That is obviously not the best solution, though. The work this algorithm does is what takes all the time, and it consists entirely of mathematical operations; there are no locks or other factors that would cause any given thread to sleep. I just want to maximize the program's CPU utilization to decrease the time to completion.
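
For reference, here is a minimal sketch of the usual alternative to one thread per pixel: a fixed pool of THREADS workers, each handling a contiguous band of rows. The shade function is a hypothetical stand-in for the real per-pixel math, and the image is assumed to be a flat int[] in row-major order; neither comes from the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BandedRenderer {
    // Hypothetical stand-in for the real per-pixel math: pure, no locks, no I/O.
    static int shade(int x, int y) {
        double v = Math.sin(x * 0.01) * Math.cos(y * 0.01);
        return (int) ((v + 1.0) * 127.5);
    }

    // Renders a width x height image with `threads` workers. Each task owns a
    // contiguous band of rows, so no two tasks ever write the same element.
    static int[] render(int width, int height, int threads) throws Exception {
        int[] pixels = new int[width * height];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> parts = new ArrayList<>();
            int rowsPerBand = (height + threads - 1) / threads; // ceiling division
            for (int start = 0; start < height; start += rowsPerBand) {
                final int from = start;
                final int to = Math.min(height, start + rowsPerBand);
                parts.add(pool.submit(() -> {
                    for (int y = from; y < to; y++) {
                        for (int x = 0; x < width; x++) {
                            pixels[y * width + x] = shade(x, y);
                        }
                    }
                }));
            }
            for (Future<?> f : parts) {
                f.get(); // waits for the band and rethrows any failure
            }
        } finally {
            pool.shutdown();
        }
        return pixels;
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        int[] image = render(800, 600, threads);
        System.out.println("rendered " + image.length + " pixels with " + threads + " threads");
    }
}
```

Besides waiting, Future.get() makes each band's writes visible to the calling thread, so the returned array is safe to read afterwards.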

7 answers

#1


15  

Threads are fine, but as others have noted, you have to be highly aware of your bottlenecks. Your algorithm sounds like it could be susceptible to cache contention between multiple CPUs. This is particularly nasty because it can degrade the performance of all of your threads at once (normally you think of using multiple threads to keep processing while waiting for slow or high-latency I/O operations).

Cache contention is a very important aspect of using multiple CPUs to process a highly parallelized algorithm, so make sure you take your memory usage into account. If you can construct your data objects so that each thread has its own memory to work on, you can greatly reduce cache contention between the CPUs. For example, it may be easier to have one big array of ints and have different threads work on different parts of it, but in Java every bounds check on that array reads the same length field, which sits in the array header next to the first elements. If other CPUs are writing to that cache line, a given CPU may have to reload it from L2 or L3 cache. The same effect (false sharing) occurs wherever two threads write to neighboring elements that share a cache line.
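
A minimal sketch of the "each thread works on its own memory" idea, assuming the work can be expressed as a reduction over an int[] (the sum of squares is only a placeholder): each task reads a contiguous slice and keeps its running total in a local variable, touching shared memory only once, when it returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSum {
    // Each task reads its own contiguous slice and accumulates into a local
    // variable that no other thread can see. Updating adjacent slots of a
    // shared long[] on every iteration would instead risk false sharing.
    static long sumOfSquares(int[] data, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                parts.add(pool.submit(() -> {
                    long local = 0; // thread-private accumulator
                    for (int i = from; i < to; i++) {
                        local += (long) data[i] * data[i];
                    }
                    return local; // the only value handed back to shared state
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```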

Split the data into separate data structures and configure those data structures so they are thread-local (you might even use ThreadLocal for this; note, though, that ThreadLocal values are ordinary heap objects looked up through the current thread rather than special OS constructs, so any cache benefit comes from the data not being shared between CPUs).
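
A sketch of the ThreadLocal approach, assuming each task needs some temporary working array; the buffer size and the processRow computation are hypothetical. ThreadLocal.withInitial creates one buffer per thread the first time that thread calls get(), and because pool threads are reused, each worker allocates its buffer only once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ScratchBuffers {
    // One scratch array per thread, created lazily on that thread's first get().
    static final ThreadLocal<double[]> SCRATCH =
            ThreadLocal.withInitial(() -> new double[1024]);

    // Hypothetical per-row computation that needs temporary working space.
    static double processRow(int row) {
        double[] buf = SCRATCH.get(); // this thread's private buffer
        double acc = 0;
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Math.sqrt(row + i);
            acc += buf[i];
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int row = 0; row < 100; row++) {
            final int r = row;
            pool.execute(() -> processRow(r));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```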

The best piece of advice I can give you is: test, test, test. Don't make assumptions about how CPUs will perform; modern CPUs do a great deal behind the scenes, often with counterintuitive results. Note also that the JIT compiler's runtime optimizations add another layer of complexity here (which may help or may not).
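
Because of the JIT point, timing only the first run mostly measures interpretation and compilation. Below is a rough sketch of a timing helper that warms up first and then keeps the best of several runs; the warm-up and run counts are arbitrary choices. For anything serious, a dedicated harness such as JMH (the Java Microbenchmark Harness) is a better choice.

```java
import java.util.function.Supplier;

class WarmTimer {
    // Written on every run so the JIT cannot treat the task's result as unused.
    static volatile Object blackhole;

    // Runs the task `warmups` times untimed so the JIT can compile the hot paths,
    // then returns the fastest of `runs` timed executions in milliseconds.
    // Assumes runs >= 1.
    static double bestMillis(Supplier<?> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            blackhole = task.get();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            blackhole = task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        double ms = bestMillis(() -> {
            double x = 0;
            for (int i = 0; i < 5_000_000; i++) x += Math.sqrt(i);
            return x;
        }, 5, 10);
        System.out.println("best run: " + ms + " ms");
    }
}
```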

#2


10  

On the one hand, you'd like to think that threads == CPUs/cores makes perfect sense. Why have a thread if there's nothing to run it?

The detail boils down to "what are the threads doing?" A thread that sits idle waiting for a network packet or a disk block leaves its CPU with nothing to do, and that CPU time is wasted.

If your threads are CPU-heavy, then a 1:1 ratio makes some sense. If you have a single "read the DB" thread that feeds the other threads, and a single "dump the data" thread that pulls results from the CPU threads and creates output, those two could most likely share a CPU easily while the CPU-heavy threads keep churning away.
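
A sketch of that reader / workers / writer layout using BlockingQueues. The integer "records", the squaring step and the sentinel values are hypothetical placeholders for the database rows, the real math and the output stage. The pool is sized cpuWorkers + 2 so that every stage gets its own thread and none can be starved.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class Pipeline {
    static final int POISON = Integer.MIN_VALUE; // "no more input" marker for workers
    static final long DONE = -1L;                // "this worker finished" marker for the writer

    static long run(int records, int cpuWorkers) throws InterruptedException {
        BlockingQueue<Integer> input = new ArrayBlockingQueue<>(1024);
        BlockingQueue<Long> output = new ArrayBlockingQueue<>(1024);
        AtomicLong total = new AtomicLong();
        // cpuWorkers + 2 threads: every stage runs at once, so none can starve.
        ExecutorService pool = Executors.newFixedThreadPool(cpuWorkers + 2);

        // Reader: stands in for the "read the DB" thread, mostly waiting on I/O.
        pool.execute(() -> {
            try {
                for (int i = 0; i < records; i++) input.put(i);
                for (int k = 0; k < cpuWorkers; k++) input.put(POISON); // one per worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // CPU-heavy workers: typically one per processor.
        for (int w = 0; w < cpuWorkers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        int v = input.take();
                        if (v == POISON) break;
                        output.put((long) v * v); // placeholder for the real math
                    }
                    output.put(DONE);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Writer: stands in for the "dump the data" thread.
        pool.execute(() -> {
            try {
                for (int finished = 0; finished < cpuWorkers; ) {
                    long r = output.take();
                    if (r == DONE) finished++;
                    else total.addAndGet(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total.get();
    }
}
```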

The real answer, as with all sorts of things, is to measure it. Since the number is (apparently) configurable, configure it! Run it with threads-to-CPUs ratios of 1:1, 2:1, 1.5:1, or whatever, and time the results. The fastest one wins.
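
A rough sketch of such a sweep, assuming a hypothetical CPU-bound job of 64 independent tasks; runJob would be replaced by a call into the real rendering code. Each candidate thread count is expressed as a ratio of availableProcessors(), and one run per count is timed after a single warm-up run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ThreadCountSweep {
    static volatile double sink; // keeps the JIT from discarding the work

    // Hypothetical CPU-bound job: `tasks` independent chunks of pure arithmetic.
    static void runJob(int threads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < tasks; t++) {
            final int seed = t;
            pool.execute(() -> {
                double x = seed;
                for (int i = 0; i < 500_000; i++) x = Math.sin(x) + 1.0;
                sink = x;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    // Candidate thread counts as multiples of the processor count (at least 1).
    static int[] candidateCounts(int cpus, double... ratios) {
        int[] counts = new int[ratios.length];
        for (int i = 0; i < ratios.length; i++) {
            counts[i] = Math.max(1, (int) Math.round(cpus * ratios[i]));
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        runJob(cpus, 64); // warm-up run so the JIT has compiled the hot loop
        for (int n : candidateCounts(cpus, 1.0, 1.5, 2.0)) {
            long start = System.nanoTime();
            runJob(n, 64);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%d threads: %d ms%n", n, ms);
        }
    }
}
```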

#3


3  

The number that your application needs; no more, and no less.

Obviously, if you're writing an application that contains a parallelisable algorithm, you can start benchmarking to find a good balance in the number of threads, but bear in mind that hundreds of threads won't make a CPU-bound operation run any faster than the available cores allow.

If your algorithm can't be parallelised, then no number of additional threads is going to help.
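
Amdahl's law, which the answer does not mention, puts a number on this. If a fraction p of the run time can be parallelised, the best possible speedup on N processors is:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, with p = 0.9 and N = 8 the bound is 1/(0.1 + 0.1125) ≈ 4.7, and no number of threads can push it past 10; with p = 0 the speedup is 1 whatever N is.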

#4


1  

Yes, that's a perfectly reasonable approach. One thread per processor/core will maximize processing power and minimize context switching. I'd probably leave that as-is unless I found a problem via benchmarking/profiling.

One thing to note is that the JVM does not guarantee that availableProcessors() will return a constant value, so technically you should check it immediately before spawning your threads. I doubt this value is likely to change at runtime on typical computers, though.
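
A minimal sketch of that advice: read availableProcessors() at the moment the pool is created instead of caching it in a static final constant like the THREADS field in the question. The class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CpuPools {
    // Queries the processor count each time a pool is built, rather than once
    // when the class is loaded. The Javadoc for availableProcessors() notes that
    // the value may change during a run of the JVM; it is never smaller than one.
    static ExecutorService newCpuBoundPool() {
        int n = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(n);
    }

    public static void main(String[] args) {
        ExecutorService pool = newCpuBoundPool();
        try {
            pool.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));
        } finally {
            pool.shutdown(); // already-submitted tasks still run to completion
        }
    }
}
```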

P.S. As others have pointed out, if your process is not CPU-bound, this approach is unlikely to be optimal. Since you say these threads are being used to generate images, though, I assume you are CPU bound.

#5


1  

The number of processors is a good start, but if those threads do a lot of I/O, you might do better with more threads, or fewer.

First think about what resources are available and what you want to optimize (least time to finish, least impact on other tasks, etc.). Then do the math.
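
One common way to "do the math" is the sizing rule of thumb from "Java Concurrency in Practice" (Goetz et al.): threads = CPUs × target CPU utilization × (1 + wait time / compute time). A sketch, assuming you have measured the wait-to-compute ratio of a typical task; for the question's purely mathematical workload that ratio is about 0, which gives one thread per processor.

```java
class PoolSizing {
    // Rule of thumb from "Java Concurrency in Practice" (Goetz et al.):
    //   threads = cpus * targetUtilization * (1 + waitTime / computeTime)
    // waitToCompute is the measured ratio of the time a task spends blocked
    // (on I/O, locks, ...) to the time it spends computing.
    static int poolSize(int cpus, double targetUtilization, double waitToCompute) {
        double n = cpus * targetUtilization * (1.0 + waitToCompute);
        return Math.max(1, (int) Math.round(n));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Pure computation (the question's case): one thread per processor.
        System.out.println("CPU-bound: " + poolSize(cpus, 1.0, 0.0));
        // Tasks that wait as long as they compute: roughly twice as many threads.
        System.out.println("half I/O:  " + poolSize(cpus, 1.0, 1.0));
    }
}
```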

Sometimes it is better to dedicate a thread or two to each I/O resource and let the others compete for the CPU. The analysis is usually easier with these designs.
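
A sketch of that split using CompletableFuture: a single-thread executor dedicated to one I/O resource and a separate pool sized to the processors for the computation. fetchRecord and compute are hypothetical stand-ins for a blocking read and the real math.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitPools {
    // Hypothetical stand-ins for a blocking read and a pure computation.
    static int fetchRecord(int id) { return id; }
    static long compute(int record) { return (long) record * record; }

    static List<Long> process(int count) {
        // One thread dedicated to the I/O resource (requests to it are serialized)
        // and a separate pool, sized to the processors, for the number crunching.
        ExecutorService io = Executors.newSingleThreadExecutor();
        ExecutorService cpu = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<CompletableFuture<Long>> futures = IntStream.range(0, count)
                    .mapToObj(id -> CompletableFuture
                            .supplyAsync(() -> fetchRecord(id), io)     // runs on the I/O thread
                            .thenApplyAsync(SplitPools::compute, cpu))  // runs on the CPU pool
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            io.shutdown();
            cpu.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(5)); // [0, 1, 4, 9, 16]
    }
}
```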

#6


0  

The benefit of using threads is to reduce the wall-clock execution time of your program by letting it work on one part of the job while another part is waiting for something to happen (usually I/O). If your program is totally CPU-bound, adding threads beyond the number of processors will only slow it down. If it is fully or partially I/O-bound, adding threads may help, but there is a balance to strike between the overhead of the extra threads and the additional work they get done. Making the number of threads equal to the number of processors should yield peak performance if the program is totally or near-totally CPU-bound.

As with many questions containing the word "should", the answer is "it depends". If you think you can get better performance, adjust the number of threads up or down and benchmark the application. Also take into account any other factors that might influence the decision (if your application is using 100% of the computer's available horsepower, other applications will run more slowly).

This assumes that the multi-threaded code is written properly etc. If the original developer only had one CPU, he would never have had a chance to experience problems with poorly-written threading code. So you should probably test behaviour as well as performance when adjusting the number of threads.
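
One way to test behaviour, sketched under the assumption that the job can be invoked with an explicit thread count and returns its output as an int[]: run it single-threaded as a reference and compare every multi-threaded result against it. Exact equality is appropriate when each pixel is computed independently; if threads combine partial floating-point results, the summation order changes and a tolerance may be needed.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

class ConsistencyCheck {
    // Runs the job single-threaded to get a reference result, then with each of
    // the given thread counts, failing if any output differs from the reference.
    // `job` is hypothetical: it maps a thread count to the rendered pixels.
    static void assertSameOutput(IntFunction<int[]> job, int... threadCounts) {
        int[] reference = job.apply(1);
        for (int n : threadCounts) {
            int[] result = job.apply(n);
            if (!Arrays.equals(reference, result)) {
                throw new IllegalStateException("output differs with " + n + " threads");
            }
        }
    }
}
```

A call might look like assertSameOutput(n -> renderWith(n), 2, 4, 8, 16), where renderWith is whatever hypothetical entry point runs the real program with n threads.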

By the way, you might want to consider allowing the number of threads to be configured at run time instead of compile time to make this whole process easier.
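
A minimal sketch of run-time configuration, assuming a system property named threads (the name is arbitrary). Integer.getInteger falls back to the default when the property is missing or is not a valid integer.

```java
class ThreadConfig {
    // Thread count chosen at run time: "java -Dthreads=N ..." overrides the
    // default of one thread per available processor. Integer.getInteger returns
    // the default if the property is missing or is not a valid integer.
    static int threadCount() {
        int n = Integer.getInteger("threads", Runtime.getRuntime().availableProcessors());
        return Math.max(1, n); // guard against zero or negative values
    }

    public static void main(String[] args) {
        System.out.println("using " + threadCount() + " threads");
    }
}
```

With this in place, the benchmarking suggested in the other answers becomes a matter of rerunning with different -Dthreads values rather than recompiling.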

#7


0  

After seeing your edit, it's quite possible that one thread per CPU is as good as it gets. Your application seems quite parallelizable. If you have extra hardware you can use GridGain to grid-enable your app and have it run on multiple machines. That's probably about the only thing, beyond buying faster / more cores, that will speed it up.
