Parallel program: how to find the bottleneck (CPU-bound threads)

Time: 2021-05-17 20:59:05

I have written a parallel program using OpenMP. It uses two threads because my laptop is dual-core, and the threads do a lot of matrix operations, so they are CPU-bound. There is no data sharing between the threads. A single instance of the program runs quite fast, but when I run multiple instances of the same program simultaneously, performance degrades. Here is a plot:

The running time for a single instance (two threads) is 0.78 seconds. The running time for two instances (four threads in total) is 2.06 seconds, which is more than double 0.78 seconds. Beyond that, the running time increases in proportion to the number of instances (and thus the number of threads).
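A quick sanity check of these numbers, as a minimal Python sketch. It assumes that if two instances simply time-shared the two cores, each would run exactly twice as slowly, and measures how far the reported 2.06 seconds goes beyond that. The variable names are illustrative only.

```python
# Timings reported above, in seconds.
single = 0.78   # one instance, two threads
double = 2.06   # two instances, four threads in total

# Perfect time-sharing of two cores between two instances would give 2x.
ideal_double = 2 * single        # 1.56 s
slowdown = double / single       # observed slowdown, about 2.64x
excess = double / ideal_double   # about 1.32x beyond perfect sharing

print(f"ideal: {ideal_double:.2f} s, slowdown: {slowdown:.2f}x, excess: {excess:.2f}x")
```

So the two-instance run is roughly 32% slower than pure time-sharing alone would explain.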

Here is the timing profile of one of the instances while several were running in parallel:

Can someone offer insights into what could be going on? The profile shows that 50% of the time is being consumed by OpenMP. What does that mean?

1 answer

#1


Similar to what @Bort said, you made the application multithreaded (two threads) because you have two cores.

This means that when only one instance of your program is running, it (ideally) gets the whole CPU to itself.

However, if two instances of the application are running, four threads compete for two cores and there are no spare resources left. Each instance gets roughly half of the CPU, so each takes at least twice as long. The same applies to more instances.

You cannot fix this without also increasing the number of cores available to each instance, i.e. keeping it at two cores per instance rather than a shrinking share of the same two cores.
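The reasoning above can be turned into a back-of-the-envelope model. The sketch below is hypothetical (`expected_runtime` does not come from the original post) and assumes perfect scaling within an instance, fair scheduling across instances, no more threads per instance than cores, and no extra cost from context switches, caches, or memory bandwidth. Under those assumptions it is a best case: a measurement above it, such as 2.06 seconds against the model's 1.56 seconds for two instances, points to overhead the model leaves out.

```python
def expected_runtime(t_single: float, instances: int,
                     threads_per_instance: int, cores: int) -> float:
    """Idealized wall time per instance when `instances` copies run at once.

    Assumes threads_per_instance <= cores, perfect parallel scaling inside
    each instance, fair CPU sharing, and zero oversubscription overhead.
    """
    demand = instances * threads_per_instance    # runnable threads in total
    oversubscription = max(1.0, demand / cores)   # >1 once threads exceed cores
    return t_single * oversubscription


if __name__ == "__main__":
    # Numbers from the question: 0.78 s for one instance, 2 threads, 2 cores.
    for n in range(1, 5):
        print(n, f"{expected_runtime(0.78, n, 2, 2):.2f} s")  # 0.78, 1.56, 2.34, 3.12
```

The model grows linearly with the number of instances, matching the proportional increase described in the question; only the constant factor differs from the measurements.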
