What happens in OpenMP when there is a pragma inside a pragma?

At the start of #pragma omp parallel a bunch of threads are created, then when we get to #pragma omp for the workload is distributed. What happens if this for loop has a for loop inside it, and I place a #pragma omp for before it as well? Does each thread create new threads? If not, which threads are assigned this task? What exactly happens in this situation?

2 answers

#1

By default, no threads are spawned for the inner loop. It is done sequentially using the thread that reaches it.

This is because nested parallelism is disabled by default. However, if you enable it via omp_set_nested() (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels()), then a new team of threads will be spawned for the inner region.

However, if you aren't careful, this will result in p^2 threads (since each of the original p threads will spawn another p threads). That is why nesting is disabled by default.
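As a rough illustration of the difference, here is a minimal sketch that reports how many threads the inner team gets with and without nesting. The helper name inner_team_size and its parameters are made up for this example. It uses omp_set_max_active_levels(), assuming a reasonably recent runtime; older runtimes may additionally need omp_set_nested(1) before they create inner teams.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static void omp_set_dynamic(int d) { (void)d; }
static void omp_set_max_active_levels(int l) { (void)l; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: returns the largest inner team size observed.
   max_levels = 1 allows only the outer region to be active;
   max_levels = 2 allows the inner region to get its own team. */
int inner_team_size(int outer, int inner, int max_levels)
{
    int size = 0;
    assert(outer > 0 && inner > 0);
    omp_set_dynamic(0);                      /* ask for the requested team sizes */
    omp_set_max_active_levels(max_levels);
#pragma omp parallel num_threads(outer)
    {
#pragma omp parallel num_threads(inner)
        {
            int n = omp_get_num_threads();   /* size of the innermost team */
#pragma omp critical
            {
                if (n > size)
                    size = n;
            }
        }
    }
    return size;
}
```

Calling inner_team_size(2, 3, 1) returns 1, because the inner region is inactive and runs on the thread that encounters it. With inner_team_size(2, 3, 2) the runtime may give each outer thread up to 3 inner threads, which is where the p^2 growth comes from.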

#2

In a situation like the following:

#pragma omp parallel
{
#pragma omp for
  for(int ii = 0; ii < n; ii++) {
    /* ... */
#pragma omp for 
    for(int jj = 0; jj < m; jj++) {
      /* ... */
    }
  }
}

what happens is that you trigger undefined behavior, because the code violates the OpenMP standard. More precisely, it violates the restrictions listed in section 2.5 (worksharing constructs):

The following restrictions apply to worksharing constructs:

Each worksharing region must be encountered by all threads in a team or by none at all.

The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.

This is illustrated by examples A.39.1c and A.40.1c of the specification:

Example A.39.1c: The following example of loop construct nesting is conforming because the inner and outer loop regions bind to different parallel regions:

void work(int i, int j) {}
void good_nesting(int n)
{
  int i, j;
#pragma omp parallel default(shared)
  {
#pragma omp for
    for (i=0; i<n; i++) {
#pragma omp parallel shared(i, n)
      {
#pragma omp for
        for (j=0; j < n; j++)
          work(i, j);
      }
    }
  }
}

Example A.40.1c: The following example is non-conforming because the inner and outer loop regions are closely nested:

void work(int i, int j) {}
void wrong1(int n)
{
#pragma omp parallel default(shared)
  {
    int i, j;
#pragma omp for
    for (i=0; i<n; i++) {
    /* incorrect nesting of loop regions */
#pragma omp for
      for (j=0; j<n; j++)
        work(i, j);
    }
  }    
}

Notice that this is different from:

#pragma omp parallel for
  for(int ii = 0; ii < n; ii++) {
    /* ... */
#pragma omp parallel for 
    for(int jj = 0; jj < m; jj++) {
      /* ... */
    }
  }

in which you try to spawn a nested parallel region. Only in this case does the discussion in Mysticial's answer apply.
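For the closely nested case in A.40.1c, one conforming alternative is to use a single loop construct over both loops. The sketch below assumes the two loops are perfectly nested and their iterations are independent, and it uses the collapse clause, which neither answer above mentions. The work body and the name sum_pairs are placeholders invented for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for the work(i, j) routine in the examples. */
static long work(int i, int j) { return (long)i * j; }

/* Sums work(i, j) over an n-by-n iteration space with ONE loop construct,
   so no worksharing region is nested inside another. */
long sum_pairs(int n)
{
    long total = 0;
    assert(n >= 0);
#pragma omp parallel for collapse(2) reduction(+:total)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            total += work(i, j);
    return total;
}
```

Here the team created by the parallel construct shares the combined n*n iteration space, so no second team is needed. If the inner loop depends on the outer index in a way that rules out collapsing, simply dropping the inner pragma and letting each thread run the inner loop sequentially is also conforming.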
