Can I safely use OpenMP with C++11?

Time: 2022-01-10 00:39:43

The OpenMP standard only considers C++98 (ISO/IEC 14882:1998). This means that there is no standard supporting the use of OpenMP under C++03 or even C++11. Thus, any program that uses C++ > 98 together with OpenMP operates outside of the standards, implying that even if it works under certain conditions, it is unlikely to be portable and is certainly never guaranteed to work.

The situation is even worse with C++11 and its own multi-threading support, which very likely will clash with OpenMP for certain implementations.

So, how safe is it to use OpenMP with C++03 and C++11?

Can one safely use C++11 multi-threading as well as OpenMP in one and the same program but without interleaving them (i.e. no OpenMP statement in any code passed to C++11 concurrent features and no C++11 concurrency in threads spawned by OpenMP)?

I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.

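To make the pattern concrete, here is a minimal sketch of that phased usage (all names and sizes are illustrative, not taken from the question): an OpenMP loop writes the data, the region's implicit barrier joins the OpenMP threads, and only then do C++11 threads read the same structure.

#include <numeric>
#include <thread>
#include <vector>

int main()
{
    std::vector<double> data(1000000);

    // Phase 1: OpenMP only.
    #pragma omp parallel for
    for (long i = 0; i < (long)data.size(); ++i)
        data[i] = 0.5 * i;

    // The implicit barrier at the end of the parallel region has already
    // joined the OpenMP threads before phase 2 starts.

    // Phase 2: C++11 threads only, on the same data structure.
    double lo = 0, hi = 0;
    auto mid = data.begin() + data.size() / 2;
    std::thread t1([&]{ lo = std::accumulate(data.begin(), mid, 0.0); });
    std::thread t2([&]{ hi = std::accumulate(mid, data.end(), 0.0); });
    t1.join();
    t2.join();

    return (lo + hi > 0) ? 0 : 1;
}
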
4 solutions

#1 (24 votes)

Walter, I believe I not only told you the current state of things in that other discussion, but also provided you with information directly from the source (i.e. from my colleague who is part of the OpenMP language committee).

OpenMP was designed as a lightweight data-parallel addition to FORTRAN and C, later extended to C++ idioms (e.g. parallel loops over random-access iterators) and to task parallelism with the introduction of explicit tasks. It is meant to be portable across as many platforms as possible and to provide essentially the same functionality in all three languages. Its execution model is quite simple: a single-threaded application forks teams of threads in parallel regions, runs some computational tasks inside, and then joins the teams back into serial execution. Each thread from a parallel team can later fork its own team if nested parallelism is enabled.

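A minimal sketch of that fork-join model (the thread counts here are arbitrary):

#include <cstdio>
#include <omp.h>

int main()
{
    omp_set_nested(1);                       // allow nested parallel regions
    #pragma omp parallel num_threads(2)      // the serial thread forks a team of 2
    {
        int outer = omp_get_thread_num();
        #pragma omp parallel num_threads(2)  // each team member forks its own team
        std::printf("outer %d, inner %d\n", outer, omp_get_thread_num());
    }                                        // teams join; execution is serial again
    return 0;
}
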
Since the main usage of OpenMP is in High Performance Computing (after all, its directive and execution model was borrowed from High Performance Fortran), the main goal of any OpenMP implementation is efficiency, not interoperability with other threading paradigms. On some platforms an efficient implementation can only be achieved if the OpenMP run-time is the only one in control of the process threads. There are also certain aspects of OpenMP that might not play well with other threading constructs, for example the limit on the number of threads set by OMP_THREAD_LIMIT when forking two or more concurrent parallel regions.

Since the OpenMP standard itself does not strictly forbid using other threading paradigms, but neither standardises the interoperability with them, supporting such functionality is up to the implementers. This means that some implementations might provide safe concurrent execution of top-level OpenMP regions, and some might not. The x86 implementers pledge to support it, maybe because most of them are also proponents of other execution models (e.g. Intel with Cilk and TBB, GCC with C++11, etc.) and x86 is usually considered an "experimental" platform (other vendors are usually much more conservative).

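To illustrate what "concurrent execution of top-level OpenMP regions" means, here is a sketch (the function and tags are hypothetical) in which two C++11 threads each become the master of their own OpenMP region; whether the two regions may safely be active at the same time is exactly the implementation-defined part:

#include <cstdio>
#include <thread>
#include <omp.h>

// Hypothetical worker: each caller becomes the master of a top-level region.
static void region(const char *tag)
{
    #pragma omp parallel num_threads(2)
    std::printf("%s: OpenMP thread %d of %d\n",
                tag, omp_get_thread_num(), omp_get_num_threads());
}

int main()
{
    std::thread a(region, "A");   // two top-level parallel regions,
    std::thread b(region, "B");   // potentially active concurrently
    a.join();
    b.join();
    return 0;
}
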
OpenMP 4.0 is also not going further than ISO/IEC 14882:1998 for the C++ features it employs (the SC12 draft is here). The standard now includes things like portable thread affinity - this definitely does not play well with other threading paradigms, which might provide their own binding mechanisms that clash with those of OpenMP. Once again, the OpenMP language is targeted at HPC (data- and task-parallel scientific and engineering applications). The C++11 constructs are targeted at general-purpose computing applications. If you want fancy C++11 concurrent stuff, then use C++11 only; or if you really need to mix it with OpenMP, then stick to the C++98 subset of language features if you want to stay portable.

I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.

There is no obvious reason why what you want should not be possible, but it is up to your OpenMP compiler and run-time. There are free and commercial libraries that use OpenMP for parallel execution (for example MKL), but there are always warnings (although sometimes hidden deeply in their user manuals) about possible incompatibilities with multithreaded code, telling you what is possible and when. As always, this is outside the scope of the OpenMP standard and hence YMMV.

#2 (7 votes)

I'm actually interested in high-performance computing, but OpenMP (currently) does not serve my purpose well enough: it's not flexible enough (my algorithm is not loop based)

Maybe you are really looking for TBB? That provides support for loop- and task-based parallelism, as well as a variety of parallel data structures, in standard C++, and is both portable and open-source.

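As a point of reference, a minimal TBB loop looks like this (a sketch assuming the TBB headers are installed; the container and loop body are illustrative):

#include <cstddef>
#include <cstdio>
#include <vector>
#include <tbb/parallel_for.h>

int main()
{
    std::vector<float> v(1000, 1.0f);
    // Plain library calls instead of pragmas; TBB is standard C++.
    tbb::parallel_for(std::size_t(0), v.size(), [&](std::size_t i) {
        v[i] *= 2.0f;
    });
    std::printf("v[0] = %.1f\n", v[0]);
    return 0;
}
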
(Full disclaimer: I work for Intel who are heavily involved with TBB, though I don't actually work on TBB but on OpenMP :-); I am certainly not speaking for Intel!).

#3 (5 votes)

Like Jim Cownie, I’m also an Intel employee. I agree with him that Intel Threading Building Blocks (Intel TBB) might be a good option since it has loop-level parallelism like OpenMP but also other parallel algorithms, concurrent containers, and lower-level features too. And TBB tries to keep up with the current C++ standard.

And to clarify for Walter, Intel TBB includes a parallel_reduce algorithm as well as high-level support for atomics and mutexes.

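For instance, a sum over a container can be written with the functional form of tbb::parallel_reduce (a sketch; the data and reduction body are illustrative):

#include <cstddef>
#include <cstdio>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

int main()
{
    std::vector<double> v(1000, 0.5);
    double sum = tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()),
        0.0,                                              // identity value
        [&](const tbb::blocked_range<std::size_t> &r, double acc) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                acc += v[i];                              // partial sum over one chunk
            return acc;
        },
        [](double a, double b) { return a + b; });        // join partial results
    std::printf("sum = %.1f\n", sum);
    return 0;
}
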
You can find the Intel® Threading Building Blocks User Guide at http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/tbb_userguide/title.htm. The User Guide gives an overview of the features in the library.

#4 (3 votes)

OpenMP is often (I am aware of no exceptions) implemented on top of Pthreads, so you can reason about some of the interoperability questions by thinking about how C++11 concurrency interoperates with Pthread code.

I don't know if oversubscription due to the use of multiple threading models is an issue for you, but this is definitely an issue for OpenMP. There is a proposal to address this in OpenMP 5. Until then, how you solve this is implementation defined. They are heavy hammers, but you can use OMP_WAIT_POLICY (OpenMP 4.5+), KMP_BLOCKTIME (Intel and LLVM), and GOMP_SPINCOUNT (GCC) to address this. I'm sure other implementations have something similar.

One issue where interoperability is a real concern is w.r.t. the memory model, i.e. how atomic operations behave. This is currently undefined, but you can still reason about it. For example, if you use C++11 atomics with OpenMP parallelism, you should be fine, but you are responsible for using C++11 atomics correctly from OpenMP threads.

Mixing OpenMP atomics and C++11 atomics is a bad idea. We (the OpenMP language committee working group charged with looking at OpenMP 5 base language support) are currently trying to sort this out. Personally, I think C++11 atomics are better than OpenMP atomics in every way, so my recommendation is that you use C++11 (or C11, or __atomic) for your atomics and leave #pragma omp atomic for the Fortran programmers.

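Following that recommendation, a counter that might otherwise use #pragma omp atomic can be expressed with a C++11 atomic updated from OpenMP threads (a minimal sketch):

#include <atomic>
#include <cstdio>
#include <omp.h>

int main()
{
    std::atomic<long> counter{0};   // instead of a plain long plus '#pragma omp atomic'

    #pragma omp parallel for
    for (int i = 0; i < 100000; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);

    std::printf("counter = %ld\n", counter.load());
    return 0;
}
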
Below is example code that uses C++11 atomics with OpenMP threads. It works as designed everywhere I have tested it.

Full disclosure: Like Jim and Mike, I work for Intel :-)

#if defined(__cplusplus) && (__cplusplus >= 201103L)

#include <iostream>
#include <iomanip>

#include <atomic>

#include <chrono>
#include <cstdlib> // for atoi

#ifdef _OPENMP
# include <omp.h>
#else
# error No OpenMP support!
#endif

#ifdef SEQUENTIAL_CONSISTENCY
auto load_model  = std::memory_order_seq_cst;
auto store_model = std::memory_order_seq_cst;
#else
auto load_model  = std::memory_order_acquire;
auto store_model = std::memory_order_release;
#endif

int main(int argc, char * argv[])
{
    int nt = omp_get_max_threads();
#if 1
    if (nt != 2) omp_set_num_threads(2);
#else
    if (nt < 2)      omp_set_num_threads(2);
    if (nt % 2 != 0) omp_set_num_threads(nt-1);
#endif

    int iterations = (argc>1) ? atoi(argv[1]) : 1000000;

    std::cout << "thread ping-pong benchmark\n";
    std::cout << "num threads  = " << omp_get_max_threads() << "\n";
    std::cout << "iterations   = " << iterations << "\n";
#ifdef SEQUENTIAL_CONSISTENCY
    std::cout << "memory model = " << "seq_cst";
#else
    std::cout << "memory model = " << "acq-rel";
#endif
    std::cout << std::endl;

    std::atomic<int> left_ready  = {-1};
    std::atomic<int> right_ready = {-1};

    int left_payload  = 0;
    int right_payload = 0;

    #pragma omp parallel
    {
        int me      = omp_get_thread_num();
        /// 0=left 1=right
        bool parity = (me % 2 == 0);

        int junk = 0;

        /// START TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t0 = std::chrono::high_resolution_clock::now();

        for (int i=0; i<iterations; ++i) {

            if (parity) {

                /// send to left
                left_payload = i;
                left_ready.store(i, store_model);

                /// recv from right
                while (i != right_ready.load(load_model));
                //std::cout << i << ": left received " << right_payload << std::endl;
                junk += right_payload;

            } else {

                /// recv from left
                while (i != left_ready.load(load_model));
                //std::cout << i << ": right received " << left_payload << std::endl;
                junk += left_payload;

                /// send to right
                right_payload = i;
                right_ready.store(i, store_model);

            }

        }

        /// STOP TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

        /// PRINT TIME
        std::chrono::duration<double> dt = std::chrono::duration_cast<std::chrono::duration<double>>(t1-t0);
        #pragma omp critical
        {
            std::cout << "total time elapsed = " << dt.count() << "\n";
            std::cout << "time per iteration = " << dt.count()/iterations  << "\n";
            std::cout << junk << std::endl;
        }
    }

    return 0;
}

#else  // C++11
#error You need C++11 for this test!
#endif // C++11
