Concurrency: atomic and volatile in the C++11 memory model

Time: 2021-02-05 13:53:47

A global variable is shared between 2 threads running concurrently on 2 different cores. The threads write to and read from the variable. If the variable is atomic, can one thread read a stale value? Each core might hold a copy of the shared variable in its cache, so when one thread writes to the copy in its cache, the other thread on a different core might read a stale value from its own cache. Or does the compiler enforce strong memory ordering so that the latest value is read from the other cache? The C++11 standard library has std::atomic support. How is this different from the volatile keyword? How will volatile and atomic types behave differently in the above scenario?

4 answers

#1


66  

Firstly, volatile does not imply atomic access. It is designed for things like memory mapped I/O and signal handling. volatile is completely unnecessary when used with std::atomic, and unless your platform documents otherwise, volatile has no bearing on atomic access or memory ordering between threads.

If you have a global variable which is shared between threads, such as:

std::atomic<int> ai;

then the visibility and ordering constraints depend on the memory ordering parameter you use for operations, and the synchronization effects of locks, threads and accesses to other atomic variables.

In the absence of any additional synchronization, if one thread writes a value to ai then there is nothing that guarantees that another thread will see the value in any given time period. The standard specifies that it should be visible "in a reasonable period of time", but any given access may return a stale value.

The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies.

If you have 2 shared variables x and y, initially zero, and have one thread write 1 to x and another write 2 to y, then a third thread that reads both may see either (0,0), (1,0), (0,2) or (1,2) since there is no ordering constraint between the operations, and thus the operations may appear in any order in the global order.
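
The scenario above can be sketched roughly as follows. This is only an illustration: the function name observe_independent_writes and the use of one std::thread per role are assumptions made for the example, not something from the original answer. Any single run returns just one of the four pairs, and which one depends on scheduling.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Shared variables from the example; both start at zero.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Hypothetical helper: runs one writer per variable plus one reader and
// returns the pair (x, y) that the reader observed.
std::pair<int, int> observe_independent_writes() {
    x.store(0);
    y.store(0);
    int seen_x = -1;
    int seen_y = -1;

    std::thread writer_x([] { x.store(1); });  // memory_order_seq_cst by default
    std::thread writer_y([] { y.store(2); });
    std::thread reader([&] {
        seen_x = x.load();
        seen_y = y.load();
    });

    writer_x.join();
    writer_y.join();
    reader.join();  // after join() it is safe to read seen_x and seen_y
    return {seen_x, seen_y};
}
```

Calling observe_independent_writes() repeatedly may produce (0,0), (1,0), (0,2) or (1,2); none of them indicates a bug.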

If both writes are from the same thread, which does x=1 before y=2, and the reading thread reads y before x, then (0,2) is no longer a valid option, since the read of y==2 implies that the earlier write to x is visible. The other 3 pairings (0,0), (1,0) and (1,2) are still possible, depending on how the 2 reads interleave with the 2 writes.
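
A minimal sketch of this single-writer variant, assuming the default std::memory_order_seq_cst on every operation. The helper name observe_ordered_writes is hypothetical. The reader loads y first and x second, so a result of x==0 together with y==2 should never be returned.

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Hypothetical helper: one thread writes x then y, another reads y then x.
// Returns the observed pair as (x, y).
std::pair<int, int> observe_ordered_writes() {
    x.store(0);
    y.store(0);
    int seen_x = -1;
    int seen_y = -1;

    std::thread writer([] {
        x.store(1);  // happens first in the writer
        y.store(2);  // happens second in the writer
    });
    std::thread reader([&] {
        seen_y = y.load();  // read y first...
        seen_x = x.load();  // ...then x
    });

    writer.join();
    reader.join();
    return {seen_x, seen_y};
}
```

Running the helper in a loop is a cheap sanity check, although a passing loop is not a proof; the guarantee comes from the memory model, not from the test.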

If you use other memory orderings such as std::memory_order_relaxed or std::memory_order_acquire then the constraints are relaxed even further, and the single global ordering no longer applies. Threads don't even necessarily have to agree on the ordering of two stores to separate variables if there is no additional synchronization.
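
As one illustration of a weaker ordering that is still useful, here is a sketch of the common release/acquire hand-off. It is not taken from the answer: the names payload, ready and transfer_with_release_acquire are made up for the example. The release store paired with the acquire load orders only this one hand-off; it does not give the single global order that seq_cst provides.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

// Hypothetical helper: the producer writes payload and then sets the flag
// with release; the consumer waits on the flag with acquire, then reads.
int transfer_with_release_acquire() {
    payload = 0;
    ready.store(false);
    int received = -1;

    std::thread producer([] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // wait until the flag is visible
        }
        received = payload;  // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return received;
}
```

If both operations on ready used std::memory_order_relaxed instead, the read of payload would be a data race, because nothing would order it after the write.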

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though.
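
The no-duplicates, no-gaps property can be checked with a small sketch like the one below. The helper collect_tickets and its parameters are assumptions for illustration. Each thread records the values returned by fetch_add(1), and the merged, sorted result should be exactly 0, 1, 2, and so on.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: every thread calls ai.fetch_add(1) per_thread times
// and records what it got back. Returns all recorded values, sorted.
std::vector<int> collect_tickets(int num_threads, int per_thread) {
    std::atomic<int> ai{0};
    std::vector<std::vector<int>> seen(num_threads);  // one list per thread
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&ai, &seen, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // fetch_add returns the value held just before the increment
                seen[t].push_back(ai.fetch_add(1));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    std::vector<int> all;
    for (const auto& v : seen) {
        all.insert(all.end(), v.begin(), v.end());
    }
    std::sort(all.begin(), all.end());
    return all;
}
```

Which thread receives which value still varies from run to run; only the combined set is fixed.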

Working with atomic operations is a complex topic. I suggest you read a lot of background material, and examine published code before writing production code with atomics. In most cases it is easier to write code that uses locks, and not noticeably less efficient.
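
For comparison, a lock-based counter might look like the sketch below. The function count_with_mutex is a hypothetical name. There is no memory-ordering argument to get right here, because locking and unlocking the std::mutex provides the synchronization.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: increments a plain counter from several threads,
// protecting each increment with a mutex.
long count_with_mutex(int num_threads, int per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // unlocks at scope exit
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;  // safe to read: all workers have been joined
}
```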

#2


26  

volatile and the atomic operations have a different background, and were introduced with a different intent.

volatile dates from way back, and is principally designed to prevent compiler optimizations when accessing memory mapped IO. Modern compilers tend to do no more than suppress optimizations for volatile, although on some machines, this isn't sufficient for even memory mapped IO. Except for the special case of signal handlers, and setjmp and longjmp sequences (where the C standard, and in the case of signals, the Posix standard, gives additional guarantees), it must be considered useless on a modern machine, where without special additional instructions (fences or memory barriers), the hardware may reorder or even suppress certain accesses. Since you shouldn't be using setjmp et al. in C++, this more or less leaves signal handlers, and in a multithreaded environment, at least under Unix, there are better solutions for those as well. And possibly memory mapped IO, if you're working on kernel code and can ensure that the compiler generates whatever is needed for the platform in question. (According to the standard, volatile access is observable behavior, which the compiler must respect. But the compiler gets to define what is meant by “access”, and most seem to define it as “a load or store machine instruction was executed”. Which, on a modern processor, doesn't even mean that there is necessarily a read or write cycle on the bus, much less that it's in the order you expect.)
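
The signal-handler case mentioned above is the one place where volatile still appears in portable code, usually as volatile std::sig_atomic_t. A minimal sketch follows; the names got_signal, on_signal and signal_sets_flag are hypothetical, and SIGINT is chosen arbitrarily.

```cpp
#include <csignal>

// Flag written by the handler and read by normal code. The standard allows
// a handler to write to an object of type volatile std::sig_atomic_t.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;
}

// Hypothetical helper: installs the handler, raises the signal in this
// thread, and reports whether the handler set the flag.
bool signal_sets_flag() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // returns only after the handler has run
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal == 1;
}
```

This covers communication between a handler and the thread it interrupts. It says nothing about sharing the flag with other threads, which is where std::atomic is needed.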

Given this situation, the C++ standard added atomic access, which does provide a certain number of guarantees across threads; in particular, the code generated around an atomic access will contain the necessary additional instructions to prevent the hardware from reordering the accesses, and to ensure that the accesses propagate down to the global memory shared between cores on a multicore machine. (At one point in the standardization effort, Microsoft proposed adding these semantics to volatile, and I think some of their C++ compilers do. After discussion of the issues in the committee, however, the general consensus, including the Microsoft representative, was that it was better to leave volatile with its original meaning, and to define the atomic types.) Or just use the system level primitives, like mutexes, which execute whatever instructions are needed in their code. (They have to. You can't implement a mutex without some guarantees concerning the order of memory accesses.)

#3


3  

Volatile and Atomic serve different purposes.

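Volatile: Tells the compiler not to optimize away accesses to the variable. The keyword is meant for variables that may change unexpectedly, so it can be used for hardware status registers and variables modified by an ISR. It is sometimes also used for variables shared in a multi-threaded application, but in C++ it provides neither atomicity nor ordering between threads there.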

Atomic: Also used in multi-threaded applications. Atomic operations are indivisible and free of data races, and where the implementation is lock-free they involve no lock or stall. Typical uses are checking whether a lock is free or taken, or atomically adding to a value and returning the result.
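
Both uses mentioned above can be sketched in a few lines. Everything here is illustrative: SpinLock, add_and_fetch and count_with_spinlock are hypothetical names, and a real program would normally prefer std::mutex over a hand-written spinlock.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock built on std::atomic_flag.
class SpinLock {
    std::atomic_flag busy = ATOMIC_FLAG_INIT;

public:
    // Atomically marks the lock as taken; returns true if it was free.
    bool try_lock() { return !busy.test_and_set(std::memory_order_acquire); }

    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();  // lock is in use, let others run
        }
    }

    void unlock() { busy.clear(std::memory_order_release); }
};

// Atomically adds n and returns the value after the addition.
int add_and_fetch(std::atomic<int>& value, int n) {
    return value.fetch_add(n) + n;  // fetch_add returns the previous value
}

// Hypothetical helper: increments a plain counter under the spinlock.
long count_with_spinlock(int num_threads, int per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```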

#4


2  

Here's a basic synopsis of what the 2 things are:

1) Volatile keyword:
Tells the compiler that this value could alter at any moment and therefore it should not EVER cache it in a register. Look up the old "register" keyword in C. "Volatile" is basically the "-" operator to "register"'s "+". Modern compilers now do by default the optimization that "register" used to request explicitly, so you only see 'volatile' anymore. Using the volatile qualifier guarantees that the compiler re-reads the variable from memory on every access, but nothing more: it gives no atomicity and no ordering or visibility guarantee between threads.

2) Atomic:
Atomic operations modify data indivisibly, so that it is impossible for ANY other thread to observe the data in the middle of such an update. Lock-free atomics are usually limited to whatever atomic instructions the hardware supports; things like ++, --, and exchanging a pointer. Note that this says nothing about the ORDER in which the different threads will RUN the atomic instructions, only that two operations on the same object never overlap. That's why you have all those additional options for forcing an ordering.
