Somewhere I once read about memory fences (barriers). It was said that a memory fence causes cache synchronisation between several CPU cores.
So my questions are:
- How does the OS (or CPU itself) know which cores need to be synchronised?
- Does it synchronise the caches of all CPU cores?
- If the answer to (2) is 'yes', and assuming that sync operations are not cheap, does using memory fences slow down cores that are not used by my application? For example, if I have a single-threaded app running on my 8-core CPU, will it slow down the other 7 cores, because some cache lines must be synced with all of them?
- Are the questions above totally ignorant, and do fences work completely differently?
3 Answers
#1
8
- The OS does not need to know, and each CPU core does what it is told: a core that executes a memory fence has to complete certain operations before or after it, and that's all. A core isn't synchronizing "with" other cores; it's ordering its own memory accesses relative to itself.
- A fence in one core does not mean other cores are synchronized with it, so typically you would have two (or more) fences: one in the writer and one in the reader. A fence executed on one core does not need to impact any other cores. Of course there is no guarantee about this in general, just a hope that sane architectures will not unduly serialize multi-core execution.
#2
5
Generally, memory fences are used for ordering local operations. Take for instance this pseudo-assembler code:
load A
load B
Many CPUs do not guarantee that B is indeed loaded after A; B may be in a cache line that was loaded into the cache earlier due to some other memory load. If you introduce a fence,
load A
readFence
load B
you have the guarantee that B is loaded from memory after A is. If B were in cache but older than A, it would be reloaded.
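In C++, the `readFence` above corresponds roughly to `std::atomic_thread_fence(std::memory_order_acquire)` placed between the two loads. A minimal sketch (the variables `A` and `B` are just stand-ins for the locations in the pseudo-assembler):

```cpp
#include <atomic>

// Illustrative stand-ins for the A and B locations above.
std::atomic<int> A{0};
std::atomic<int> B{0};

int load_in_order() {
    int a = A.load(std::memory_order_relaxed);           // load A
    std::atomic_thread_fence(std::memory_order_acquire); // readFence
    int b = B.load(std::memory_order_relaxed);           // load B: ordered after the load of A
    return a + b;
}
```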
The situation with stores is the same the other way around. With
store A
store B
some CPUs may decide to write B to memory before they write A. Again, a fence between the two instructions may be needed to enforce ordering of the operations. Whether a memory fence is required always depends on the architecture.
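The store case has the same shape in C++, with a release fence between the two stores. Again a sketch, with `A` and `B` as illustrative variables:

```cpp
#include <atomic>

std::atomic<int> A{0};
std::atomic<int> B{0};

void store_in_order() {
    A.store(1, std::memory_order_relaxed);               // store A
    std::atomic_thread_fence(std::memory_order_release); // writeFence
    B.store(1, std::memory_order_relaxed);               // store B: cannot become
                                                         // visible before the store to A
}
```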
Generally, you use memory fences in pairs:
- If one thread wants to publish an object, it first constructs the object, then it performs a write fence before it writes the pointer to the object into a publicly known location.
- The thread that wants to receive the object reads the pointer from the publicly known memory location, then it executes a read fence to ensure that all further reads based on that pointer actually give the values the publishing thread intended.
If either fence is missing, the reader may read the value of one or more data members of the object before it was initialized. Madness ensues.
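The paired-fence publishing pattern described above can be sketched in C++ as follows. `Widget`, `publisher`, and `consumer` are hypothetical names for illustration; a real program would more often use release/acquire operations on the atomic itself rather than standalone fences:

```cpp
#include <atomic>
#include <thread>

struct Widget { int x; int y; };

std::atomic<Widget*> g_widget{nullptr}; // the publicly known location

void publisher() {
    Widget* w = new Widget{1, 2};                        // construct the object
    std::atomic_thread_fence(std::memory_order_release); // write fence
    g_widget.store(w, std::memory_order_relaxed);        // publish the pointer
}

int consumer() {
    Widget* w;
    while ((w = g_widget.load(std::memory_order_relaxed)) == nullptr) {
        // spin until the pointer is published
    }
    std::atomic_thread_fence(std::memory_order_acquire); // read fence
    return w->x + w->y; // guaranteed to see the fully constructed Widget
}
```

Without the release fence, the stores initializing the `Widget` could become visible after the pointer store; without the acquire fence, the reads through the pointer could return stale values. That is the "madness" the answer refers to.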
#3
1
If you have, say, eight cores, and each core is doing different things, then these cores wouldn't be accessing the same memory and wouldn't have the same memory in their cache lines.
If core #1 uses a memory fence, but no other core accesses the memory that core #1 accesses, then the other cores won't be slowed down at all. However, if core #1 writes to location X and uses a memory fence, and core #2 then tries to read the same location X, the memory fence makes sure that core #2 throws away the value of location X if it was in a cache and reads the data back from RAM, getting the same data that core #1 wrote. That takes time, of course, but that's what the memory fence was there for.
(Instead of reading from RAM, if the cores share some cache, then the data will be read from cache. )