When you have a situation where Thread A reads some global variable and Thread B writes to the same variable: assuming the read/write is atomic on a single core, you might get away without synchronizing, but what happens when running on a multi-core machine?
7 Answers
#1
9
Even on a single core, you cannot assume that an operation will be atomic. That may be the case where you're coding in assembler but, if you are coding in C++ as per your question, you do not know what it will compile down to.
You should rely on the synchronisation primitives at the level of abstraction that you're coding to. In your case, that's the threading calls for C++, whether they be pthreads, Windows threads or something else entirely.
It's the same reasoning that I gave in another answer to do with whether i++ was thread-safe. The bottom line is, you don't know since you're not coding to that level (if you're doing inline assembler and/or you understand and can control what's going on under the covers, you're no longer coding at the C++ level and you can ignore my advice).
The operating system and/or OS-type libraries know a great deal about the environment they're running in, far more so than the C++ compiler would. Use of proper synchronisation primitives will save you a great deal of angst.
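As a rough illustration of what that looks like (a sketch only: the counter variable and the use of C++11's std::mutex are assumptions here, and the same pattern applies with pthread_mutex_t or a Windows CRITICAL_SECTION):

    #include <mutex>
    #include <thread>

    int counter = 0;          // shared between threads
    std::mutex counter_mutex; // guards every access to counter

    void writer() {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;            // the non-atomic "i++" is now protected
    }

    int reader() {
        std::lock_guard<std::mutex> lock(counter_mutex);
        return counter;       // read under the same lock
    }

    int main() {
        std::thread a(reader);
        std::thread b(writer);
        a.join();
        b.join();
    }

The point is that the lock maps onto whatever the OS provides underneath, which is exactly the level of abstraction this answer recommends coding to.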
#2
7
It will have the same pitfalls as with a single core but with additional latency due to the L1 cache synchronization that must take place between cores.
Note - "you can do it without synchronizing" is not always a true statement.
#3
5
Even on a single-core machine, there is absolutely no guarantee that this will work without explicit synchronization.
There are several reasons for this:
- the OS may interrupt a thread at any time (between any two instructions), and then run the other thread, and
- if there is no explicit synchronization, the compiler may reorder instructions very liberally, breaking any guarantees you thought you had, and
- even the CPU may do the same, reordering instructions on the fly.
If you want correct communication between two threads, you need some kind of synchronization. Always, with no exception.
That synchronization may be a mutex provided by the OS or the threading API, or it may be CPU-specific atomic instructions, or just a plain memory barrier.
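For example, the "atomic instructions plus memory barrier" flavour could look like the following sketch (the names data and ready are invented for illustration, and C++11's std::atomic is assumed to be available):

    #include <atomic>
    #include <thread>

    int data = 0;                   // payload written by one thread, read by the other
    std::atomic<bool> ready(false); // carries the synchronization between the threads

    void producer() {
        data = 42;                                     // plain write
        ready.store(true, std::memory_order_release);  // publish the write above
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire))
            ;                                          // spin until published
        // the acquire load guarantees we now see data == 42
    }

    int main() {
        std::thread p(producer);
        std::thread c(consumer);
        p.join();
        c.join();
    }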
#4
1
For a non-atomic operation on a multi-core machine, you need to use a system-provided mutex in order to synchronize the accesses.
For C++, the boost mutex library provides several mutex types that provide a consistent interface for OS-supplied mutex types.
If you choose to look at boost as your syncing / multithreading library, you should read up on the Synchronization concepts.
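A minimal sketch of that interface (the names are invented for illustration; link against Boost.Thread):

    #include <boost/thread/mutex.hpp>

    int shared_value = 0;       // hypothetical shared state
    boost::mutex value_mutex;   // consistent wrapper over the OS-supplied mutex

    void write_value(int v) {
        boost::mutex::scoped_lock lock(value_mutex); // locks here, unlocks at end of scope
        shared_value = v;
    }

    int read_value() {
        boost::mutex::scoped_lock lock(value_mutex);
        return shared_value;
    }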
#5
0
Depending on your situation the following may be relevant. While it won't make your program run incorrectly, it can make a big difference in speed. Even if you aren't accessing the same memory location, you may take a performance hit from cache effects if two cores are thrashing over the same cache line (not the same location, because you carefully synchronized your data structures).
There is a good overview of "false sharing" here: http://www.drdobbs.com/go-parallel/article/showArticle.jhtml;jsessionid=LIHTU4QIPKADTQE1GHRSKH4ATMY32JVN?articleID=217500206
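As a sketch of the usual fix (the 64-byte cache-line size is an assumption; check your CPU), padding or aligning per-thread data so that each piece lives on its own cache line avoids the thrashing:

    #include <thread>

    // Without the alignment, both counters could share one cache line and
    // the two cores would ping-pong that line between them (false sharing).
    struct alignas(64) PaddedCounter {  // 64 bytes is a common cache-line size
        long value = 0;
    };

    PaddedCounter counters[2];          // each element now occupies its own cache line

    void work(int idx) {
        for (int i = 0; i < 1000000; ++i)
            ++counters[idx].value;      // each thread touches only its own line
    }

    int main() {
        std::thread t0(work, 0);
        std::thread t1(work, 1);
        t0.join();
        t1.join();
    }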
#6
0
As far as the (new) C++ standard is concerned, if a program contains a data race, the behavior of the program is undefined. A program has a data race if there is an interleaving of threads such that it contains two neighboring conflicting memory accesses from different threads (which is just a very formal way of saying "a program has a data race if two conflicting accesses can occur concurrently").
Note that it doesn't matter how many cores you're running on: the behavior of your program is undefined (notably, the optimizer can reorder instructions as it sees fit).
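For concreteness, a sketch of the scenario from the question (variable names invented): the commented-out plain accesses form a data race and are undefined behaviour, while the std::atomic version of the same pattern is well defined:

    #include <atomic>
    #include <thread>

    int plain_value = 0;              // the global from the question
    std::atomic<int> atomic_value(0); // the same variable, made race-free

    int main() {
        std::thread writer([] {
            // plain_value = 1;       // data race with the read below: undefined behaviour
            atomic_value.store(1);    // conflicting atomic accesses are not a data race
        });
        std::thread reader([] {
            // int x = plain_value;   // the race described in the question
            int y = atomic_value.load();
            (void)y;
        });
        writer.join();
        reader.join();
    }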
#7
0
No one has mentioned the pros and cons of implicit synchronization.
The main "pro" is of course that the programmer can write anything at all and not have to bother about synchronization.
The main "con" is that this takes A LOT of time. The implicit synchronization needs to wind its way down through the caches to at least (you might think) the first cache that is common to both cores. Wrong! There may be several physical processors installed in the computer so synchronization can't stop at a cache, it needs to go all the way down to RAM. If you want to synchronize there you also need to synchronize with other devices that need to synchronize with memory i e any bus-mastering device. Bus-mastering devices may be cards on the classic PCI-bus and may be running at 33 MHz so the implicit synchronization would need to wait for them too to acknowledge that it's ok to write to or read from a specific RAM location. We're talking a 100X difference just in clock speed between the core and the slowest bus and the slowest bus needs several of its own bus cycles to react in a reliable manner. Because synchronization MUST be reliable, it is of no use otherwise.
So in the choice between implementing electronics for implicit synchronization (which is better left to the programmer to handle explicitly anyway) and a faster system which can synchronize when necessary, the answer is obvious.
The explicit keys to synchronization are the LOCK prefix and the XCHG mem,reg instruction.
You could say that implicit synchronization is like training wheels: you won't fall to the ground but you can't go especially fast or turn especially quickly. Soon you'll tire and want to move on to the real stuff. Sure, you'll get hurt but in the process you'll either learn or quit.
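To illustrate those instructions (a sketch, not part of the original answer): a naive spinlock built on GCC's __sync_lock_test_and_set builtin, which on x86 typically compiles to XCHG mem,reg, and XCHG with a memory operand is implicitly LOCKed:

    // Naive spinlock sketch; GCC/Clang __sync builtins are assumed.
    static volatile int lock_word = 0;

    void spin_lock() {
        // On x86 this builtin usually becomes XCHG mem,reg, which is implicitly LOCKed.
        while (__sync_lock_test_and_set(&lock_word, 1)) {
            // busy-wait until the previous owner releases the lock
        }
    }

    void spin_unlock() {
        __sync_lock_release(&lock_word);   // store 0 with release semantics
    }

    int shared_counter = 0;

    void increment() {
        spin_lock();
        ++shared_counter;                  // now safe across cores
        spin_unlock();
    }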