How to avoid mutable state (when multithreading)

Multithreading is hard. The only thing you can do is program very carefully and follow good advice. One great piece of advice I got from the answers on this forum is to avoid mutable state. I understand this is even enforced in the Erlang language. However, I fail to see how this can be done without a severe performance hit and huge amounts of caches.

For example, suppose you have a big list of objects, each containing quite a lot of properties; in other words, a large data structure. Suppose you have a bunch of threads and they all need to access and modify the list. How can this be done without shared memory and without having to cache the whole data structure in each of the threads?

Update: After reading the reactions so far, I would like to put some more emphasis on performance. Don't you think that copying the same data around will make the program slower than with shared memory?

6 Answers

#1

Not every algorithm can be parallelized successfully.

If your program doesn't exhibit any "parallel structure", then you're pretty doomed to use locking and shared, mutable structures.

If your algorithm does exhibit such structure, then you can express your computation in terms of some pattern or formalism (for example, a macro dataflow graph) that makes the choice of an immutable data structure trivial.

So: think in terms of the structure of the algorithm, and not just in terms of the properties of the data structure to use.

#2

You can get a great start in thinking about immutable collections, where they are applicable, how they can actually work without requiring lots of copying, etc. by looking through Eric Lippert's articles tagged with immutability:

http://blogs.msdn.com/ericlippert/archive/tags/Immutability/default.aspx
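
To give a rough idea of how an immutable collection can avoid lots of copying, here is a minimal Python sketch of structural sharing. It is only an illustration and not taken from the linked articles; the names Node, push and to_list are made up for this example.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One cell of an immutable singly linked list."""
    head: int
    tail: Optional["Node"]


def push(lst: Optional[Node], value: int) -> Node:
    """Return a new list with value in front; the old list is untouched."""
    return Node(value, lst)


def to_list(lst: Optional[Node]) -> list:
    """Walk the cells and collect the values into a regular list."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


base = push(push(push(None, 3), 2), 1)   # 1 -> 2 -> 3
a = push(base, 10)                       # 10 -> 1 -> 2 -> 3
b = push(base, 20)                       # 20 -> 1 -> 2 -> 3

# Both new lists reuse the very same cells of base; nothing was copied.
shares_tail = a.tail is base and b.tail is base
```

Each "modified" list costs one new cell, and any thread still holding base keeps seeing exactly what it saw before.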

#3

I guess the first question is: why do they need to modify the list? Would it be possible for them to return their changes as a list of modifications rather than actually modifying the shared list? Could they work with a list which looks like it's a mutable version of the original list, but is actually only locally mutable? Are you changing which elements are in the list, or just the properties of those elements?
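
As a sketch of the "return a list of modifications" idea, assuming Python and a made-up rule (double every price below 100): the workers only read the shared data and hand back (index, new_value) pairs, and a single thread applies them afterwards. The names prices and propose_changes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared data: treated as read-only while the workers run.
prices = (100, 250, 80, 40)


def propose_changes(start, stop):
    """Read the shared tuple and return (index, new_value) pairs instead of writing."""
    return [(i, prices[i] * 2) for i in range(start, stop) if prices[i] < 100]


with ThreadPoolExecutor(max_workers=2) as pool:
    # Calls propose_changes(0, 2) and propose_changes(2, 4) on worker threads.
    change_lists = list(pool.map(propose_changes, [0, 2], [2, 4]))

# A single thread applies all proposed changes to build the next version.
updated = list(prices)
for changes in change_lists:
    for index, value in changes:
        updated[index] = value
updated = tuple(updated)
```

No lock is needed here because nothing is written while more than one thread is running.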

These are just questions rather than answers, but I'm trying to encourage you to think about your problem in a different way. Look at the bigger picture as the task you want to achieve instead of thinking about the way you'd tackle it in a normal imperative, mutable fashion. Changing the way you think about problems is very difficult, but you may find you get some great "aha!" moments :)

#4

There are many pitfalls when working with multiple threads and large sets of data. The advice to avoid mutable state is meant to try to make life easier for you if you can manage to follow the guideline (i.e. if you have no mutable state then multi-threading will be much easier).

If you have a large amount of data that does need to be modified then you perhaps cannot avoid mutable state. An alternative, though, would be to partition the data into blocks, each of which is passed to a thread for manipulation. The block can be processed and then passed back, and the controller can then perform the updates where necessary. In this scenario you have removed the mutable state from the threads.
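
The partitioning approach could look roughly like this in Python. This is a sketch under assumptions: the block size of 4, the squaring step and the helper names split and process_block are all invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # arbitrary size chosen for the example


def process_block(block):
    """Work on a private block; nothing outside the block is touched."""
    return [value * value for value in block]


def split(data, block_size):
    """Cut data into consecutive blocks of at most block_size items."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


data = list(range(10))
blocks = split(data, BLOCK_SIZE)  # each slice is a separate list object

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns the results in the same order as the blocks.
    results = list(pool.map(process_block, blocks))

# The controller is the only code that writes to the full list.
for block_index, processed in enumerate(results):
    start = block_index * BLOCK_SIZE
    data[start:start + len(processed)] = processed
```

On the performance worry: a Python slice copies only the references to the items, not the items themselves, so handing out blocks is cheaper than it may sound when the items are large objects.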

If this cannot be done and each thread needs update access to the full list (i.e. it could update any item on the list at any time) then you are going to have a lot of fun trying to make sure you have got your locking strategies and concurrency issues sorted. I'm sure there are scenarios where this is required, and the design pattern of avoiding mutable state may not apply.

#5

Just using immutable data objects is a big help. Needing to modify lists sounds like a contrived argument, but consider fine-grained methods that work on individual objects and are unaware of the lists that contain them.

#6

If you really need to update the structure, one way to do this is to have a single worker thread which picks up update requests from a fixed area protected by a mutex.
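
A minimal Python sketch of the single-writer idea follows. Note that it swaps the "fixed area protected by a mutex" for the standard library's queue.Queue, which does its own locking internally; the inventory data and the function names are made up for the example.

```python
import queue
import threading

inventory = {"apples": 5, "pears": 2}   # only the writer thread mutates this
requests = queue.Queue()                # thread-safe; does its own locking
STOP = object()                         # sentinel telling the writer to exit


def writer():
    """Apply update requests one at a time, in arrival order."""
    while True:
        item = requests.get()
        if item is STOP:
            break
        key, delta = item
        inventory[key] = inventory.get(key, 0) + delta


def producer(key, delta, times):
    """Any thread may ask for an update; none of them touch inventory directly."""
    for _ in range(times):
        requests.put((key, delta))


writer_thread = threading.Thread(target=writer)
writer_thread.start()

producers = [
    threading.Thread(target=producer, args=("apples", 1, 100)),
    threading.Thread(target=producer, args=("pears", -1, 2)),
    threading.Thread(target=producer, args=("apples", 1, 100)),
]
for t in producers:
    t.start()
for t in producers:
    t.join()

requests.put(STOP)      # queued after every update, so all are applied first
writer_thread.join()
```

Because only one thread ever writes, the updates themselves need no further locking; the only synchronized spot is the queue.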

If you are clever, you can update the structure in place without affecting any "reading" threads. For example, if you are adding to the end of an array, do all the work to add the new structure and only increment the NoOfMembers count as the very last instruction; the reading threads should not see the new entry until you do this. Alternatively, arrange your data as an array of references to structures: when you want to update a structure, copy the current member, update the copy, then as the last operation replace the reference in the array.
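
Here is a hedged Python sketch of the second variant (copy the member, update the copy, replace the reference last). It assumes that storing into a single list slot is effectively atomic, which holds in CPython but is a property of the runtime; in other languages you would need an atomic or volatile reference. Member, update_score and read_snapshot are hypothetical names.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Member:
    name: str
    score: int


# The "array of references": each slot points at an immutable Member.
members = [Member("ann", 1), Member("bob", 2)]
write_lock = threading.Lock()   # serialises writers only; readers never take it


def update_score(index, delta):
    """Copy the current member, change the copy, then swap the reference in."""
    with write_lock:
        current = members[index]
        changed = replace(current, score=current.score + delta)
        members[index] = changed          # last step: publish the new object


def read_snapshot(index):
    """Readers grab one reference; the object behind it never changes."""
    return members[index]


before = read_snapshot(0)
writers = [threading.Thread(target=update_score, args=(0, 1)) for _ in range(50)]
for t in writers:
    t.start()
for t in writers:
    t.join()
after = read_snapshot(0)
```

A reader that fetched its snapshot before the updates still holds a complete, consistent old object; it simply does not see the newer one until it reads the slot again.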

The other threads then need to check a single, simple "update in progress" mutex only when they actively want to update.

#1