I've been experimenting with multithreading and parallel processing, and I needed a counter to do some basic counting and statistical analysis of the processing speed. To avoid problems with concurrent use of my class, I've used a lock statement on a private variable in my class:
    private object mutex = new object();

    public void Count(int amount)
    {
        lock (mutex)
        {
            done += amount;
        }
    }
But I was wondering... how expensive is locking a variable? What are the negative effects on performance?
7 answers
#2
44
The technical answer is that this is impossible to quantify: it depends heavily on the state of the CPU's memory write-back buffers and on how much of the data gathered by the prefetcher has to be discarded and re-read. Both are very non-deterministic. I use 150 CPU cycles as a back-of-the-envelope approximation that avoids major disappointments.
The practical answer is that it is waaaay cheaper than the amount of time you'll burn on debugging your code when you think you can skip a lock.
To get a hard number you'll have to measure. Visual Studio has a slick concurrency analyzer available as an extension.
#3
24
Further reading:
I would like to present a few articles of mine on general synchronization primitives. They dig into Monitor and the behavior, properties, and costs of the C# lock statement across distinct scenarios and numbers of threads. They focus specifically on CPU wastage and throughput periods, to show how much work can be pushed through in multiple scenarios:
https://www.codeproject.com/Articles/1236238/Unified-Concurrency-I-Introduction
https://www.codeproject.com/Articles/1237518/Unified-Concurrency-II-benchmarking-methodologies
https://www.codeproject.com/Articles/1242156/Unified-Concurrency-III-cross-benchmarking
Original answer:
Oh dear!
It seems that the answer flagged here as THE ANSWER is inherently incorrect! I would respectfully ask the author of that answer to read the linked article to the end.
The author of the 2003 article measured on a dual-core machine only, and in the first test case he measured locking with a single thread; the result was about 50 ns per lock access.
That says nothing about a lock in a concurrent environment. So we have to keep reading: in the second half of the article, the author measured locking scenarios with two and three threads, which gets closer to the concurrency levels of today's processors.
The author says that with two threads on the dual-core machine the locks cost 120 ns, and with three threads the cost rises to 180 ns. So the cost clearly seems to depend on the number of threads accessing the lock concurrently.
So it is simple: it is not 50 ns unless there is only a single thread, in which case the lock is useless.
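To see how the figure moves with the number of threads on your own hardware, a rough probe along the following lines may help. This is only a sketch: the class and method names are made up for illustration, it is not the benchmark from the 2003 article, and the number it returns is wall-clock time divided by total acquisitions (thread start-up included), so treat it as a throughput indicator rather than a precise per-lock cost.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; LockContentionProbe is an illustrative name.
public static class LockContentionProbe
{
    // Returns wall-clock nanoseconds divided by the total number of lock
    // acquisitions when threadCount threads each take the same lock
    // iterationsPerThread times.
    public static double AverageNanosecondsPerLock(int threadCount, int iterationsPerThread)
    {
        object gate = new object();
        long counter = 0;
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    lock (gate)
                    {
                        counter++;
                    }
                }
            });
        }

        var stopwatch = Stopwatch.StartNew();
        foreach (Thread thread in threads) thread.Start();
        foreach (Thread thread in threads) thread.Join();
        stopwatch.Stop();

        // Sanity check: with the lock in place no increment may be lost.
        long totalAcquisitions = (long)threadCount * iterationsPerThread;
        if (counter != totalAcquisitions)
            throw new InvalidOperationException("Lost updates: the lock did not protect the counter.");

        return stopwatch.Elapsed.TotalMilliseconds * 1e6 / totalAcquisitions;
    }
}
```

Calling `LockContentionProbe.AverageNanosecondsPerLock(n, 1000000)` for n = 1, 2, 3, ... and comparing the results gives a feel for how contention scales on a given machine; the absolute numbers will differ between CPUs and runtimes.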
Another issue for consideration is that it is measured as average time!
If the time of individual iterations were measured, some of them would take anywhere from 1 ms to 20 ms. The average hides this because the majority of acquisitions are fast, but a few threads end up waiting for processor time and incur delays that are milliseconds long.
This is bad news for any kind of application that requires high throughput and low latency.
And the last issue for consideration is that there could be slower operations inside the lock, and very often that is the case. The longer the block of code inside the lock takes to execute, the higher the contention, and delays rise sky high.
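A common way to limit that effect is to keep the locked region as short as possible. The sketch below contrasts the two shapes; `ResultCollector` and `ExpensiveTransform` are hypothetical names invented for this illustration, and it assumes the expensive work touches no shared state, so it can safely run outside the lock.

```csharp
using System.Collections.Generic;

// Hypothetical example; names are illustrative and not from the answer.
public sealed class ResultCollector
{
    private readonly object gate = new object();
    private readonly List<int> results = new List<int>();

    // Slow under contention: the expensive work runs while the lock is
    // held, so every other caller has to wait for it.
    public void AddSlow(int input)
    {
        lock (gate)
        {
            results.Add(ExpensiveTransform(input));
        }
    }

    // Do the expensive work first, then hold the lock only for the
    // cheap list append.
    public void AddFast(int input)
    {
        int value = ExpensiveTransform(input);
        lock (gate)
        {
            results.Add(value);
        }
    }

    public int Count
    {
        get { lock (gate) { return results.Count; } }
    }

    // Stand-in for real work; it uses only its argument and locals.
    private static int ExpensiveTransform(int input)
    {
        int acc = input;
        for (int i = 0; i < 10000; i++)
        {
            acc = unchecked(acc * 31 + i);
        }
        return acc;
    }
}
```

Both methods produce the same list contents; the difference is only in how long each caller blocks the others.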
Please consider that more than a decade has already passed since 2003. That is a few generations of processors designed specifically to run fully concurrently, and locking considerably harms their performance.
#4
19
This doesn't answer your query about performance, but I can say that the .NET Framework does offer an Interlocked.Add method that will allow you to add your amount to your done member without manually locking on another object.
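As a minimal sketch of that suggestion, the counter from the question could look roughly like this. The class name `ProgressCounter` and the `Done` property are assumptions made for the example, as is the choice of `long` for the field; the question does not show how `done` is declared.

```csharp
using System.Threading;

// Hypothetical counter class; ProgressCounter and Done are illustrative
// names, and the long field type is an assumption.
public sealed class ProgressCounter
{
    private long done; // running total, only touched through Interlocked

    // Atomically adds amount to the total without taking a lock.
    public void Count(int amount)
    {
        Interlocked.Add(ref done, amount);
    }

    // Interlocked.Read gives an atomic 64-bit read even on 32-bit runtimes.
    public long Done
    {
        get { return Interlocked.Read(ref done); }
    }
}
```

This only works because the protected operation is a single addition. Once several fields have to change together, a lock (or another form of mutual exclusion) is still needed.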
#5
10
lock (Monitor.Enter/Exit) is very cheap, cheaper than alternatives like a WaitHandle or Mutex.
But what if it was (a little) slow? Would you rather have a fast program with incorrect results?
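For reference, the Monitor.Enter/Exit remark above refers to what the compiler generates for a lock statement. When the lock target is a plain object, C# 4 and later expand it to approximately the hand-written form below. The class name `MonitorCounter` is invented for this sketch, and the `int` field mirrors the question's counter by assumption.

```csharp
using System.Threading;

// Hypothetical class showing the hand-written equivalent of
// lock (mutex) { done += amount; }
public sealed class MonitorCounter
{
    private readonly object mutex = new object();
    private int done;

    public void Count(int amount)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(mutex, ref lockTaken);
            done += amount;
        }
        finally
        {
            // Release only if Enter actually acquired the lock.
            if (lockTaken) Monitor.Exit(mutex);
        }
    }

    public int Done
    {
        get { lock (mutex) { return done; } }
    }
}
```

There is rarely a reason to write this by hand; the lock statement is shorter and harder to get wrong.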
#6
6
The cost of a lock in a tight loop, compared to an alternative with no lock, is huge. You can afford to loop many times and still be more efficient than a lock. That is why lock-free queues are so efficient.
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;

    namespace LockPerformanceConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                var stopwatch = new Stopwatch();
                const int LoopCount = (int) (100 * 1e6);
                int counter = 0;

                for (int repetition = 0; repetition < 5; repetition++)
                {
                    // Single-threaded loop that takes an uncontended lock on every iteration.
                    stopwatch.Reset();
                    stopwatch.Start();
                    for (int i = 0; i < LoopCount; i++)
                        lock (stopwatch)
                            counter = i;
                    stopwatch.Stop();
                    Console.WriteLine("With lock: {0}", stopwatch.ElapsedMilliseconds);

                    // Same loop without the lock, as the baseline.
                    stopwatch.Reset();
                    stopwatch.Start();
                    for (int i = 0; i < LoopCount; i++)
                        counter = i;
                    stopwatch.Stop();
                    Console.WriteLine("Without lock: {0}", stopwatch.ElapsedMilliseconds);
                }

                Console.ReadKey();
            }
        }
    }
Output:
With lock: 2013
Without lock: 211
With lock: 2002
Without lock: 210
With lock: 1989
Without lock: 210
With lock: 1987
Without lock: 207
With lock: 1988
Without lock: 208
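The "loop many times" remark in this answer presumably refers to the retry loop that lock-free code is built on: read the current value, compute the new one, and publish it with a compare-and-swap, retrying if another thread got there first. A small sketch of that pattern follows. It is not a lock-free queue, and `MaxTracker` is a hypothetical name chosen for the example.

```csharp
using System.Threading;

// Hypothetical example of a compare-and-swap retry loop; it tracks the
// largest value reported by any thread without taking a lock.
public sealed class MaxTracker
{
    private int max = int.MinValue;

    public void Report(int candidate)
    {
        while (true)
        {
            int observed = Volatile.Read(ref max);
            if (candidate <= observed)
            {
                return; // nothing to update
            }

            // Store candidate only if max still holds the value we read.
            // CompareExchange returns the value it found in max.
            if (Interlocked.CompareExchange(ref max, candidate, observed) == observed)
            {
                return; // our update won
            }
            // Another thread changed max in between; loop and try again.
        }
    }

    public int Max
    {
        get { return Volatile.Read(ref max); }
    }
}
```

Under light contention the loop body usually runs once; retries only happen when two threads race on the same update.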
#7
4
There are a few different ways to define "cost". There is the actual overhead of obtaining and releasing the lock; as Jake writes, that's negligible unless this operation is performed millions of times.
Of more relevance is the effect this has on the flow of execution. This code can only be entered by one thread at a time. If you have 5 threads performing this operation on a regular basis, 4 of them will end up waiting for the lock to be released, and then waiting to be the first thread scheduled to enter that piece of code after the lock is released. So your algorithm is going to suffer significantly; how much depends on the algorithm and on how often the operation is called. You can't really avoid it without introducing race conditions, but you can ameliorate it by minimizing the number of calls to the locked code.
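One way to minimize the number of calls to the locked code is to let each worker accumulate a private subtotal and take the lock only once when it finishes. The sketch below does that with the thread-local overload of Parallel.For. `BatchedCounter` and `ProcessAll` are hypothetical names, and the per-item "work" is reduced to adding 1 so the example stays short.

```csharp
using System.Threading.Tasks;

// Hypothetical example; names are illustrative and not from the answer.
public sealed class BatchedCounter
{
    private readonly object mutex = new object();
    private long done;

    // The locked operation from the question, taking a whole subtotal.
    public void Count(long amount)
    {
        lock (mutex)
        {
            done += amount;
        }
    }

    public long Done
    {
        get { lock (mutex) { return done; } }
    }

    // Processes itemCount items in parallel. Each worker task keeps its
    // own subtotal and calls Count once at the end, instead of once per item.
    public void ProcessAll(int itemCount)
    {
        Parallel.For<long>(
            0, itemCount,
            () => 0L,                                   // per-task subtotal starts at zero
            (i, loopState, subtotal) => subtotal + 1,   // per item: update the private subtotal only
            subtotal => Count(subtotal));               // per task: one locked call
    }
}
```

With this shape the lock is taken once per worker task rather than once per item, so the number of contended acquisitions no longer grows with the item count.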