I had a small dispute over the performance of synchronized blocks in Java. This is a theoretical question, which does not affect real-life applications. Consider a single-threaded application that uses locks and synchronized sections. Does this code run slower than the same code without the synchronized sections? If so, why? We are not discussing concurrency, since this is a single-threaded application.
Update
Found an interesting benchmark testing this. But it's from 2001. Things could have changed dramatically in the latest versions of the JDK.
6 Answers
#1
43
Single-threaded code will still run slower when using synchronized blocks. Obviously you will not have other threads stalled while waiting for other threads to finish, however you will have to deal with the other effects of synchronization, namely cache coherency.
Synchronized blocks are not only used for concurrency, but also visibility. Every synchronized block is a memory barrier: the JVM is free to work on variables in registers, instead of main memory, on the assumption that multiple threads will not access that variable. Without synchronization blocks, this data could be stored in a CPU's cache and different threads on different CPUs would not see the same data. By using a synchronization block, you force the JVM to write this data to main memory for visibility to other threads.
So even though you're free from lock contention, the JVM will still have to do housekeeping in flushing data to main memory.
In addition, synchronization imposes optimization constraints. The JVM is free to reorder instructions in order to optimize; consider a simple example:
foo++;
bar++;
versus:
foo++;
synchronized(obj)
{
bar++;
}
In the first example, the compiler is free to load foo and bar at the same time, then increment them both, then save them both. In the second example, the compiler must perform the load/add/save on foo, then perform the load/add/save on bar. Thus, synchronization may impact the ability of the JRE to optimize instructions.
(An excellent book on the Java Memory Model is Brian Goetz's Java Concurrency In Practice.)
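The visibility guarantee described above can be sketched with a small example (a hypothetical class invented for illustration): a writer thread publishes a flag through a synchronized setter, and the reader polls it through a synchronized getter, so acquiring the same monitor guarantees the reader eventually observes the write.

```java
// Minimal sketch of synchronized-for-visibility; class and method names are illustrative.
public class VisibilityDemo {
    private boolean flag; // guarded by the monitor of 'this'

    public synchronized void setFlag() { flag = true; }
    public synchronized boolean isFlag() { return flag; }

    // Returns true once the reader has observed the writer's update.
    public static boolean demo() {
        VisibilityDemo d = new VisibilityDemo();
        new Thread(d::setFlag).start();
        // Each synchronized read establishes happens-before with the
        // writer's synchronized write, so this loop is guaranteed to finish.
        while (!d.isFlag()) {
            Thread.yield();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("reader saw flag: " + demo());
    }
}
```

Without the synchronized accessors (or a volatile flag), the reader loop could in principle spin forever on a stale cached value.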
#2
28
There are 3 types of locking in HotSpot:
- Fat: the JVM relies on OS mutexes to acquire the lock.
- Thin: the JVM uses a CAS algorithm.
- Biased: CAS is a rather expensive operation on some architectures. Biased locking is a special type of locking optimized for the scenario where only one thread is working on the object.
By default the JVM uses thin locking. Later, if the JVM determines that there is no contention, thin locking is converted to biased locking. The operation that changes the type of the lock is rather expensive, hence the JVM does not apply this optimization immediately. There is a special JVM option, -XX:BiasedLockingStartupDelay=delay, which tells the JVM when this kind of optimization should be applied.
Once biased, that thread can subsequently lock and unlock the object without resorting to expensive atomic instructions.
Answer to the question: it depends. But with biased locking, single-threaded code with locking has on average the same performance as code without locking.
- Biased Locking in HotSpot - Dave Dice's Weblog
- Synchronization and Object Locking - Thomas Kotzmann and Christian Wimmer
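The CAS mechanism behind thin locking can be illustrated in pure Java with java.util.concurrent.atomic. This is only a sketch of the idea, not HotSpot's actual implementation; the SpinLockSketch class is invented for illustration. Acquiring the lock is a single compare-and-set, just as a thin lock acquires the object header with one CAS.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative spinlock: acquiring the lock is one CAS, as in thin locking.
public class SpinLockSketch {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Spin until the CAS flips the flag from false to true.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    public void unlock() {
        locked.set(false);
    }

    static int counter;

    // Two threads increment a shared counter under the lock.
    public static int demo() throws InterruptedException {
        counter = 0;
        SpinLockSketch lock = new SpinLockSketch();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                lock.lock();
                try { counter++; } finally { lock.unlock(); }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

The AtomicBoolean's volatile semantics also give the lock the required visibility guarantees, which is why the counter ends up correct. Biased locking exists precisely to avoid paying for this CAS on every acquisition when only one thread ever takes the lock.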
#3
18
There is some overhead in acquiring a non-contested lock, but on modern JVMs it is very small.
A key run-time optimization that's relevant to this case is called "Biased Locking" and is explained in the Java SE 6 Performance White Paper.
If you wanted to have some performance numbers that are relevant to your JVM and hardware, you could construct a micro-benchmark to try and measure this overhead.
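Such a micro-benchmark might look roughly like the following. This is a naive sketch (the class and method names are invented); for trustworthy numbers you would use a proper harness such as JMH, with warm-up iterations and guards against dead-code elimination.

```java
// Naive single-threaded comparison of synchronized vs. plain increments.
public class SyncOverheadSketch {
    static long plain = 0;
    static long locked = 0;
    static final Object monitor = new Object();

    static long timePlain(int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            plain++;
        }
        return System.nanoTime() - start;
    }

    static long timeLocked(int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            synchronized (monitor) { // uncontended lock acquired every iteration
                locked++;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int runs = 10_000_000;
        // Warm-up so the JIT compiles both loops before measuring.
        timePlain(runs);
        timeLocked(runs);
        System.out.printf("plain:  %.2f ns/op%n", (double) timePlain(runs) / runs);
        System.out.printf("locked: %.2f ns/op%n", (double) timeLocked(runs) / runs);
    }
}
```

The absolute numbers will vary with the JVM version, biased-locking settings, and hardware, which is exactly why measuring on your own setup matters.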
#4
8
Using locks when you don't need to will slow down your application. It could be too small to measure or it could be surprisingly high.
IMHO, often the best approach is to use lock-free code in a single-threaded program, to make it clear this code is not intended to be shared across threads. This can be more important for maintenance than any performance issue.
// Wrapper class and imports added so the snippet compiles stand-alone.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class CollectionPerfTest {
    public static void main(String... args) throws IOException {
        for (int i = 0; i < 3; i++) {
            perfTest(new Vector<Integer>());
            perfTest(new ArrayList<Integer>());
        }
    }

    private static void perfTest(List<Integer> objects) {
        long start = System.nanoTime();
        final int runs = 100000000;
        for (int i = 0; i < runs; i += 20) {
            // add items.
            for (int j = 0; j < 20; j += 2)
                objects.add(i);
            // remove from the end.
            while (!objects.isEmpty())
                objects.remove(objects.size() - 1);
        }
        long time = System.nanoTime() - start;
        System.out.printf("%s each add/remove took an average of %.1f ns%n",
                objects.getClass().getSimpleName(), (double) time / runs);
    }
}
prints
Vector each add/remove took an average of 38.9 ns
ArrayList each add/remove took an average of 6.4 ns
Vector each add/remove took an average of 10.5 ns
ArrayList each add/remove took an average of 6.2 ns
Vector each add/remove took an average of 10.4 ns
ArrayList each add/remove took an average of 5.7 ns
From a performance point of view, if 4 ns is important to you, you have to use the non-synchronized version.
For 99% of use cases, the clarity of the code is more important than performance. Clear, simple code often performs reasonably well, too.
BTW: I am using a 4.6 GHz i7 2600 with Oracle Java 7u1.
For comparison, here is what happens if I do the following, where perfTest1, perfTest2 and perfTest3 are identical.
perfTest1(new ArrayList<Integer>());
perfTest2(new Vector<Integer>());
perfTest3(Collections.synchronizedList(new ArrayList<Integer>()));
I get
ArrayList each add/remove took an average of 2.6 ns
Vector each add/remove took an average of 7.5 ns
SynchronizedRandomAccessList each add/remove took an average of 8.9 ns
If I use a common perfTest method, it cannot inline the code as optimally and they are all slower:
ArrayList each add/remove took an average of 9.3 ns
Vector each add/remove took an average of 12.4 ns
SynchronizedRandomAccessList each add/remove took an average of 13.9 ns
Swapping the order of tests
ArrayList each add/remove took an average of 3.0 ns
Vector each add/remove took an average of 39.7 ns
ArrayList each add/remove took an average of 2.0 ns
Vector each add/remove took an average of 4.6 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.5 ns
ArrayList each add/remove took an average of 2.3 ns
Vector each add/remove took an average of 4.4 ns
ArrayList each add/remove took an average of 2.4 ns
Vector each add/remove took an average of 4.6 ns
one at a time
ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 3.0 ns
ArrayList each add/remove took an average of 2.3 ns
ArrayList each add/remove took an average of 2.2 ns
ArrayList each add/remove took an average of 2.4 ns
and
Vector each add/remove took an average of 28.4 ns
Vector each add/remove took an average of 37.4 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns
Vector each add/remove took an average of 7.6 ns
#5
0
This sample code (with 100 threads each making 1,000,000 iterations) demonstrates the performance difference between avoiding and not avoiding a synchronized block.
Output:
Total time(Avoid Sync Block): 630ms
Total time(NOT Avoid Sync Block): 6360ms
Total time(Avoid Sync Block): 427ms
Total time(NOT Avoid Sync Block): 6636ms
Total time(Avoid Sync Block): 481ms
Total time(NOT Avoid Sync Block): 5882ms
Code:
import org.apache.commons.lang.time.StopWatch;

public class App {
    public static int countThreads = 100;
    public static int loopsPerThread = 1000000;
    public static int sleepOfFirst = 10;
    // volatile so the polling loop in startAll() is guaranteed to see updates
    public static volatile int runningCount = 0;
    public static Boolean flagSync = null;

    public static void main(String[] args) {
        for (int j = 0; j < 3; j++) {
            App.startAll(new App.AvoidSyncBlockRunner(), "(Avoid Sync Block)");
            App.startAll(new App.NotAvoidSyncBlockRunner(), "(NOT Avoid Sync Block)");
        }
    }

    public static void startAll(Runnable runnable, String description) {
        App.runningCount = 0;
        App.flagSync = null;
        Thread[] threads = new Thread[App.countThreads];
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(runnable);
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].start();
        }
        do {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        } while (runningCount != 0);
        System.out.println("Total time" + description + ": " + (sw.getTime() - App.sleepOfFirst) + "ms");
    }

    public static void commonBlock() {
        String a = "foo";
        a += "Baa";
    }

    public static synchronized void incrementCountRunning(int inc) {
        runningCount = runningCount + inc;
    }

    public static class NotAvoidSyncBlockRunner implements Runnable {
        public void run() {
            App.incrementCountRunning(1);
            for (int i = 0; i < App.loopsPerThread; i++) {
                synchronized (App.class) {
                    if (App.flagSync == null) {
                        try {
                            Thread.sleep(App.sleepOfFirst);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        App.flagSync = true;
                    }
                }
                App.commonBlock();
            }
            App.incrementCountRunning(-1);
        }
    }

    public static class AvoidSyncBlockRunner implements Runnable {
        public void run() {
            App.incrementCountRunning(1);
            for (int i = 0; i < App.loopsPerThread; i++) {
                // This "if" may seem pointless, but it avoids entering the
                // synchronized block on every subsequent iteration.
                if (App.flagSync == null) {
                    synchronized (App.class) {
                        if (App.flagSync == null) {
                            try {
                                Thread.sleep(App.sleepOfFirst);
                            } catch (InterruptedException e) {
                                e.printStackTrace();
                            }
                            App.flagSync = true;
                        }
                    }
                }
                App.commonBlock();
            }
            App.incrementCountRunning(-1);
        }
    }
}
#6
-1
Assuming you're using the HotSpot VM, I believe the JVM is able to recognize that there is no contention for any resources within the synchronized block and treat it as "normal" code.
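One concrete case where HotSpot can do this is lock elision via escape analysis: if the lock object provably never escapes the thread, the synchronization can be removed entirely. A sketch of code where that applies (the class and method names are invented for illustration; whether elision actually happens depends on the JIT and VM flags):

```java
// Sketch: the object locked here never escapes this method, so HotSpot's
// escape analysis may elide the synchronization entirely.
public class LockElisionSketch {
    static int elidableSum(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Object local = new Object(); // thread-local, never published
            synchronized (local) {       // candidate for lock elision
                sum += i;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(elidableSum(10)); // sum of 0..9
    }
}
```

Note this only covers locks the JIT can prove are thread-local; a lock on a shared object is merely cheapened (thin/biased locking), not removed.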