
Time: 2021-03-20 17:21:15

How do you write (and run) a correct micro-benchmark in Java?


I'm looking here for code samples and comments illustrating various things to think about.


Example: Should the benchmark measure time/iteration or iterations/time, and why?


Related: Is stopwatch benchmarking acceptable?


11 answers



Tips about writing micro benchmarks from the creators of Java HotSpot:


Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.


Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before timing phase(s). (Fewer iterations is OK on the warmup phase. The rule of thumb is several tens of thousands of inner loop iterations.)


Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.


Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.
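Rules 1 through 2.1 can be combined in a small harness. The following is a minimal sketch, not the HotSpot team's own code: the class name PhasedBenchmark, the kernel method, and the iteration counts are illustrative assumptions. Run it with `java -XX:+PrintCompilation -verbose:gc PhasedBenchmark` and check that no compiler or GC output appears between the two timing markers.

```java
final class PhasedBenchmark {
    // Hypothetical test kernel: sums the integers below n.
    static long kernel(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs the kernel `iterations` times and returns the elapsed nanoseconds.
    // Results are accumulated into sink[0] so the work stays observable.
    static long runPhase(int iterations, long[] sink) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink[0] += kernel(1_000);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sink = new long[1];

        System.out.println("--- warmup begin ---");
        runPhase(50_000, sink); // tens of thousands of iterations, per Rule 1
        System.out.println("--- warmup end ---");

        System.out.println("--- timing begin ---");
        long elapsed = runPhase(200_000, sink);
        System.out.println("--- timing end ---");

        // Reporting happens only after the timing phase has ended.
        System.out.println("elapsed ns: " + elapsed + " (sink=" + sink[0] + ")");
    }
}
```

The warmup and timing phases call the same method, so the timing phase does not take any code path for the first time.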


Rule 3: Be aware of the difference between -client and -server, and OSR and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run @ 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.


Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.


Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.


Rule 6: Use appropriate tools to read the compiler's mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.


Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself.


Rule 8: Use a library for your benchmark, as it is probably more efficient and has already been debugged for this sole purpose. Examples include JMH, Caliper, and Bill and Paul's Excellent UCSD Benchmarks for Java.




I know this question has been marked as answered, but I wanted to mention two libraries that help us write micro benchmarks:


Caliper from Google


Getting started tutorials


  1. http://codingjunkie.net/micro-benchmarking-with-caliper/
  2. http://vertexlabs.co.uk/blog/caliper

JMH from OpenJDK


Getting started tutorials


  1. Avoiding Benchmarking Pitfalls on the JVM
  2. http://nitschinger.at/Using-JMH-for-Java-Microbenchmarking
  3. http://java-performance.info/jmh/



Important things for Java benchmarks are:


  • Warm up the JIT first by running the code several times before timing it
  • Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
  • While you can't call System.gc() between iterations, it's a good idea to run it between tests, so that each test will hopefully get a "clean" memory space to work with. (Yes, gc() is more of a hint than a guarantee, but it's very likely that it really will garbage collect in my experience.)
  • I like to display iterations and time, and a score of time/iteration which can be scaled such that the "best" algorithm gets a score of 1.0 and others are scored in a relative fashion. This means you can run all algorithms for a longish time, varying both number of iterations and time, but still getting comparable results.
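The scoring idea in the last bullet can be sketched as follows. This is an illustrative assumption about how such a score might be computed, not the author's framework; the class name, method names, and measurements are hypothetical.

```java
final class RelativeScore {
    // Time per iteration in nanoseconds.
    static double nanosPerIteration(long elapsedNanos, long iterations) {
        return (double) elapsedNanos / iterations;
    }

    // Scales each time/iteration so that the fastest entry scores 1.0
    // and slower entries score proportionally higher.
    static double[] relativeScores(double[] nanosPerIteration) {
        double best = Double.MAX_VALUE;
        for (double v : nanosPerIteration) {
            best = Math.min(best, v);
        }
        double[] scores = new double[nanosPerIteration.length];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = nanosPerIteration[i] / best;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up measurements with different iteration counts and durations.
        double a = nanosPerIteration(2_000_000_000L, 1_000_000L); // 2000 ns/iteration
        double b = nanosPerIteration(3_000_000_000L, 500_000L);   // 6000 ns/iteration
        double[] scores = relativeScores(new double[] {a, b});
        System.out.println("A: " + scores[0] + ", B: " + scores[1]);
    }
}
```

Because the score is based on time per iteration, the two algorithms remain comparable even though they ran for different durations and iteration counts.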

I'm just in the process of blogging about the design of a benchmarking framework in .NET. I've got a couple of earlier posts which may be able to give you some ideas - not everything will be appropriate, of course, but some of it may be.




JMH is a recent addition to OpenJDK and has been written by some performance engineers from Oracle. It is certainly worth a look.


JMH is a Java harness for building, running, and analysing nano/micro/macro benchmarks written in Java and other languages targeting the JVM.


There are very interesting pieces of information buried in the comments of the sample tests.


See also:




Should the benchmark measure time/iteration or iterations/time, and why?


It depends on what you are trying to test. If you are interested in latency, use time/iteration and if you are interested in throughput use iterations/time.
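Both figures can be derived from the same raw measurement. The sketch below is an illustration under assumptions: the class name, method names, and placeholder workload are hypothetical, and the time/iteration figure is only an average, so it says nothing about the latency distribution.

```java
final class LatencyVsThroughput {
    // Latency view: average nanoseconds per iteration.
    static double nanosPerOp(long elapsedNanos, long iterations) {
        return (double) elapsedNanos / iterations;
    }

    // Throughput view: iterations completed per second.
    static double opsPerSecond(long elapsedNanos, long iterations) {
        return iterations * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        long iterations = 1_000_000L;
        long sink = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sink += Long.numberOfTrailingZeros(i); // placeholder workload
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("ns/op: " + nanosPerOp(elapsed, iterations));
        System.out.println("ops/s: " + opsPerSecond(elapsed, iterations));
        System.out.println("sink: " + sink); // keeps the result live
    }
}
```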




Make sure you somehow use results which are computed in benchmarked code. Otherwise your code can be optimized away.
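One simple way to do this is to fold every result into a value that is printed or returned after the timed region. A minimal sketch, assuming a hypothetical work method as the code under test:

```java
final class ConsumeResult {
    // Hypothetical workload whose return value must be consumed.
    static int work(int x) {
        return Integer.bitCount(x * 31 + 7);
    }

    // Folds every result into a checksum that the caller prints,
    // so the computation has an observable effect.
    static long timedLoop(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            checksum += work(i);
        }
        return checksum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long checksum = timedLoop(10_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("elapsed ns: " + elapsed + ", checksum: " + checksum);
    }
}
```

If the loop called work(i) and discarded the result, the JIT compiler would be free to remove the call entirely, and the benchmark would time an empty loop.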




If you are trying to compare two algorithms, do at least two benchmarks on each, alternating the order. i.e.:
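A minimal sketch of one such alternating order, assuming two hypothetical placeholder methods alg1 and alg2 and an illustrative iteration count n:

```java
import java.util.function.IntUnaryOperator;

final class AlternatingOrder {
    // Placeholder algorithms; substitute the two implementations under comparison.
    static int alg1(int x) { return x * 2; }
    static int alg2(int x) { return x << 1; }

    // Times n calls of the given algorithm and returns the elapsed nanoseconds.
    static long time(IntUnaryOperator alg, int n, long[] sink) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink[0] += alg.applyAsInt(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long[] sink = new long[1];
        IntUnaryOperator first = AlternatingOrder::alg1;
        IntUnaryOperator second = AlternatingOrder::alg2;
        // Order: alg1, alg2, alg2, alg1, so neither algorithm always runs first or last.
        System.out.println("alg1 pass 1: " + time(first, n, sink) + " ns");
        System.out.println("alg2 pass 1: " + time(second, n, sink) + " ns");
        System.out.println("alg2 pass 2: " + time(second, n, sink) + " ns");
        System.out.println("alg1 pass 2: " + time(first, n, sink) + " ns");
        System.out.println("sink: " + sink[0]);
    }
}
```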



I have found some noticeable differences (sometimes 5-10%) in the runtime of the same algorithm across different passes.


Also, make sure that n is very large, so that the runtime of each loop is at the very least 10 seconds or so. The more iterations, the more significant figures in your benchmark time and the more reliable that data is.




There are many possible pitfalls for writing micro-benchmarks in Java.


First: You have to account for all sorts of events that take a more or less random amount of time: garbage collection, caching effects (of the OS for files and of the CPU for memory), I/O, etc.


Second: You cannot trust the accuracy of the measured times for very short intervals.


Third: The JVM optimizes your code while it executes, so successive runs in the same JVM instance tend to become faster.


My recommendations: Make your benchmark run for several seconds, which is more reliable than a runtime of a few milliseconds. Warm up the JVM (that is, run the benchmark at least once without measuring, so that the JVM can perform its optimizations). Run your benchmark multiple times (maybe 5 times) and take the median value. Run every micro-benchmark in a new JVM instance (launch a new java process for every benchmark); otherwise optimization effects of the JVM can influence tests that run later. Don't execute things that aren't executed in the warmup phase (as this could trigger class loading and recompilation).
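The warmup-then-median part of these recommendations can be sketched as follows. The class name, the workload, and the run count of 5 are assumptions for illustration; launching a fresh JVM per benchmark is done outside the program and is not shown.

```java
import java.util.Arrays;

final class MedianOfRuns {
    // Median of the measured times; for an even count, the mean of the two middle values.
    static double median(long[] times) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Hypothetical workload.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = workload(); // one unmeasured warmup run
        long[] times = new long[5];
        for (int run = 0; run < times.length; run++) {
            long start = System.nanoTime();
            sink += workload();
            times[run] = System.nanoTime() - start;
        }
        System.out.println("median ns: " + median(times) + " (sink=" + sink + ")");
    }
}
```

The median is less sensitive than the mean to a single run that was disturbed by garbage collection or another process.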




When comparing different implementations, it is also important to analyze the results of the micro benchmark. A significance test should therefore be performed.


This is because implementation A might be faster during most of the runs of the benchmark than implementation B. But A might also have a higher spread, so the measured performance benefit of A won't be of any significance when compared with B.


So it is important not only to write and run a micro benchmark correctly, but also to analyze it correctly.
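The answer does not name a specific test. As an assumption, the sketch below uses Welch's t statistic, one common choice for two samples with unequal variances; the class name and the run times are made up for illustration.

```java
final class WelchT {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) {
            ss += (x - m) * (x - m);
        }
        return ss / (xs.length - 1);
    }

    // Welch's t statistic for two samples with possibly unequal variances.
    static double tStatistic(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    public static void main(String[] args) {
        // Made-up run times in milliseconds.
        double[] a = {90, 130, 85, 125, 95};    // faster on average, wide spread
        double[] b = {108, 110, 109, 111, 112}; // slower on average, tight spread
        System.out.println("t = " + tStatistic(a, b));
    }
}
```

With these numbers A has the lower mean (105 ms against 110 ms), but its spread is so wide that the magnitude of t stays well below 1, which suggests the difference cannot be distinguished from noise. Turning t into a p-value additionally requires the degrees of freedom and a t distribution, which are omitted here.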




http://opt.sourceforge.net/ Java Micro Benchmark: a set of control tasks for determining the comparative performance characteristics of a computer system on different platforms. It can be used to guide optimization decisions and to compare different Java implementations.




To add to the other excellent advice, I'd also be mindful of the following:


For some CPUs (e.g. the Intel Core i5 range with TurboBoost), the temperature (and the number of cores currently being used, as well as their utilisation percentage) affects the clock speed. Since CPUs are dynamically clocked, this can affect your results. For example, if you have a single-threaded application, the maximum clock speed (with TurboBoost) is higher than for an application using all cores. This can therefore interfere with comparisons of single-threaded and multi-threaded performance on some systems. Bear in mind that the temperature and voltages also affect how long the Turbo frequency is maintained.


Perhaps a more fundamentally important aspect, and one that you have direct control over: make sure you're measuring the right thing! For example, if you're using System.nanoTime() to benchmark a particular bit of code, place the calls and the assignments of their results so that you avoid measuring things you aren't interested in. For example, don't do:


long startTime = System.nanoTime();
//code here...
System.out.println("Code took " + (System.nanoTime() - startTime) + " nanoseconds");

The problem is that the end time is not captured immediately after the code has finished. Instead, try the following:


final long endTime, startTime = System.nanoTime();
//code here...
endTime = System.nanoTime();
System.out.println("Code took " + (endTime - startTime) + " nanoseconds");


