How do you write a correct micro-benchmark in Java?

Date: 2021-03-20 17:21:15

How do you write (and run) a correct micro-benchmark in Java?

I'm looking here for code samples and comments illustrating various things to think about.

Example: Should the benchmark measure time/iteration or iterations/time, and why?

Related: Is stopwatch benchmarking acceptable?

11 Answers

#1


645  

Tips about writing micro benchmarks from the creators of Java HotSpot:

Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.

Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before timing phase(s). (Fewer iterations is OK on the warmup phase. The rule of thumb is several tens of thousands of inner loop iterations.)

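For illustration, a minimal hand-rolled harness along these lines might look like the sketch below; the kernel and the iteration counts are hypothetical placeholders (the printed phase markers anticipate Rule 2.1):

public class WarmupDemo {
    // Hypothetical test kernel; replace with the code under test.
    static long kernel(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i * 31L;
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        System.out.println("warmup start");
        // Rule 1: tens of thousands of inner-loop iterations to trigger JIT compilation.
        for (int i = 0; i < 20_000; i++) sink += kernel(1_000);
        System.out.println("warmup end / timing start");
        long t0 = System.nanoTime();
        for (int i = 0; i < 20_000; i++) sink += kernel(1_000);
        long t1 = System.nanoTime();
        System.out.println("timing end");
        // Use the accumulated result so the JIT cannot eliminate the loops (see answer #6).
        System.out.println("sink=" + sink + ", ns/op=" + (t1 - t0) / 20_000.0);
    }
}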

Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.

Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.

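A typical invocation combining Rules 2 and 2.1 might then be (class name hypothetical):

java -XX:+PrintCompilation -verbose:gc WarmupDemo

Any compilation or GC output that appears between your timing-phase markers is a sign the measurement is polluted.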

Rule 3: Be aware of the difference between -client and -server, and OSR and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run @ 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.

Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.

Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.

Rule 6: Use appropriate tools to read the compiler's mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.

Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself.

Rule 8: Use a library for your benchmark as it is probably more efficient and was already debugged for this sole purpose. Such as JMH, Caliper or Bill and Paul's Excellent UCSD Benchmarks for Java.

#2


206  

I know this question has been marked as answered, but I wanted to mention two libraries that enable us to write micro benchmarks.

Caliper from Google

Getting started tutorials

  1. http://codingjunkie.net/micro-benchmarking-with-caliper/
  2. http://vertexlabs.co.uk/blog/caliper

JMH from OpenJDK

Getting started tutorials

  1. Avoiding Benchmarking Pitfalls on the JVM
  2. http://nitschinger.at/Using-JMH-for-Java-Microbenchmarking
  3. http://java-performance.info/jmh/

#3


69  

Important things for Java benchmarks are:

  • Warm up the JIT first by running the code several times before timing it
  • Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
  • While you can't call System.gc() between iterations, it's a good idea to run it between tests, so that each test will hopefully get a "clean" memory space to work with. (Yes, gc() is more of a hint than a guarantee, but it's very likely that it really will garbage collect in my experience.)
  • I like to display iterations and time, and a score of time/iteration which can be scaled such that the "best" algorithm gets a score of 1.0 and others are scored in a relative fashion. This means you can run all algorithms for a longish time, varying both number of iterations and time, but still getting comparable results. (A sketch of this scoring follows this list.)
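
A sketch of that relative scoring, with hypothetical measured timings:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class RelativeScores {
    public static void main(String[] args) {
        // Hypothetical measured ns/iteration per algorithm.
        Map<String, Double> nsPerIter = new LinkedHashMap<>();
        nsPerIter.put("algA", 120.0);
        nsPerIter.put("algB", 95.0);
        nsPerIter.put("algC", 240.0);

        double best = Collections.min(nsPerIter.values());
        // The "best" algorithm scores 1.0; the others are scored relative to it.
        nsPerIter.forEach((name, ns) ->
                System.out.printf("%s: %.1f ns/iter, score %.2f%n", name, ns, ns / best));
    }
}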

I'm just in the process of blogging about the design of a benchmarking framework in .NET. I've got a couple of earlier posts which may be able to give you some ideas - not everything will be appropriate, of course, but some of it may be.

#4


39  

jmh is a recent addition to OpenJDK and has been written by some performance engineers from Oracle. Certainly worth having a look.

JMH is a Java harness for building, running, and analysing nano/micro/macro benchmarks written in Java and other languages targeting the JVM.

Very interesting pieces of information are buried in the sample tests' comments.

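As a reference point, a minimal JMH benchmark looks roughly like the sketch below; the class name and the benchmarked expression are hypothetical placeholders:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(1)
@State(Scope.Thread)
public class MyBenchmark {
    int x = 42;

    @Benchmark
    public int measureSomething() {
        // Returning the value lets JMH consume it, preventing dead-code elimination.
        return x * 31;
    }
}

JMH then handles warmup, forking and statistics for you; projects generated from the JMH Maven archetype package this into a runnable benchmarks.jar.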

#5


17  

Should the benchmark measure time/iteration or iterations/time, and why?

It depends on what you are trying to test. If you are interested in latency, use time/iteration and if you are interested in throughput use iterations/time.

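In code, the two views are just reciprocal aggregations of the same raw measurement; a tiny sketch with hypothetical numbers:

public class LatencyVsThroughput {
    public static void main(String[] args) {
        long iterations = 1_000_000;      // hypothetical iteration count
        long elapsedNanos = 250_000_000;  // hypothetical elapsed time
        double nsPerOp = (double) elapsedNanos / iterations; // latency view
        double opsPerSec = iterations * 1e9 / elapsedNanos;  // throughput view
        System.out.printf("%.1f ns/op, %.0f ops/s%n", nsPerOp, opsPerSec);
    }
}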

#6


14  

Make sure you somehow use results which are computed in benchmarked code. Otherwise your code can be optimized away.

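Outside of JMH (which provides Blackhole for this), one common trick is to fold every result into a sink that is ultimately made visible, for example through a volatile field. A sketch with a hypothetical kernel:

public class SinkDemo {
    static volatile long SINK; // a volatile write keeps the result observable

    static long kernel(int i) { return i * 2654435761L; } // hypothetical work

    public static void main(String[] args) {
        long acc = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) acc += kernel(i);
        long t1 = System.nanoTime();
        SINK = acc; // consume the result so the loop cannot be optimized away
        System.out.println((t1 - t0) / 1e6 + " ms");
    }
}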

#7


12  

If you are trying to compare two algorithms, do at least two benchmarks on each, alternating the order. i.e.:

for (int i = 0; i < n; i++) alg1();
for (int i = 0; i < n; i++) alg2();
for (int i = 0; i < n; i++) alg2();
for (int i = 0; i < n; i++) alg1();

I have found some noticeable differences (5-10% sometimes) in the runtime of the same algorithm in different passes.

Also, make sure that n is very large, so that the runtime of each loop is at the very least 10 seconds or so. The more iterations, the more significant figures in your benchmark time and the more reliable that data is.

#8


12  

There are many possible pitfalls for writing micro-benchmarks in Java.

First: You have to reckon with all sorts of events that take a more or less random amount of time: garbage collection, caching effects (of the OS for files and of the CPU for memory), IO, etc.

Second: You cannot trust the accuracy of the measured times for very short intervals.

Third: The JVM optimizes your code while executing. So different runs in the same JVM-instance will become faster and faster.

My recommendations: Make your benchmark run for several seconds; that is more reliable than a runtime of milliseconds. Warm up the JVM (meaning: run the benchmark at least once without measuring, so that the JVM can apply its optimizations). Run your benchmark multiple times (say, 5 times) and take the median value. Run every micro-benchmark in a new JVM instance (call a new java for every benchmark), otherwise optimization effects of the JVM from one benchmark can influence tests run later. Don't execute things that aren't executed in the warmup phase (as this could trigger class loading and recompilation).

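Running every benchmark in a fresh JVM can be scripted from Java itself; a sketch in which the benchmark main classes are hypothetical:

import java.util.List;

public class FreshJvmRunner {
    public static void main(String[] args) throws Exception {
        // Each benchmark gets its own JVM, so JIT and GC state cannot leak between them.
        for (String mainClass : List.of("BenchA", "BenchB")) {
            new ProcessBuilder("java", "-cp", System.getProperty("java.class.path"), mainClass)
                    .inheritIO()
                    .start()
                    .waitFor();
        }
    }
}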

#9


7  

It should also be noted that it is important to analyze the results of the micro benchmark when comparing different implementations. Therefore a significance test should be made.

This is because implementation A might be faster during most of the runs of the benchmark than implementation B. But A might also have a higher spread, so the measured performance benefit of A won't be of any significance when compared with B.

So it is important not only to write and run a micro benchmark correctly, but also to analyze it correctly.

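As a rough illustration of such an analysis, one can compare the means together with their spread, for example via Welch's t-statistic; the sample timings below are hypothetical:

public class SignificanceSketch {
    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    static double sampleVariance(double[] xs, double m) {
        double s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s / (xs.length - 1);
    }

    public static void main(String[] args) {
        double[] a = {101, 99, 103, 98, 104}; // hypothetical run times (ms) of implementation A
        double[] b = {100, 108, 92, 111, 94}; // hypothetical run times (ms) of implementation B
        double ma = mean(a), mb = mean(b);
        double va = sampleVariance(a, ma), vb = sampleVariance(b, mb);
        // Welch's t-statistic: difference of means scaled by the pooled standard error.
        double t = (ma - mb) / Math.sqrt(va / a.length + vb / b.length);
        System.out.printf("meanA=%.1f meanB=%.1f t=%.2f%n", ma, mb, t);
    }
}

A |t| well below roughly 2 suggests the measured difference may not be significant given the spread.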

#10


6  

http://opt.sourceforge.net/ Java Micro Benchmark - control tasks required to determine the comparative performance characteristics of the computer system on different platforms. Can be used to guide optimization decisions and to compare different Java implementations.

#11


5  

To add to the other excellent advice, I'd also be mindful of the following:

For some CPUs (e.g. the Intel Core i5 range with TurboBoost), the temperature (and the number of cores currently in use, as well as their utilisation percentage) affects the clock speed. Since CPUs are dynamically clocked, this can affect your results. For example, if you have a single-threaded application, the maximum clock speed (with TurboBoost) is higher than for an application using all cores. This can therefore interfere with comparisons of single- and multi-threaded performance on some systems. Bear in mind that the temperature and voltages also affect how long the Turbo frequency is maintained.

Perhaps a more fundamentally important aspect that you have direct control over: make sure you're measuring the right thing! For example, if you're using System.nanoTime() to benchmark a particular bit of code, put the calls to the assignment in places that make sense to avoid measuring things which you aren't interested in. For example, don't do:

long startTime = System.nanoTime();
//code here...
// The end time is only captured while the println argument is being built:
System.out.println("Code took " + (System.nanoTime() - startTime) + " nanoseconds");

The problem is that you don't capture the end time immediately when the code under test has finished. Instead, try the following:

final long endTime, startTime = System.nanoTime();
//code here...
endTime = System.nanoTime(); // capture the end time immediately after the code under test
System.out.println("Code took " + (endTime - startTime) + " nanoseconds");
