We have a simple unit test in our performance test suite that we use to verify that the base system is sane and performs adequately before we even start testing our own code. This is how we usually check that a machine is suitable for running the actual performance tests.
When we compare Java 6 and Java 7 using this test, Java 7 takes noticeably longer to execute: we see an average of 22 seconds for Java 6 and 24 seconds for Java 7, roughly 9% slower. The test only computes Fibonacci numbers, so only bytecode execution in a single thread should be relevant here, not I/O or anything else.
Currently we run it with default settings on Windows, with and without "-server", on both the 32-bit and the 64-bit JVM. All runs show a similar degradation for Java 7.
Which tuning options may be suitable here to try to match Java 7 against Java 6?
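import org.junit.Before;
import org.junit.Test;

public class BaseLinePerformance {

    // Runs the workload once before the timed test, which also gives the JIT a chance to warm up.
    @Before
    public void setup() throws Exception {
        fib(46);
    }

    // Times two full passes over the workload using wall-clock milliseconds.
    @Test
    public void testBaseLine() throws Exception {
        long start = System.currentTimeMillis();
        fib(46);
        fib(46);
        System.out.println("Time: " + (System.currentTimeMillis() - start));
    }

    // Prints fib2(0) through fib2(n - 1).
    public static void fib(final int n) throws Exception {
        for (int i = 0; i < n; i++) {
            System.out.println("fib(" + i + ") = " + fib2(i));
        }
    }

    // Naive doubly recursive Fibonacci: pure CPU work, no memoization.
    public static int fib2(final int n) {
        if (n == 0)
            return 0;
        else if (n == 1)
            return 1;
        else
            return fib2(n - 2) + fib2(n - 1);
    }
}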
Update: I have reduced the test so that it no longer does any sleeps, and I followed the other suggestions from "How do I write a correct micro-benchmark in Java?". I still see the same difference between Java 7 and Java 6. Additional JVM options to print compilation and GC activity do not show any output during the actual test; compilation information is only printed initially.
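For reference, a reduced benchmark along those lines might look roughly like the sketch below. This is not the actual reduced test; the class name `FibBenchmarkSketch`, the helper `timeRuns`, and the chosen input sizes are made up for illustration. It warms up first, times with `System.nanoTime()`, keeps printing out of the timed region, and stores the result so the JIT cannot discard the computation.

```java
class FibBenchmarkSketch {

    // Written to after every run so the computed values stay "used".
    static volatile int sink;

    // Same naive recursion as fib2 in the test above.
    static int fib2(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        return fib2(n - 2) + fib2(n - 1);
    }

    // Runs fib2(n) `runs` times and returns the elapsed nanoseconds of each run.
    static long[] timeRuns(int n, int runs) {
        long[] elapsed = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            int result = fib2(n);
            elapsed[r] = System.nanoTime() - start;
            sink = result; // outside the timed region
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Warm-up phase: results are thrown away (hypothetical sizes).
        timeRuns(30, 10);

        // Measured phase: print only after all timing is done.
        long[] elapsed = timeRuns(38, 5);
        for (int r = 0; r < elapsed.length; r++) {
            System.out.println("run " + r + ": " + (elapsed[r] / 1000000) + " ms");
        }
    }
}
```

Running the same class file on both JVMs and comparing the per-run numbers, rather than one total, makes it easier to see whether the gap is constant or only shows up in the first runs.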
2 solutions
#1
5
One of my colleagues found out the reason for this after a bit more digging:
There is a JVM flag, -XX:MaxRecursiveInlineLevel, which has a default value of 1. It seems the handling of this setting was slightly incorrect in previous versions, so Sun/Oracle "fixed" it in Java 7. The side effect is that inlining is now sometimes done less aggressively, so the pure runtime/CPU time of recursive code can be longer than before.
We are testing setting it to 2 to get the same behavior as in Java 6 at least for the test in question.
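When the test is launched through a build tool or IDE it is easy to lose track of whether the flag actually reached the JVM. A small check like the following can help; it is only a sketch, and the class name `InlineFlagCheck` and the helper `findFlagValue` are hypothetical. It reads the arguments the running JVM was started with via the standard `RuntimeMXBean.getInputArguments()` call.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class InlineFlagCheck {

    static final String FLAG_PREFIX = "-XX:MaxRecursiveInlineLevel=";

    // Returns the value given for the flag in jvmArgs, or null if it is absent.
    // If the flag appears more than once, the last one seen is returned.
    static String findFlagValue(List<String> jvmArgs) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG_PREFIX)) {
                value = arg.substring(FLAG_PREFIX.length());
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, not the program arguments in `args`.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String value = findFlagValue(jvmArgs);
        if (value == null) {
            System.out.println("MaxRecursiveInlineLevel not set on the command line; the JVM default applies");
        } else {
            System.out.println("MaxRecursiveInlineLevel set to " + value);
        }
    }
}
```

Started as `java -XX:MaxRecursiveInlineLevel=2 InlineFlagCheck`, it should report the value 2; started without the flag, it reports that the default is in effect.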
#2
0
This is not an easy question to answer; plenty of things can account for those 2 seconds.
I am assuming from your comments that you are already familiar with micro-benchmarking, that your benchmark runs after warming up the JVM so that your code has reached an optimized JIT state with no GCs happening, and that your hardware setup has not changed.
I would recommend CPU profiling your benchmark. That will help you identify where those two seconds are being spent, so you can act accordingly.
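Before reaching for a full profiler, a cheap first step is to compare wall-clock time with the CPU time of the benchmark thread. The sketch below is not a profiler and the class name `CpuTimeSketch` and method `measure` are made up for illustration; it uses the standard `ThreadMXBean` API. If the two numbers are close, the extra time really is spent executing code on the CPU rather than waiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class CpuTimeSketch {

    // Same naive recursion as fib2 in the question.
    static int fib2(int n) {
        if (n == 0) return 0;
        if (n == 1) return 1;
        return fib2(n - 2) + fib2(n - 1);
    }

    // Returns { wall-clock nanos, CPU nanos of the current thread, fib2(n) }.
    // The CPU entry is -1 when the JVM cannot measure thread CPU time.
    static long[] measure(int n) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();

        long cpuStart = cpuAvailable ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();

        int result = fib2(n);

        long wall = System.nanoTime() - wallStart;
        long cpu = cpuAvailable ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new long[] { wall, cpu, result };
    }

    public static void main(String[] args) {
        measure(30); // warm-up run, result ignored
        long[] m = measure(38); // hypothetical input size
        System.out.println("wall: " + (m[0] / 1000000) + " ms, cpu: "
                + (m[1] < 0 ? "n/a" : (m[1] / 1000000) + " ms"));
    }
}
```

The resolution of thread CPU time depends on the platform, so small differences between the two values should not be over-interpreted.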
If you are curious about the bytecode you can take a peek at it.
To do this, compile your class and run javap -c ClassName on both machines. This disassembles the class file bytecode and shows it to you, so any differences between the two compiled classes will be visible there.
In conclusion, profile first, then tune your application based on the data to get back to 22 seconds; there is nothing you can do about the bytecode implementation anyway.