If Java repeats the same code, why does it get faster?

Time: 2022-04-21 17:22:28

Given the following code:

public class Test{

    static int[] big = new int [10000];

    public static void main(String[] args){
        long time;
        for (int i = 0; i < 16; i++){
            time = System.nanoTime();
            getTimes();
            System.out.println(System.nanoTime() - time);
        }    
    }
    public static void getTimes(){
        int d;
        for (int i = 0; i < 10000; i++){
            d = big[i];
        }    
    }
}

The output shows a decreasing duration trend:

171918
167213
165930
165502
164647
165075
203991
70563
45759
43193
45759
44476
45759
52601
47897
48325

Why does the same code in getTimes execute in less than one third of the time after it has been run 8 times or more? (Edit: it does not always happen on the 8th run, but somewhere between the 5th and the 10th.)

3 answers

#1


9  

It should be clear by now, from all the comments you received, that what you see is the result of some JIT optimization. But what is really happening, and why is the code always optimized after nearly the same number of iterations of the outer for loop?

I'll try to answer both questions, but please remember that everything explained here applies only to Oracle's HotSpot VM. There is no Java specification that defines how a JVM JIT should behave.

First of all, let's see what the JIT is doing by running the Test program with an additional flag (the plain JVM is enough for this; there is no need to load the debugging shared library that some of the UnlockDiagnosticVMOptions options require):

java -XX:+PrintCompilation Test

The execution completes with this output (omitting a few lines at the beginning that show other methods being compiled):

[...]
195017
184573
184342
184262
183491
189494
    131   51%      3       Test::getTimes @ 2 (22 bytes)
245167
    132   52       3       Test::getTimes (22 bytes)
165144  

65090
    132   53       1       java.nio.Buffer::limit (5 bytes)
59427
    132   54%      4       Test::getTimes @ 2 (22 bytes)  
75137
48110    
    135   51%     3        Test::getTimes @ -2 (22 bytes)   made not entrant

    142   55       4       Test::getTimes (22 bytes)
150820
86951
90012
91421

The println output from your code is interleaved with diagnostic information about the compilations the JIT is performing. Looking at a single line:

131    51%      3       Test::getTimes @ 2 (22 bytes)

Each column has the following meaning:

  1. Timestamp
  2. Compilation Id (with additional attributes if needed)
  3. Tiered compilation level
  4. Method short name (with @osr_bci if available)
  5. Method size (in bytes of bytecode)
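
To make the column layout above concrete, here is a minimal sketch that splits one such line into its fields. The class name PrintCompilationLine and the regular expression are hypothetical and only cover the simple lines quoted in this answer (an optional % attribute, an optional "@ bci" part); real PrintCompilation output can carry other attribute characters that this sketch does not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK API.
class PrintCompilationLine {
    // timestamp, compilation id, optional '%', tier, method, optional "@ bci", "(N bytes)"
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(\\d+)(%?)\\s+(\\d)\\s+(\\S+)(?:\\s+@\\s+(-?\\d+))?\\s+\\((\\d+) bytes\\).*$");

    final long timestamp;
    final int compilationId;
    final boolean osr;      // true when the id carries the '%' attribute
    final int tier;
    final String method;
    final Integer osrBci;   // null when the line has no "@ bci" part
    final int size;

    private PrintCompilationLine(Matcher m) {
        timestamp = Long.parseLong(m.group(1));
        compilationId = Integer.parseInt(m.group(2));
        osr = !m.group(3).isEmpty();
        tier = Integer.parseInt(m.group(4));
        method = m.group(5);
        osrBci = m.group(6) == null ? null : Integer.valueOf(m.group(6));
        size = Integer.parseInt(m.group(7));
    }

    static PrintCompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new PrintCompilationLine(m);
    }

    public static void main(String[] args) {
        PrintCompilationLine p =
            parse("    131   51%      3       Test::getTimes @ 2 (22 bytes)");
        System.out.println(p.method + " osr=" + p.osr + " tier=" + p.tier
            + " bci=" + p.osrBci + " size=" + p.size);
    }
}
```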

Keeping only the lines related to getTimes:

    131   51%      3       Test::getTimes @ 2 (22 bytes)
    132   52       3       Test::getTimes (22 bytes)
    132   54%      4       Test::getTimes @ 2 (22 bytes)     
    135   51%      3       Test::getTimes @ -2 (22 bytes)   made not entrant
    142   55       4       Test::getTimes (22 bytes)

It's clear that getTimes is compiled more than once, and each time in a different way.

The % symbol marks an on-stack replacement (OSR) compilation: the 10k loop in getTimes has become hot while the method is still running, so the JVM compiles a version of the method that can be entered at the loop and switches the running interpreted frame over to it, without waiting for the next invocation. The osr_bci is the bytecode index of the point in the method (here the loop, at index 2) where that switch takes place.
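
As a rough illustration of when OSR matters, consider a method that is called only once but spends a long time in a loop. The class OsrDemo and the loop bound below are made up for this sketch. On HotSpot, running it with java -XX:+PrintCompilation OsrDemo would typically be expected to print a line for OsrDemo::sumTo carrying the % attribute, although the exact output depends on the JVM version and flags.

```java
// Hypothetical demo class; the names and the loop bound are illustrative only.
class OsrDemo {
    // The method is entered a single time, so a normal compilation (which only
    // takes effect on the next invocation) would never be used for this call.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One call, many loop iterations.
        System.out.println(sumTo(50_000_000));
    }
}
```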

The next compilation is a classic JIT compilation of the whole getTimes method (the reported size is the same because it is the size of the method's bytecode, which does not change).

The third compilation is another OSR, but at a different tier. Tiered compilation was added in Java 7 and basically lets the JVM use both the client (C1) and server (C2) JIT compilers at runtime, moving code between them when necessary. The client compiler applies a simpler set of optimizations and compiles quickly, while the server compiler applies more sophisticated optimizations at a higher cost in compilation time. In the output above, level 3 is C1 code with profiling and level 4 is C2 code.

I will not go into details about the different modes or about tiered compilation. If you need additional information, I recommend Java Performance: The Definitive Guide by Scott Oaks; also check this question, which explains what changes between the levels.

Back to the output of PrintCompilation, the gist here is that from a certain point in time, a sequence of compilations with increasing complexity is performed until the method becomes apparently stable (i.e. the JIT doesn't compile it again).

So, why does all this start at that certain point in time, after 5-10 iterations of the main loop?

Because the inner getTimes loop has become "hot".

The HotSpot VM usually treats as "hot" those methods that have been invoked at least 10k times (that's the historical default threshold; it can be changed using -XX:CompileThreshold=<num>, and with tiered compilation there are now multiple thresholds). In the case of OSR, I'm guessing that it's performed when a block of code is deemed "hot" enough, in terms of absolute or relative execution time, inside the method that contains it.
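
A practical consequence is that timings taken before the code is hot mostly measure the interpreter. The sketch below times one cold call, runs an explicit warm-up, and then times one more call. The class WarmupDemo and the warm-up count of 20,000 are assumptions chosen for illustration, not tuned values, and the loop accumulates a sum (unlike the original getTimes) so that the reads have an observable result.

```java
// Hypothetical warm-up sketch; the numbers it prints vary by machine and JVM.
class WarmupDemo {
    static int[] big = new int[10000];

    // Same access pattern as getTimes, but the values are summed and returned.
    static int readAll() {
        int d = 0;
        for (int i = 0; i < big.length; i++) {
            d += big[i];
        }
        return d;
    }

    // Times a single call of readAll in nanoseconds.
    static long timeOnce() {
        long start = System.nanoTime();
        int result = readAll();
        long elapsed = System.nanoTime() - start;
        if (result != 0) {
            throw new IllegalStateException("array is expected to hold only zeros");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("cold: " + timeOnce() + " ns");
        // Warm-up: give the JIT a chance to compile readAll before measuring again.
        for (int i = 0; i < 20_000; i++) {
            readAll();
        }
        System.out.println("warm: " + timeOnce() + " ns");
    }
}
```

For measurements that need to be trusted, a dedicated harness such as JMH is usually a better choice than a hand-written loop like this one.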

Additional References

PrintCompilation Guide by Krystal Mok

Java Performance: The Definitive Guide

#2


4  

The JIT (just-in-time) compiler of the virtual machine optimizes the execution of the Java bytecode. For example, if you have an if() statement that is false in about 99% of cases, the JIT optimizes your code for the false case, which can make the true case slower. Sorry for the bad English.
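
A minimal sketch of the kind of code this describes, with a branch that is taken for roughly 1% of the inputs. The class RareBranchDemo and its methods are invented for illustration. Whether and how a given JVM specializes the code for the common path depends on the profile it collects, so this only shows the shape of such code and does not demonstrate a measured effect.

```java
// Hypothetical example of a rarely taken branch.
class RareBranchDemo {
    static int classify(int x) {
        if (x % 100 == 0) {      // true for about 1% of the inputs below
            return slowPath(x);  // rare case
        }
        return x + 1;            // common case
    }

    static int slowPath(int x) {
        return -x;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            total += classify(i);
        }
        System.out.println(total);
    }
}
```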

#3


0  

Example: code before optimization

class A {
  B b;
  int y, z, sum;
  public void newMethod() {
    y = b.get();   // calling get()
    // ...do stuff...
    z = b.get();   // calling get() again
    sum = y + z;
  }
}
class B {
   int value;
   final int get() {
      return value;
   }
}

Example: code after optimization

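class A {
  B b;
  int y, sum;
  public void newMethod() {
    y = b.value;   // get() replaced by a direct field read
    // ...do stuff...
    sum = y + y;   // second call replaced by a copy of y (valid only if b.value is unchanged in between)
  }
}
class B {
   int value;
   final int get() {
      return value;
   }
}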

Originally, the code contained two calls to the b.get() method. After optimization, the two method calls are reduced to a single field read followed by a variable copy; that is, the optimized code no longer needs a method call to obtain the value field of class B.
