How can consecutive calls to StringBuffer (or StringBuilder) improve performance without reusing the target variable?

Date: 2021-07-12 05:54:57

I have the following piece of code in Java.


String foo = " ";

Method 1:


StringBuffer buf = new StringBuffer();
buf.append("Hello");
buf.append(foo);
buf.append("World");  

Method 2:


StringBuffer buf = new StringBuffer();
buf.append("Hello").append(foo).append("World");

Can someone enlighten me, how method 2 can improve the performance of code?


https://pmd.github.io/pmd-5.4.2/pmd-java/rules/java/strings.html#ConsecutiveAppendsShouldReuse


1 solution

#1 (5 votes)

Is it really different?

Let's start by analyzing javac output. Given the code:


public class Main {
  public String appendInline() {
    final StringBuilder sb = new StringBuilder().append("some").append(' ').append("string");
    return sb.toString();
  }

  public String appendPerLine() {
    final StringBuilder sb = new StringBuilder();
    sb.append("some");
    sb.append(' ');
    sb.append("string");
    return sb.toString();
  }
}

We compile with javac, and check the output with javap -c -s:

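Concretely, assuming the class above is saved as Main.java and a JDK is on the PATH, the commands might look like:

```shell
# Compile the class, then disassemble it to inspect the generated bytecode
javac Main.java
javap -c -s Main
```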

  public java.lang.String appendInline();
    descriptor: ()Ljava/lang/String;
    Code:
       0: new           #2                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
       7: ldc           #4                  // String some
       9: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      12: bipush        32
      14: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
      17: ldc           #7                  // String string
      19: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      22: astore_1
      23: aload_1
      24: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      27: areturn

  public java.lang.String appendPerLine();
    descriptor: ()Ljava/lang/String;
    Code:
       0: new           #2                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
       7: astore_1
       8: aload_1
       9: ldc           #4                  // String some
      11: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      14: pop
      15: aload_1
      16: bipush        32
      18: invokevirtual #6                  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
      21: pop
      22: aload_1
      23: ldc           #7                  // String string
      25: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: pop
      29: aload_1
      30: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      33: areturn

As seen, the appendPerLine variant produces much larger bytecode, with several extra aload_1 and pop instructions that essentially cancel each other out (loading the string builder / buffer onto the stack, then popping it to discard the returned reference). In turn, this means the JRE will produce a larger callsite with greater overhead. Conversely, a smaller callsite improves the chances that the JVM will inline the method calls, reducing method call overhead and further improving performance.


This alone improves performance from a cold start when chaining method calls.


Shouldn't the JVM optimize this away?

One could argue that the JRE should be able to optimize these instructions away once the VM has warmed up. However, this claim needs support, and would still only apply to long-running processes.


So, let's test this claim and measure the performance even after warmup, using JMH to benchmark this behavior:


import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class StringBenchmark {
    private String from = "Alex";
    private String to = "Readers";
    private String subject = "Benchmarking with JMH";

    @Param({"16"})
    private int size;

    @Benchmark
    public String testEmailBuilderSimple() {
        StringBuilder builder = new StringBuilder(size);
        builder.append("From");
        builder.append(from);
        builder.append("To");
        builder.append(to);
        builder.append("Subject");
        builder.append(subject);
        return builder.toString();
    }

    @Benchmark
    public String testEmailBufferSimple() {
        StringBuffer buffer = new StringBuffer(size);
        buffer.append("From");
        buffer.append(from);
        buffer.append("To");
        buffer.append(to);
        buffer.append("Subject");
        buffer.append(subject);
        return buffer.toString();
    }

    @Benchmark
    public String testEmailBuilderChain() {
        return new StringBuilder(size).append("From").append(from).append("To").append(to).append("Subject")
                .append(subject).toString();
    }

    @Benchmark
    public String testEmailBufferChain() {
        return new StringBuffer(size).append("From").append(from).append("To").append(to).append("Subject")
                .append(subject).toString();
    }
}

We compile and run it, and obtain:


Benchmark                               (size)   Mode  Cnt         Score        Error  Units
StringBenchmark.testEmailBufferChain        16  thrpt  200  22981842.957 ± 238502.907  ops/s
StringBenchmark.testEmailBufferSimple       16  thrpt  200   5789967.103 ±  62743.660  ops/s
StringBenchmark.testEmailBuilderChain       16  thrpt  200  22984472.260 ± 212243.175  ops/s
StringBenchmark.testEmailBuilderSimple      16  thrpt  200   5778824.788 ±  59200.312  ops/s

So, even after warming up, following the rule produces a ~4X improvement in throughput. All these runs were done using Oracle JRE 8u121.


Of course, you don't have to take my word for it: others have done similar analyses, and you can even try it yourself.

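As a quick sanity check (a minimal sketch, not part of the original benchmark), note that both styles produce exactly the same String; the difference is only in bytecode size and call overhead:

```java
public class AppendEquivalence {
    static String chained() {
        // Single chained expression: each append returns the same builder
        return new StringBuilder().append("Hello").append(" ").append("World").toString();
    }

    static String perLine() {
        // One statement per append: the returned builder reference is discarded each time
        StringBuilder sb = new StringBuilder();
        sb.append("Hello");
        sb.append(" ");
        sb.append("World");
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = chained();
        String b = perLine();
        if (!a.equals(b)) {
            throw new AssertionError("results differ");
        }
        System.out.println(a); // prints "Hello World"
    }
}
```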

Does it even matter?

Well, it depends. This is certainly a micro-optimization. If a system is using Bubble Sort, there are certainly more pressing performance issues than this. Not all programs have the same requirements and therefore not all need to follow the same rules.


This PMD rule is probably meaningful only to specific projects that value performance greatly, and will do whatever it takes to shave off a couple of milliseconds. Such projects would normally use several different profilers, microbenchmarks, and other tools, and having tools such as PMD keep an eye out for specific patterns will certainly help them.

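For reference, enabling just this check in a PMD 5.x ruleset might look like the following sketch (the rule reference path matches the strings ruleset linked above; in newer PMD versions the rule lives under category/java/performance.xml instead):

```xml
<?xml version="1.0"?>
<ruleset name="string-perf"
         xmlns="http://pmd.sourceforge.net/ruleset/2.0.0">
  <description>Flag consecutive appends that discard the returned builder</description>
  <!-- PMD 5.x location; PMD 6+ uses category/java/performance.xml/ConsecutiveAppendsShouldReuse -->
  <rule ref="rulesets/java/strings.xml/ConsecutiveAppendsShouldReuse"/>
</ruleset>
```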

PMD has many other rules available that will likely apply to many other projects. Just because this particular rule may not apply to your project doesn't mean the tool is not useful; take your time to review the available rules and choose those that really matter to you.


Hope that clears it up for everyone.

