How to find all naive ("+") string concatenations in a large Java code base?

Time: 2021-11-11 19:24:32

We have a huge code base, and we suspect it contains quite a few "+"-based string concatenations that might benefit from StringBuilder/StringBuffer. Is there an effective way, or an existing tool, to search for these, especially in Eclipse?

Searching for "+" isn't a good idea since there's a lot of math in the code, so this needs to be something that actually analyzes the code and its types to figure out which additions involve strings.

11 answers

#1


Just make sure you really understand where it's actually better to use StringBuilder. I'm not saying you don't know, but there are certainly plenty of people who would take code like this:

String foo = "Your age is: " + getAge();

and turn it into:

StringBuilder builder = new StringBuilder("Your age is: ");
builder.append(getAge());
String foo = builder.toString();

which is just a less readable version of the same thing. Often the naive solution is the best solution. Likewise some people worry about:

String x = "long line" + 
    "another long line";

when actually that concatenation is performed at compile-time.
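
A quick way to convince yourself of this, as a minimal sketch (the class and method names are made up for illustration): because both operands are literals, the whole expression is a compile-time constant, and constant strings are interned, so the folded result is the very same object as the equivalent single literal.

```java
class ConstantFolding {
    // Both operands are string literals, so the compiler folds the
    // expression into one constant; no concatenation happens at run time.
    static final String FOLDED = "long line" +
        "another long line";

    static boolean sameInstanceAsLiteral() {
        // Reference comparison on purpose: constant strings are interned,
        // so the folded constant and the single literal are one object.
        return FOLDED == "long lineanother long line";
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAsLiteral());
    }
}
```

If either operand were a non-constant variable, the comparison with == would no longer be guaranteed to hold, because the concatenation would then happen at run time.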

As nsander's quite rightly said, find out if you've got a problem first...

#2


I'm pretty sure FindBugs can detect these. If not, it's still extremely useful to have around.

Edit: It can indeed find concatenations in a loop, which is the only time it really makes a difference.
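
For illustration, here is a sketch of the loop case next to its StringBuilder equivalent. The class and method names are hypothetical; the point is only that += in a loop typically builds a brand-new String on every pass, copying everything accumulated so far, while the builder appends into one growing buffer.

```java
class LoopConcat {
    // Concatenation inside a loop: every iteration creates a new String
    // that copies the previous contents, so the total copying grows
    // roughly quadratically with the length of the result.
    static String withPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Same result, but all appends go into a single buffer.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(withPlus(parts).equals(withBuilder(parts)));
    }
}
```

Both methods return the same string; only the amount of intermediate garbage differs, which is why this is the pattern worth looking for.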

#3


Why not use a profiler to find the "naive" string concatenations that actually matter? Only switch over to the more verbose StringBuffer if you actually need it.

#4


Chances are you will make your performance worse and your code less readable. The compiler already makes this optimization, and unless you are in a loop, it will generally do a better job. Furthermore, in JDK 8 they may come out with StringUberBuilder, and all your code which uses StringBuilder will run slower, while the "+" concatenated strings will benefit from the new class.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth

#5


IntelliJ can find these using "structural search". You search for "$a + $b" and set the characteristics of both $a and $b as type java.lang.String.

However, if you have IntelliJ, it likely has a built-in inspection that will do a better job of finding what you want anyway.

#6


I suggest using a profiler. This is really a performance question, and if you can't make the code show up in a profile with reasonable test data, there is unlikely to be any value in changing it.

#7


Jon Skeet (as always) and the others have already said all that is needed, but I would really like to emphasize that you may be hunting for a nonexistent performance improvement...

Take a look at this code:

public class StringBuilding {
  public static void main(String[] args) {
    String a = "The first part";
    String b = "The second part";
    String res = a + b; // "naive" concatenation

    System.gc(); // Inserted to make it easier to see "before" and "after" below

    res = new StringBuilder().append(a).append(b).toString(); // explicit builder
  }
}

If you compile it and disassemble it with javap, this is what you get.

public static void main(java.lang.String[]);
  Code:
   0:   ldc     #2; //String The first part
   2:   astore_1
   3:   ldc     #3; //String The second part
   5:   astore_2
   6:   new     #4; //class java/lang/StringBuilder
   9:   dup
   10:  invokespecial   #5; //Method java/lang/StringBuilder."<init>":()V
   13:  aload_1
   14:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   17:  aload_2
   18:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   21:  invokevirtual   #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   24:  astore_3
   25:  invokestatic    #8; //Method java/lang/System.gc:()V
   28:  new     #4; //class java/lang/StringBuilder
   31:  dup
   32:  invokespecial   #5; //Method java/lang/StringBuilder."<init>":()V
   35:  aload_1
   36:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   39:  aload_2
   40:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   43:  invokevirtual   #7; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   46:  astore_3
   47:  return

As you can see, the instructions at offsets 6-21 are identical to those at 28-43. Not much of an optimization, right?

Edit: The loop issue is valid though...

#8


Instead of searching for just a +, search for a + next to a double quote, that is, "+ and +". Those will probably find the vast majority. Cases where you are concatenating only variables will be tougher.
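
As a rough sketch of this text-search heuristic (the class name, method name, and regular expression are my own, not from any tool): flag a line when a + appears directly after a closing quote or directly before an opening quote, with optional whitespace in between.

```java
import java.util.regex.Pattern;

class QuotePlusSearch {
    // Matches a double quote followed by '+', or '+' followed by a double
    // quote, allowing whitespace between the two characters.
    private static final Pattern NEAR_LITERAL =
            Pattern.compile("\"\\s*\\+|\\+\\s*\"");

    static boolean looksLikeStringConcat(String line) {
        return NEAR_LITERAL.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeStringConcat("String s = \"Age: \" + age;"));
        System.out.println(looksLikeStringConcat("int x = a + b;"));
    }
}
```

Being purely textual, this misses concatenations that involve only variables and can misfire on a + that appears inside a string literal or comment, so treat its hits as candidates rather than findings.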

#9


If you have a huge code base you probably have lots of hotspots, which may or may not involve "+" concatenation. Just run your usual profiler, and fix the big ones, regardless of what kind of construct they are.

It would be an odd approach to fix just one class of (potential) bottleneck, rather than fixing the actual bottlenecks.

#10


With PMD, you can write rules with XPath or using a Java syntax. It might be worth investigating whether it can match the string concatenation operator—it certainly seems within the purview of static analysis. This is such a vague idea, I'm going to make this "community wiki"; if anyone else wants to elaborate (or create their own answer along these lines), please do!
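
This is not PMD, but as a rough sketch of what such a type-aware check involves, the JDK's own compiler tree API (the com.sun.source packages) can report every binary + whose result type is String. The class name StringPlusFinder and its find method are invented for this example, and it assumes it runs on a full JDK so that ToolProvider.getSystemJavaCompiler() returns a compiler.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.type.TypeMirror;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class StringPlusFinder {
    /** Returns the source text of every '+' expression whose type is String. */
    static List<String> find(String fileName, String source) {
        // Assumption: a JDK is present; on a plain JRE this would be null.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Wrap the source string so the compiler can read it from memory.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        List<String> hits = new ArrayList<>();
        try {
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // attribute the trees so expression types are known
            Trees trees = Trees.instance(task);
            for (CompilationUnitTree unit : units) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitBinary(BinaryTree node, Void unused) {
                        if (node.getKind() == Tree.Kind.PLUS) {
                            TypeMirror type = trees.getTypeMirror(getCurrentPath());
                            if (type != null && "java.lang.String".equals(type.toString())) {
                                hits.add(node.toString());
                            }
                        }
                        return super.visitBinary(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }
}
```

The sketch only parses and analyzes, so it writes no class files. It reports each + in a chain separately, includes constant expressions that the compiler folds anyway, and does not look at +=, which the tree API models as a compound assignment rather than a binary expression. A real check would also want to restrict hits to loops, since that is where the answers above say it matters.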

#11


Forget it - your compiler most likely does it already - see the JLS, 15.18.1.2 Optimization of String Concatenation:

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
