Replacing all occurrences of a substring in a string: which way is more efficient in Java?

Time: 2021-05-25 19:20:29

I know of two ways of replacing all occurrences of a substring in a string.

The regex way (assuming "substring-to-be-replaced" doesn't include regex special chars):

// The literal is used as the pattern as-is; a trailing "+" would only repeat its last character.
String regex = "substring-to-be-replaced";
Pattern scriptPattern = Pattern.compile(regex);
Matcher matcher = scriptPattern.matcher(originalstring);
newstring = matcher.replaceAll("replacement-substring");
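
If the assumption about regex special chars does not hold, the regex route needs quoting on both sides. The following is a minimal sketch of that, not part of the original question; the class name `QuotedRegexReplace` and the method `replaceLiteral` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex-based replace-all that treats both arguments as plain text.
class QuotedRegexReplace {

    static String replaceLiteral(String original, String target, String replacement) {
        // Pattern.quote makes regex metacharacters in the target match literally.
        Pattern pattern = Pattern.compile(Pattern.quote(target));
        // Matcher.quoteReplacement keeps '$' and '\' in the replacement from being
        // read as group references or escapes.
        return pattern.matcher(original).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "." would match any character if it were not quoted.
        System.out.println(replaceLiteral("a.b.c", ".", "$")); // prints a$b$c
    }
}
```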

The String.replace() way:

newstring = originalstring.replace("substring-to-be-replaced", "replacement-substring");

Which of the two is more efficient (and why)?

Are there more efficient ways than the two described above?

5 answers

#1


12  

String.replace() uses regex underneath, at least in Java 8 and earlier (as far as I know, later JDKs reimplemented it without regex):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Are there more efficient ways than the two described above?

There are, provided that you operate on an implementation backed by, e.g., an array rather than on the immutable String class (since String.replace creates a new string on each invocation). See, for instance, StringBuilder.replace().
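
Note that StringBuilder.replace() only replaces a single index range, so replacing all occurrences takes a loop around it. A rough sketch of what that could look like follows; the class and method names are hypothetical, and whether it actually beats String.replace() depends on the input, because every replacement of a different length shifts the tail of the buffer.

```java
// Hypothetical helper: replace-all on a mutable StringBuilder, modifying it in place.
class InPlaceReplace {

    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // an empty target would never advance the search
        }
        int index = sb.indexOf(target);
        while (index != -1) {
            // StringBuilder.replace(start, end, str) swaps one range for the replacement.
            sb.replace(index, index + target.length(), replacement);
            // Resume after the inserted text so the replacement itself is not rescanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("aXbXc");
        replaceAll(sb, "X", "XX");
        System.out.println(sb); // prints aXXbXXc
    }
}
```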

Compiling a regex incurs quite a lot of overhead, which is clear when looking at the Pattern source code. Luckily, Apache offers an alternative approach in StringUtils.replace() which, according to the source code (line #3732), is quite efficient.
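
For readers who cannot pull in the Apache library, the general idea of a regex-free replace can be sketched with indexOf and a single output buffer. This is my own illustration of that approach, not a copy of the StringUtils.replace() source, and the names `IndexOfReplace` and `replace` are made up.

```java
// Hypothetical helper: regex-free replace-all built on String.indexOf.
class IndexOfReplace {

    static String replace(String text, String target, String replacement) {
        if (text.isEmpty() || target.isEmpty()) {
            return text;
        }
        int start = 0;
        int end = text.indexOf(target, start);
        if (end == -1) {
            return text; // nothing to replace, so nothing is allocated
        }
        StringBuilder out = new StringBuilder(text.length());
        while (end != -1) {
            // Copy the untouched stretch before the match, then the replacement.
            out.append(text, start, end).append(replacement);
            start = end + target.length();
            end = text.indexOf(target, start);
        }
        // Copy whatever follows the last match.
        out.append(text, start, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("hello world", "o", "0")); // prints hell0 w0rld
    }
}
```

Each character of the input is copied once, and no Pattern or Matcher objects are created.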

#2


2  

Here's the source code from OpenJDK:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
       this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

#3


1  

Not having done any profiling or benchmarking, I'd say it's a fairly safe bet that if you don't need regex magic, then the overhead of the regular expression parser (which you'll get no matter what, in terms of memory as well as CPU usage) costs you a lot more than you can possibly gain on the other end.

#4


1  

Instead of using strings, which are immutable, use char arrays or some other mutable type (such as StringBuffer or StringBuilder).

#5


0  

Shouldn't you be comparing two replaceAll calls? In any case, for a single invocation the difference will hardly be measurable. And will you do millions of comparisons?

In that case I would expect 'compile' to be faster, but only if you aren't just using a constant String without any pattern rules.
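
To make the 'compile' case concrete: the point of compiling is to do it once and reuse the Pattern across many calls. A minimal sketch, assuming the target and replacement are fixed at class-load time (the class name `PrecompiledReplace` is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the Pattern is compiled once and reused for every call.
class PrecompiledReplace {

    // Pattern.LITERAL treats the target as plain text, like String.replace() does.
    private static final Pattern TARGET =
            Pattern.compile("substring-to-be-replaced", Pattern.LITERAL);

    // Quoted once up front so '$' and '\' would be taken literally.
    private static final String REPLACEMENT =
            Matcher.quoteReplacement("replacement-substring");

    static String replaceAll(String original) {
        // Only a new Matcher is created per call; the compile cost is already paid.
        return TARGET.matcher(original).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("x substring-to-be-replaced y"));
    }
}
```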

What is stopping you from writing a micro-benchmark? Or look up the source.
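
A very rough sketch of such a micro-benchmark is below. Everything in it (class name, input, iteration counts) is an assumption chosen for illustration, and a naive System.nanoTime() loop like this is easily distorted by JIT compilation and garbage collection, so treat the numbers as a hint at best; a proper harness such as JMH is the more reliable choice.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, naive timing harness comparing the two approaches from the question.
class ReplaceTiming {
    static final String TARGET = "substring-to-be-replaced";
    static final String REPLACEMENT = "replacement-substring";
    static final Pattern COMPILED = Pattern.compile(TARGET, Pattern.LITERAL);
    static final String QUOTED_REPLACEMENT = Matcher.quoteReplacement(REPLACEMENT);

    // Written to on every iteration so the JIT cannot discard the work as unused.
    static volatile long sink;

    static String viaStringReplace(String s) {
        return s.replace(TARGET, REPLACEMENT);
    }

    static String viaCompiledPattern(String s) {
        return COMPILED.matcher(s).replaceAll(QUOTED_REPLACEMENT);
    }

    // Runs the operation the given number of times and returns elapsed nanoseconds.
    static long timeNanos(UnaryOperator<String> op, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += op.apply(input).length();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String input = "abc " + TARGET + " def " + TARGET + " ghi";
        // Warm-up rounds give the JIT a chance to compile both paths first.
        timeNanos(ReplaceTiming::viaStringReplace, input, 100_000);
        timeNanos(ReplaceTiming::viaCompiledPattern, input, 100_000);
        System.out.println("String.replace:   "
                + timeNanos(ReplaceTiming::viaStringReplace, input, 1_000_000) + " ns");
        System.out.println("compiled Pattern: "
                + timeNanos(ReplaceTiming::viaCompiledPattern, input, 1_000_000) + " ns");
    }
}
```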
