用Ruby构建字符串最快的方法是什么?

时间:2021-08-12 01:37:49

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying

在三元运算符中,一个人想要加入“foo”、“bar”、“baz”,并用逗号和“and”来表示

If efficiency is important to you, don't build a new string when you can append items onto an existing string. [And so on]... Use str << var1 << ' ' << var2 instead.

如果效率对您很重要,那么当您可以将项目附加到现有字符串时,不要构建新的字符串。(等等)……使用str << var1 << ' > < var2。

But the book was written in 2006.

但这本书是2006年写的。

Is using appending (ie <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?

在Ruby的所有主要实现中,使用appending (ie << <)仍然是构建一个大字符串的最快方式吗?

2 个解决方案

#1


23  

Use Array#join when you can, and String#<< when you can't.

可以时使用数组#join,不能时使用字符串#<。

The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:

使用String#+的问题是,它必须创建一个中间的(不需要的)字符串对象,而String#< <改变原始字符串。这里是连接1,000个字符串的时间结果(以秒为单位),通过数组#连接、字符串#+和字符串#<<:< p>

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here's the benchmarking code:

这是基准测试代码:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

RVM下的Ubuntu结果。来自Windows上RubyInstaller的Ruby 1.9.2p180的结果与上面所示的1.9.2相似。

#2


3  

What if your source of string bits is not an array?

TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join & <<

TLDR;即使字符串位的来源不是一个巨大的数组,也最好先构造一个数组并使用join。+在2.1.1中不如1.9.3糟糕,但它仍然糟糕(对于这个用例)。实际上,在两个数组中,1。9.3都稍微快一点。加入& < <


Old hands at benchmarking may have looked at @Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...

基准测试的老手可能已经看过了@Phrogz的答案,并认为“但是…”,因为join基准没有其他基准那样的数组枚举开销。我很好奇这有多大的不同,所以……

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:

我还想知道,如果字符串位(有时)超过23个字符,那么基准会受到什么影响,因此我重新运行如下代码:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<

正如我所预期的那样,对+的影响是相当显著的,但我惊喜地发现,它对join或<< <

words often longer than 23 chars, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

words often longer than 23 chars, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)

#1


23  

Use Array#join when you can, and String#<< when you can't.

可以时使用数组#join,不能时使用字符串#<。

The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:

使用String#+的问题是,它必须创建一个中间的(不需要的)字符串对象,而String#< <改变原始字符串。这里是连接1,000个字符串的时间结果(以秒为单位),通过数组#连接、字符串#+和字符串#<<:< p>

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here's the benchmarking code:

这是基准测试代码:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

RVM下的Ubuntu结果。来自Windows上RubyInstaller的Ruby 1.9.2p180的结果与上面所示的1.9.2相似。

#2


3  

What if your source of string bits is not an array?

TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join & <<

TLDR;即使字符串位的来源不是一个巨大的数组,也最好先构造一个数组并使用join。+在2.1.1中不如1.9.3糟糕,但它仍然糟糕(对于这个用例)。实际上,在两个数组中,1。9.3都稍微快一点。加入& < <


Old hands at benchmarking may have looked at @Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...

基准测试的老手可能已经看过了@Phrogz的答案,并认为“但是…”,因为join基准没有其他基准那样的数组枚举开销。我很好奇这有多大的不同,所以……

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:

我还想知道,如果字符串位(有时)超过23个字符,那么基准会受到什么影响,因此我重新运行如下代码:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<

正如我所预期的那样,对+的影响是相当显著的,但我惊喜地发现,它对join或<< <

words often longer than 23 chars, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

words often longer than 23 chars, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)