用Ruby构建字符串最快的方法是什么?

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying

在三元运算符中，一个人想要加入“foo”、“bar”、“baz”，并用逗号和“and”来表示

If efficiency is important to you, don't build a new string when you can append items onto an existing string. [And so on]... Use str << var1 << ' ' << var2 instead.

如果效率对您很重要，那么当您可以将项目附加到现有字符串时，不要构建新的字符串。(等等)……使用str << var1 << ' > < var2。

But the book was written in 2006.

但这本书是2006年写的。

Is using appending (ie <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?

在Ruby的所有主要实现中，使用appending (ie << <)仍然是构建一个大字符串的最快方式吗?

2 个解决方案

#1

Use Array#join when you can, and String#<< when you can't.

可以时使用数组#join，不能时使用字符串#<。

The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:

使用String#+的问题是，它必须创建一个中间的(不需要的)字符串对象，而String#< <改变原始字符串。这里是连接1,000个字符串的时间结果(以秒为单位)，通过数组#连接、字符串#+和字符串#<<:< p>

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here's the benchmarking code:

这是基准测试代码:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

RVM下的Ubuntu结果。来自Windows上RubyInstaller的Ruby 1.9.2p180的结果与上面所示的1.9.2相似。

#2

What if your source of string bits is not an array?

TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join & <<

TLDR;即使字符串位的来源不是一个巨大的数组，也最好先构造一个数组并使用join。+在2.1.1中不如1.9.3糟糕，但它仍然糟糕(对于这个用例)。实际上，在两个数组中，1。9.3都稍微快一点。加入& < <

Old hands at benchmarking may have looked at @Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...

基准测试的老手可能已经看过了@Phrogz的答案，并认为“但是…”，因为join基准没有其他基准那样的数组枚举开销。我很好奇这有多大的不同，所以……

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:

我还想知道，如果字符串位(有时)超过23个字符，那么基准会受到什么影响，因此我重新运行如下代码:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<

正如我所预期的那样，对+的影响是相当显著的，但我惊喜地发现，它对join或<< <

words often longer than 23 chars, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

words often longer than 23 chars, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)

#1

Use Array#join when you can, and String#<< when you can't.

可以时使用数组#join，不能时使用字符串#<。

Ruby 1.9.2p180      user     system      total        real
Array#join      0.320000   0.000000   0.320000 (  0.330224)
String#+ 1      7.730000   0.200000   7.930000 (  8.373900)
String#+ 2      4.670000   0.600000   5.270000 (  5.546633)
String#<< 1     1.260000   0.010000   1.270000 (  1.315991)
String#<< 2     1.600000   0.020000   1.620000 (  1.793415)

JRuby 1.6.1         user     system      total        real
Array#join      0.185000   0.000000   0.185000 (  0.185000)
String#+ 1      9.118000   0.000000   9.118000 (  9.118000)
String#+ 2      4.544000   0.000000   4.544000 (  4.544000)
String#<< 1     0.865000   0.000000   0.865000 (  0.866000)
String#<< 2     0.852000   0.000000   0.852000 (  0.852000)

Ruby 1.8.7p334      user     system      total        real
Array#join      0.290000   0.010000   0.300000 (  0.305367)
String#+ 1      7.620000   0.060000   7.680000 (  7.682265)
String#+ 2      4.820000   0.130000   4.950000 (  4.957258)
String#<< 1     1.290000   0.010000   1.300000 (  1.304764)
String#<< 2     1.350000   0.010000   1.360000 (  1.347226)

Rubinius (head)     user     system      total        real
Array#join      0.864054   0.008001   0.872055 (  0.870757)
String#+ 1      9.636602   0.076005   9.712607 (  9.714820)
String#+ 2      6.456403   0.064004   6.520407 (  6.521633)
String#<< 1     2.196138   0.016001   2.212139 (  2.212564)
String#<< 2     2.176136   0.012001   2.188137 (  2.186298)

Here's the benchmarking code:

这是基准测试代码:

WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000

require 'benchmark'
Benchmark.bmbm do |x|
  x.report("Array#join"){
    N.times{ s = WORDS.join(', ') }
  }
  x.report("String#+ 1"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", "; s += w }
    }
  }
  x.report("String#+ 2"){
    N.times{
      s = WORDS.first
      WORDS[1..-1].each{ |w| s += ", " + w }
    }
  }
  x.report("String#<< 1"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", "; s << w }
    }
  }
  x.report("String#<< 2"){
    N.times{
      s = WORDS.first.dup
      WORDS[1..-1].each{ |w| s << ", " << w }
    }
  }
end

Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

RVM下的Ubuntu结果。来自Windows上RubyInstaller的Ruby 1.9.2p180的结果与上面所示的1.9.2相似。

#2

What if your source of string bits is not an array?

基准测试的老手可能已经看过了@Phrogz的答案，并认为“但是…”，因为join基准没有其他基准那样的数组枚举开销。我很好奇这有多大的不同，所以……

    WORDS = (1..1000).map{ rand(10000).to_s }
    N = 1000

    require 'benchmark'
    Benchmark.bmbm do |x|
      x.report("Array#join"){
        N.times{ s = WORDS.join(', ') }
      }
      x.report("Array#join 2"){
        N.times{
          arr = Array.new(WORDS.length)
          arr[0] = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; }
          s = WORDS.join(', ')
        }
      }
      x.report("String#+ 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
        }
      }
      x.report("String#+ 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first
          WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
        }
      }
      x.report("String#<< 1"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
        }
      }
      x.report("String#<< 2"){
        N.times{
          arr = Array.new(WORDS.length)
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
        }
      }
      x.report("String#<< 2 A"){
        N.times{
          s = WORDS.first.dup
          WORDS[1..-1].each{ |w| s << ", " << w }
        }
      }
    end

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.130000   0.000000   0.130000 (  0.128281)
    Array#join 2    0.220000   0.000000   0.220000 (  0.219588)
    String#+ 1      1.720000   0.770000   2.490000 (  2.478555)
    String#+ 2      1.040000   0.370000   1.410000 (  1.407190)
    String#<< 1     0.370000   0.000000   0.370000 (  0.371125)
    String#<< 2     0.360000   0.000000   0.360000 (  0.360161)
    String#<< 2 A   0.310000   0.000000   0.310000 (  0.318130)

small words, ruby 2.1.1

                        user     system      total        real
    Array#join      0.090000   0.000000   0.090000 (  0.092072)
    Array#join 2    0.180000   0.000000   0.180000 (  0.180423)
    String#+ 1      3.400000   0.750000   4.150000 (  4.149934)
    String#+ 2      1.740000   0.370000   2.110000 (  2.122511)
    String#<< 1     0.360000   0.000000   0.360000 (  0.359707)
    String#<< 2     0.340000   0.000000   0.340000 (  0.343233)
    String#<< 2 A   0.300000   0.000000   0.300000 (  0.297420)

I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:

我还想知道，如果字符串位(有时)超过23个字符，那么基准会受到什么影响，因此我重新运行如下代码:

    WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }

as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<

正如我所预期的那样，对+的影响是相当显著的，但我惊喜地发现，它对join或<< <

words often longer than 23 chars, ruby 2.1.1

                        user     system      total        real
    Array#join      0.150000   0.000000   0.150000 (  0.152846)
    Array#join 2    0.230000   0.010000   0.240000 (  0.231272)
    String#+ 1      7.450000   5.490000  12.940000 ( 12.936776)
    String#+ 2      4.200000   2.590000   6.790000 (  6.791125)
    String#<< 1     0.400000   0.000000   0.400000 (  0.399452)
    String#<< 2     0.380000   0.010000   0.390000 (  0.389791)
    String#<< 2 A   0.340000   0.000000   0.340000 (  0.341099)

words often longer than 23 chars, ruby 1.9.3

                        user     system      total        real
    Array#join      0.130000   0.010000   0.140000 (  0.132957)
    Array#join 2    0.220000   0.000000   0.220000 (  0.220181)
    String#+ 1     20.060000   5.230000  25.290000 ( 25.293366)
    String#+ 2      9.750000   2.670000  12.420000 ( 12.425229)
    String#<< 1     0.390000   0.000000   0.390000 (  0.397733)
    String#<< 2     0.390000   0.000000   0.390000 (  0.390540)
    String#<< 2 A   0.330000   0.000000   0.330000 (  0.333791)

秒客网

用Ruby构建字符串最快的方法是什么?

2 个解决方案

#1

#2

What if your source of string bits is not an array?

small words, ruby 2.1.1

small words, ruby 2.1.1

words often longer than 23 chars, ruby 2.1.1

words often longer than 23 chars, ruby 1.9.3

#1

#2

What if your source of string bits is not an array?

small words, ruby 2.1.1

small words, ruby 2.1.1

words often longer than 23 chars, ruby 2.1.1

words often longer than 23 chars, ruby 1.9.3

相关文章