I have a string:
"foo (2 spaces) bar (3 spaces) baaar (6 spaces) fooo"
How do I remove the repeated spaces so that there is no more than one space between any two words?
7 answers
#1
39
>> str = "foo    bar bar      baaar"
=> "foo    bar bar      baaar"
>> str.split.join(" ")
=> "foo bar bar baaar"
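A small sketch of why this works, using a made-up input (the variable names here are only for illustration): String#split with no arguments splits on runs of whitespace and ignores leading and trailing whitespace, so joining the pieces with a single space also trims the ends and turns tabs and newlines into spaces.

```ruby
# Hypothetical sample input with leading/trailing whitespace, tabs and a newline.
messy = "  foo   bar\t\tbaz \n qux  "

words     = messy.split      # no-argument split: whitespace runs are separators
collapsed = words.join(" ")  # rebuild with exactly one space between words

p words      # => ["foo", "bar", "baz", "qux"]
p collapsed  # => "foo bar baz qux"
```

If tabs or newlines need to be preserved, this approach is probably not the right fit, since every whitespace run becomes a single space.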
#2
76
String#squeeze takes an optional argument specifying which characters to squeeze.
irb> "asd  asd   asd    asd".squeeze(" ")
=> "asd asd asd asd"
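To illustrate how the argument behaves, here is a minimal sketch with a made-up input. The argument is a set of characters, and squeeze only collapses runs of the same character, so a mixed run such as space-tab-space is left alone.

```ruby
# Hypothetical sample input: three spaces, two tabs, then a space-tab-space run.
sample = "foo   bar\t\tbaaar \t fooo"

spaces_only = sample.squeeze(" ")    # only runs of spaces are collapsed
spaces_tabs = sample.squeeze(" \t")  # runs of spaces and runs of tabs are collapsed

p spaces_only  # => "foo bar\t\tbaaar \t fooo"
p spaces_tabs  # => "foo bar\tbaaar \t fooo"
```

Note that the letters in "baaar" and "fooo" are untouched in both cases, because they are not in the character set passed to squeeze.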
#3
23
Updated benchmark from @zetetic's answer:
require 'benchmark'
include Benchmark
string = "foo    bar bar      baaar"
n = 1_000_000
bm(12) do |x|
  x.report("gsub")         { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end
Running it twice on my desktop produced these results:
ruby test.rb; ruby test.rb
                  user     system      total        real
gsub          6.060000   0.000000   6.060000 (  6.061435)
squeeze(' ')  4.200000   0.010000   4.210000 (  4.201619)
split/join    3.620000   0.000000   3.620000 (  3.614499)
                  user     system      total        real
gsub          6.020000   0.000000   6.020000 (  6.023391)
squeeze(' ')  4.150000   0.010000   4.160000 (  4.153204)
split/join    3.590000   0.000000   3.590000 (  3.587590)
The issue is that squeeze with no argument removes any repeated character, which produces a different output string and doesn't meet the OP's need. squeeze(' ') does meet the need, but it is slower.
string.squeeze
=> "fo bar bar bar"
I was wondering how split.join could be faster. It didn't seem like that would hold up with large strings, so I adjusted the benchmark to see what effect long strings would have:
require 'benchmark'
include Benchmark
string = (["foo    bar bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(12) do |x|
  x.report("gsub")         { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end
ruby test.rb ; ruby test.rb
String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.576149)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150298)
split/join    1.400000   0.010000   1.410000 (  1.396078)
String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.573802)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150384)
split/join    1.400000   0.010000   1.410000 (  1.397748)
So, long strings do make a big difference: here squeeze(' ') is roughly ten times faster than split/join and far ahead of gsub.
"If you do use gsub then gsub(/\s{2,}/, ' ') is slightly faster."
Not really. Here's a version of the benchmark to test just that assertion:
require 'benchmark'
include Benchmark
string = "foo    bar bar      baaar"
puts string.gsub(/\s+/, " ")
puts string.gsub(/\s{2,}/, ' ')
puts string.gsub(/\s\s+/, " ")
string = (["foo    bar bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(18) do |x|
  x.report("gsub")              { n.times { string.gsub(/\s+/, " ") } }
  x.report('gsub/\s{2,}/, "")') { n.times { string.gsub(/\s{2,}/, ' ') } }
  x.report("gsub2")             { n.times { string.gsub(/\s\s+/, " ") } }
end
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> String length: 250000 characters
# >>                         user     system      total        real
# >> gsub                1.380000   0.010000   1.390000 (  1.381276)
# >> gsub/\s{2,}/, "")   1.590000   0.000000   1.590000 (  1.609292)
# >> gsub2               1.050000   0.010000   1.060000 (  1.051005)
If you want speed from gsub, use the gsub2 pattern (/\s\s+/). squeeze(' ') will still run circles around any gsub implementation, though.
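One caveat worth checking before swapping patterns: the three regexes are not strictly equivalent. A minimal sketch with a made-up input shows that /\s+/ also rewrites a lone tab or newline into a space, while /\s{2,}/ and /\s\s+/ only touch runs of two or more whitespace characters.

```ruby
# Hypothetical sample input: a single tab, a double space, and a trailing newline.
sample = "foo\tbar  baz\n"

one_or_more = sample.gsub(/\s+/, " ")     # every whitespace run, even of length 1
two_or_more = sample.gsub(/\s{2,}/, " ")  # only runs of two or more
two_plus    = sample.gsub(/\s\s+/, " ")   # matches the same runs as /\s{2,}/

p one_or_more  # => "foo bar baz "
p two_or_more  # => "foo\tbar baz\n"
p two_plus     # => "foo\tbar baz\n"
```

For input that contains only spaces between words, as in the benchmark string, all three give the same result.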
#4
16
To complement the other answers, note that both ActiveSupport and Facets provide String#squish (note that it also removes newlines within the string):
>> "foo    bar bar      baaar".squish
=> "foo bar bar baaar"
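If you don't want to load either library, something similar can be approximated in plain Ruby. This is only a sketch: squish_like is a hypothetical helper name, and it assumes the behaviour you want is "collapse every whitespace run to one space, then trim both ends".

```ruby
# Hypothetical helper, not part of ActiveSupport or Facets.
# Collapses all whitespace runs (including newlines and tabs) to a single
# space, then strips leading and trailing whitespace.
def squish_like(str)
  str.gsub(/[[:space:]]+/, " ").strip
end

p squish_like("  foo   bar \n\t baaar  ")  # => "foo bar baaar"
```

It returns a new string and leaves the original untouched.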
#5
7
Use a regular expression to match repeated whitespace (\s+) and replace it with a single space.
"foo    bar  foobar".gsub(/\s+/, ' ')
=> "foo bar foobar"
This matches every kind of whitespace. If you only want to replace spaces, use / +/ instead of /\s+/.
"foo    bar  \nfoobar".gsub(/ +/, ' ')
=> "foo bar \nfoobar"
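A middle ground, sketched here with a made-up input, is the POSIX bracket class [[:blank:]], which matches spaces and tabs but not newlines. That keeps the line structure intact while still collapsing tabs.

```ruby
# Hypothetical two-line sample input mixing spaces and tabs.
text = "foo   bar\t \tbaz\nqux    quux\n"

# [[:blank:]] covers spaces and tabs only, so the newlines survive.
collapsed = text.gsub(/[[:blank:]]+/, " ")

p collapsed  # => "foo bar baz\nqux quux\n"
```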
#6
5
Which method performs better?
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
$ cat squeeze.rb
require 'benchmark'
include Benchmark
string = "foo    bar bar      baaar"
n = 1_000_000
bm(6) do |x|
  x.report("gsub ")      { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze ")   { n.times { string.squeeze } }
  x.report("split/join") { n.times { string.split.join(" ") } }
end
$ ruby squeeze.rb
                user     system      total        real
gsub        4.970000   0.020000   4.990000 (  5.624229)
squeeze     0.600000   0.000000   0.600000 (  0.677733)
split/join  2.950000   0.020000   2.970000 (  3.243022)
#7
3
Just use gsub with a regexp. For example:
str = "foo    bar bar      baaar"
str.gsub(/\s+/, " ")
This returns a new string; to modify str in place, use gsub! instead.
BTW, regexps are very useful. There are plenty of resources on the internet; to test your own regexps, try rubular.com, for example.
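One thing to watch with gsub!, shown in this small sketch with a made-up input: it returns nil when nothing was replaced, so chaining further calls onto its return value can fail.

```ruby
# Hypothetical sample input; dup gives a mutable copy of the literal.
str = "foo   bar".dup

copy   = str.gsub(/\s+/, " ")       # new string; str itself is unchanged here
first  = str.gsub!(/\s+/, " ")      # modifies str in place and returns str
second = str.gsub!(/\s{2,}/, " ")   # no run of 2+ whitespace left, returns nil

p copy    # => "foo bar"
p str     # => "foo bar"
p second  # => nil
```

If you need the result for chaining, gsub is usually the safer choice.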