
Time: 2021-09-27 14:30:30

Using Ruby 2.4. I have an array of strings. I want to strip breaking and non-breaking spaces from the end of each item in the array, and replace runs of consecutive whitespace with a single space. I thought the code below was the way to do it, but I get an error:

 > words = ["1", "HUMPHRIES \t\t\t\t\t\t\t\t\t\t\t\t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON", "328", "FAIRVIEW, OR (US)", "US", "M", " 27 ", "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

 > words.map{|word| word ? word.gsub!(/\A\p{Space}+|\p{Space}+\z/, '').gsub!(/[[:space:]]+/, ' ') : nil }
NoMethodError: undefined method `gsub!' for nil:NilClass
    from (irb):4:in `block in irb_binding'
    from (irb):4:in `map'
    from (irb):4
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console.rb:65:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console_helper.rb:9:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:78:in `console'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands.rb:18:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'

How can I properly replace consecutive occurrences of white space as well as strip it off from each word in the array?


3 Solutions



Do it with plain gsub, not gsub!:


words.map do |w|
  # The respond_to?(:gsub) guard covers arrays that may hold non-strings (such as nil)
  w.gsub(/(?<=[^\,\.])\s+|\A\s+/, '') if w.respond_to?(:gsub)
end

gsub! returns nil when it makes no substitutions, so chaining a second gsub! onto it ends up calling gsub! on nil. That's why you get the undefined method `gsub!' for nil:NilClass error.


From the Ruby documentation for gsub!:

Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
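
To illustrate the behavior quoted above, here is a minimal sketch using a made-up string that has nothing to strip. The variable names are illustrative only and do not come from the question.

```ruby
# A string with no leading or trailing whitespace, so the first
# substitution has nothing to do.
word = "US"

# gsub! mutates in place and returns nil when nothing changed.
# (dup keeps the original string untouched for the second example.)
bang_result = word.dup.gsub!(/\A[[:space:]]+|[[:space:]]+\z/, '')

# gsub always returns a String, so a second call can be chained safely.
plain_result = word.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '').gsub(/[[:space:]]+/, ' ')

p bang_result   # nil
p plain_result  # "US"
```

In the question's array, items such as "1" and "US" hit exactly this case, which is where the chained gsub! fails.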


As @CarySwoveland mentioned in the comments, \s doesn't match non-breaking spaces. To handle them, use [[:space:]] instead of \s.
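
A quick way to check this yourself, assuming a UTF-8 string and Ruby 2.4 or later (for String#match?); the variable names are just for illustration:

```ruby
nbsp = "\u00A0"  # Unicode non-breaking space

# \s only covers ASCII whitespace: space, \t, \r, \n, \f, \v
ascii_class_matches = nbsp.match?(/\s/)

# The POSIX bracket expression is Unicode-aware in Ruby
posix_class_matches = nbsp.match?(/[[:space:]]/)

p ascii_class_matches  # false
p posix_class_matches  # true
```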



You can use the following:


words.map { |w| w.gsub(/(?<=[^\,\.])\s+/,'') }
 #=> ["1", "HUMPHRIES, JASON", "328", "FAIRVIEW, OR(US)",
 #    "US", "M", " 27", "00:27:30.00"]



I assume all whitespace and non-breaking spaces at the end of each string are to be removed and, of what's left, every substring of whitespace characters and non-breaking spaces is to be replaced by one space. (Natalia, if that's not correct please let me know in a comment.)

words = ["1",
   "HUMPHRIES \t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
   " M\u00A0    \u00A0",
   "    27 ",
   "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

R = /
    [[:space:]]     # match one whitespace character (POSIX bracket expression)
    (?=[[:space:]]) # that is followed by another one (positive lookahead)
    |               # or
    [[:space:]]+    # match one or more whitespace characters
    \z              # at the end of the string
    /x              # free-spacing regex definition mode

words.map { |w| w.gsub(R, '').gsub(/[[:space:]]/, ' ') }
  #=> ["1", "HUMPHRIES , JASON", " M", " 27", "00:27:30.00"]

Note that the POSIX [[:space:]] includes ASCII whitespace and Unicode's non-breaking space character, \u00A0.

To see why the second gsub is needed, note that


words.map { |w| w.gsub(R, '') }
  #=> ["1", "HUMPHRIES\t,\tJASON", " M", " 27", "00:27:30.00"] 
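
If the goal is to trim both ends, as the regex in the question's own attempt does, rather than only the end of each string, one possible sketch is the two-step version below. The helper name squish_unicode is hypothetical and not part of any of the answers; the sample array is a shortened version of the one above with a nil added to show the guard.

```ruby
# Hypothetical helper: trims Unicode-aware whitespace from both ends
# and collapses each inner run of whitespace into a single space.
def squish_unicode(str)
  # Pass nil and other non-strings through unchanged.
  return str unless str.respond_to?(:gsub)

  # Non-bang gsub always returns a String, so the second call is safe.
  trimmed = str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
  trimmed.gsub(/[[:space:]]+/, ' ')
end

words = ["1", "HUMPHRIES \t\t, \t\tJASON", " M\u00A0    \u00A0",
         "    27 ", "00:27:30.00 \t\t \n", nil]

cleaned = words.map { |w| squish_unicode(w) }
p cleaned
# ["1", "HUMPHRIES , JASON", "M", "27", "00:27:30.00", nil]
```

Unlike the R-based version, this also removes the leading space in " M" and " 27".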


