How do I replace consecutive whitespace in each element of an array?

Using Ruby 2.4. I have an array of strings. I want to strip breaking and non-breaking spaces from the end of each item in the array, and also replace multiple consecutive whitespace characters with a single space. I thought the code below would do it, but I get an error:

 > words = ["1", "HUMPHRIES \t\t\t\t\t\t\t\t\t\t\t\t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON", "328", "FAIRVIEW, OR (US)", "US", "M", " 27 ", "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

 > words.map{|word| word ? word.gsub!(/\A\p{Space}+|\p{Space}+\z/, '').gsub!(/[[:space:]]+/, ' ') : nil }
NoMethodError: undefined method `gsub!' for nil:NilClass
    from (irb):4:in `block in irb_binding'
    from (irb):4:in `map'
    from (irb):4
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console.rb:65:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console_helper.rb:9:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:78:in `console'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands.rb:18:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'

How can I properly replace consecutive occurrences of white space as well as strip it off from each word in the array?

3 answers

#1

Use plain gsub instead of gsub!:

words.map do |w|
  # respond_to?(:gsub) skips elements that are not strings (e.g. nil); map returns nil for them
  w.gsub(/(?<=[^\,\.])\s+|\A\s+/, '') if w.respond_to?(:gsub)
end

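The problem is that gsub! returns nil when it makes no substitution. In your code the first gsub! returns nil for any element that has no leading or trailing whitespace (for example "1"), and the chained gsub! is then called on that nil. That's why you get the undefined method `gsub!' for nil:NilClass error.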
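A minimal sketch of the fix applied to the question's own two regexes: both calls use the non-bang gsub, so the second call always receives a String. The helper name tidy, the explicit nil guard (standing in for the question's ternary) and the extra nil element in the sample array are illustrative assumptions, not part of the answer.

```ruby
# Sample data from the question, plus a nil element (added for illustration).
QUESTION_WORDS = ["1", "HUMPHRIES \t\t\t\t\t\t\t\t\t\t\t\t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
                  "328", "FAIRVIEW, OR (US)", "US", "M", " 27 ",
                  "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n", nil]

# Hypothetical helper: non-bang gsub always returns a new String, never nil.
def tidy(word)
  return nil if word.nil?
  # Drop leading/trailing whitespace; \p{Space} also covers U+00A0.
  stripped = word.gsub(/\A\p{Space}+|\p{Space}+\z/, '')
  # Collapse each remaining whitespace run to a single space.
  stripped.gsub(/[[:space:]]+/, ' ')
end

TIDIED = QUESTION_WORDS.map { |w| tidy(w) }
p TIDIED
# => ["1", "HUMPHRIES , JASON", "328", "FAIRVIEW, OR (US)", "US", "M", "27", "00:27:30.00", nil]
```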

From the Ruby documentation for String#gsub!:

Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
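The nil-on-no-change behaviour described in the documentation is easy to check in irb. A short sketch with made-up sample strings:

```ruby
# Strings with and without a whitespace run (dup gives a mutable copy).
WITH_RUN    = "a  b".dup
WITHOUT_RUN = "ab".dup

# A substitution happened: gsub! edits the string in place and returns that same object.
RESULT_WITH_RUN = WITH_RUN.gsub!(/\s+/, ' ')
p RESULT_WITH_RUN                   # => "a b"
p RESULT_WITH_RUN.equal?(WITH_RUN)  # => true

# Nothing matched: gsub! returns nil, even though the string itself is fine.
RESULT_WITHOUT_RUN = WITHOUT_RUN.gsub!(/\s+/, ' ')
p RESULT_WITHOUT_RUN                # => nil

# The non-bang gsub always returns a String, so it is safe to chain.
p WITHOUT_RUN.gsub(/\s+/, ' ')      # => "ab"
```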

As @CarySwoveland mentioned in the comments, \s does not match non-breaking spaces. To handle them, use [[:space:]] instead of \s.
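A quick way to see the difference, assuming Ruby 2.4 or later (for String#match?); the sample strings are made up for illustration:

```ruby
NBSP = "\u00A0" # U+00A0 NO-BREAK SPACE

p NBSP.match?(/\s/)           # => false, \s is ASCII-only in Ruby
p NBSP.match?(/[[:space:]]/)  # => true
p NBSP.match?(/\p{Space}/)    # => true

# A run made of a tab followed by a non-breaking space:
MIXED = "a\t\u00A0b"
p MIXED.gsub(/\s+/, ' ') == "a \u00A0b"  # => true, the NBSP is left behind
p MIXED.gsub(/[[:space:]]+/, ' ')        # => "a b"
```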

#2

You can use the following:

words.map { |w| w.gsub(/(?<=[^\,\.])\s+/,'') }
 #=> ["1", "HUMPHRIES, JASON", "328", "FAIRVIEW, OR(US)", "US", "M",
 #    " 27", "00:27:30.00"]
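The lookbehind (?<=[^\,\.]) needs a character before the whitespace, so it can never match at the very start of a string. That is why " 27 " keeps its leading space here, and why answer #1 adds a separate \A\s+ alternative. A small check with illustrative sample strings:

```ruby
# Answer #2's pattern: whitespace not directly preceded by a comma or period.
# The class [^,.] here is equivalent to [^\,\.] in the answer.
AFTER_NON_PUNCT = /(?<=[^,.])\s+/
# Answer #1 adds \A\s+ to catch whitespace at the very start.
WITH_LEADING    = /(?<=[^,.])\s+|\A\s+/

p " 27 ".gsub(AFTER_NON_PUNCT, '')    # => " 27", lookbehind cannot match at index 0
p " 27 ".gsub(WITH_LEADING, '')       # => "27"
p "OR (US)".gsub(AFTER_NON_PUNCT, '') # => "OR(US)"
p "a, b".gsub(AFTER_NON_PUNCT, '')    # => "a, b", the space after the comma survives
```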

#3

I assume that all whitespace and non-breaking spaces at the end of each string are to be removed and that, in what remains, every run of whitespace characters and non-breaking spaces is to be replaced by a single space. (Natalia, if that's not correct, please let me know in a comment.)

words =
  ["1",
   "HUMPHRIES \t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
   " M\u00A0    \u00A0",
   "    27 ",
   "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

R = /
    [[:space:]]     # match one whitespace character
    (?=[[:space:]]) # if it is followed by another one, via a positive lookahead
    |               # or
    [[:space:]]+    # match one or more whitespace characters
    \z              # at the end of the string
    /x              # free-spacing regex definition mode

words.map { |w| w.gsub(R, '').gsub(/[[:space:]]/, ' ') }
  #=> ["1", "HUMPHRIES , JASON", " M", " 27", "00:27:30.00"]

Note that in Ruby the POSIX bracket expression [[:space:]] matches ASCII whitespace and also Unicode whitespace such as the non-breaking space \u00A0, which \s does not match.

To see why the second gsub is needed, note that

words.map { |w| w.gsub(R, '') }
  #=> ["1", "HUMPHRIES\t,\tJASON", " M", " 27", "00:27:30.00"]
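Under the same assumption, the result can also be produced by a different two-step technique: remove the trailing whitespace run with sub, then collapse every remaining run to one space. This is an alternative formulation rather than the answer's method, and squeeze_and_rstrip is a hypothetical helper name:

```ruby
# Same sample strings as above.
WORDS3 = ["1",
          "HUMPHRIES \t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
          " M\u00A0    \u00A0",
          "    27 ",
          "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

# Hypothetical helper: drop the trailing whitespace run, then turn every
# remaining run (ASCII whitespace or U+00A0) into a single space.
def squeeze_and_rstrip(str)
  str.sub(/[[:space:]]+\z/, '').gsub(/[[:space:]]+/, ' ')
end

p WORDS3.map { |w| squeeze_and_rstrip(w) }
#=> ["1", "HUMPHRIES , JASON", " M", " 27", "00:27:30.00"]
```

As with R, a leading run is reduced to one space rather than removed, which matches the stated assumption that only the end of each string is stripped.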
