Trying to split a string into single words or "quoted words", and want to keep the quotes in the resulting array

Date: 2021-03-09 21:36:39

I'm trying to split a string like Presentation about "Test Driven Development" into an array like this:

[ 'Presentation',
  'about',
  '"Test Driven Development"' ]

I have tried CSV::parse_line(string, col_sep: ' '), but this results in:

[ 'Presentation',
  'about',
  'Test Driven Development' ] # the quotes are missing here

I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.

3 answers

#1


15  

You can split on the following regular expression:

str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]

This splits on a whitespace character, but only if the remaining text up to the end contains an even number of " characters. Be aware that this version only works if every quote in the string is properly closed.
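
To illustrate the caveat about unbalanced quotes, here is a minimal sketch of the same split wrapped in a helper. The name split_outside_quotes is hypothetical, and the sketch swaps $ for \z, which is an assumption about intent rather than part of the answer: in Ruby, $ matches at the end of every line, while \z matches only at the end of the string.

```ruby
# Hypothetical helper (not from the answer): split on whitespace that lies
# outside double quotes. \z anchors the lookahead to the end of the string.
def split_outside_quotes(str)
  str.split(/\s(?=(?:[^"]|"[^"]*")*\z)/)
end

p split_outside_quotes('Presentation about "Test Driven Development"')
# => ["Presentation", "about", "\"Test Driven Development\""]

# An unclosed quote makes the lookahead fail for every space before it,
# so the text in front of the stray quote is not split at all.
p split_outside_quotes('a b "c d')
# => ["a b \"c", "d"]
```

If unbalanced input is possible, it is probably safer to validate the quote count first than to rely on this fallback behavior.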

An alternative solution uses scan to match the parts of the string directly, skipping the spaces between them:

p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
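
Because \w matches only letters, digits, and underscores, the scan pattern above breaks words at punctuation such as apostrophes or hyphens. One possible variant, offered as a sketch rather than as part of the answer, accepts any character that is neither whitespace nor a double quote. The helper name scan_tokens is made up for illustration.

```ruby
# Hypothetical helper: a token is one or more pieces, where a piece is either
# a single character that is neither whitespace nor a double quote, or a
# complete double-quoted section (which may contain spaces).
def scan_tokens(str)
  str.scan(/(?:[^\s"]|"[^"]*")+/)
end

p scan_tokens(%q(Don't split "Test-Driven Development" here))
# => ["Don't", "split", "\"Test-Driven Development\"", "here"]
```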

#2


2  

To extend Howard's answer above, you can add this method. It also treats single-quoted words as one token, and it strips the surrounding spaces and quotes from each token:

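class String
  # Split on whitespace outside single or double quotes, drop empty pieces,
  # then strip surrounding spaces and quote characters from each token.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
      .reject(&:empty?)
      .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end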

And the result:

> 'Presentation      about "Test Driven Development"  '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
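
The question asks to keep the quotes, and reopening String changes every string in the program. As a sketch under those two considerations, the same idea can live in a standalone module with an option to keep or strip the quotes. The module name QuoteTokenizer and the keep_quotes keyword are hypothetical, and \z is used in place of $ so the lookahead runs to the end of the string rather than the end of a line.

```ruby
# Hypothetical module (not from the answer): the same split, exposed as a
# module function instead of a monkey patch on String.
module QuoteTokenizer
  # Whitespace followed only by balanced '...' or "..." sections up to the end.
  SEPARATOR = /\s(?=(?:[^'"]|'[^']*'|"[^"]*")*\z)/

  def self.tokenize(str, keep_quotes: true)
    tokens = str.split(SEPARATOR).reject(&:empty?)
    return tokens if keep_quotes

    # Strip quote characters from both ends of each token.
    tokens.map { |t| t.gsub(/\A["']+|["']+\z/, '') }
  end
end

input = 'Presentation      about "Test Driven Development"  '
p QuoteTokenizer.tokenize(input)
# => ["Presentation", "about", "\"Test Driven Development\""]
p QuoteTokenizer.tokenize(input, keep_quotes: false)
# => ["Presentation", "about", "Test Driven Development"]
```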

#3


0  

Here is a one-liner using scan. Note that the quoted part may contain only word characters and whitespace:

"Presentation about \"Test Driven Development\"".scan(/\s?\w+\s?|"[\w\s]*"/).map {|s| s.strip}

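The one-liner above can be exercised as follows. This is only a sketch: the constant WORD_OR_QUOTED and the helper scan_words are hypothetical names, and the second call shows the limitation that follows from the [\w\s] character class.

```ruby
# The pattern from the answer, bound to a constant for reuse (the constant
# and method names are hypothetical).
WORD_OR_QUOTED = /\s?\w+\s?|"[\w\s]*"/

def scan_words(str)
  str.scan(WORD_OR_QUOTED).map(&:strip)
end

p scan_words('Presentation about "Test Driven Development"')
# => ["Presentation", "about", "\"Test Driven Development\""]

# A hyphen is neither \w nor \s, so the quoted alternative cannot match and
# the words inside the quotes come back individually, without the quotes.
p scan_words('about "Test-Driven Development"')
# => ["about", "Test", "Driven", "Development"]
```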