I'm trying to split a string like Presentation about "Test Driven Development"
into an array like this:
[ 'Presentation',
'about',
'"Test Driven Development"' ]
I have tried CSV::parse_line(string, col_sep: ' '), but this results in
[ 'Presentation',
'about',
'Test Driven Development' ] # I'm missing the quotes here
I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.
3 solutions
#1
15
You may use the following regular expression with split:
str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]
It splits on a whitespace character, but only if the text from there to the end of the string contains an even number of " characters. Be aware that this version only works if every quoted string is properly closed.
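To make the "even number of quotes" condition concrete, here is a minimal sketch that runs the same pattern on a properly quoted string and on one with a missing closing quote. The constant name QUOTE_AWARE_SPACE and the second input are made up for illustration.

```ruby
# Same pattern as above: a whitespace character that is followed, up to the
# end of the line, only by non-quote characters and complete "..." pairs.
QUOTE_AWARE_SPACE = /\s(?=(?:[^"]|"[^"]*")*$)/

balanced   = 'Presentation about "Test Driven Development"'
unbalanced = 'Presentation about "Test Driven' # closing quote missing (illustrative input)

balanced_parts   = balanced.split(QUOTE_AWARE_SPACE)
unbalanced_parts = unbalanced.split(QUOTE_AWARE_SPACE)

p balanced_parts
# => ["Presentation", "about", "\"Test Driven Development\""]

# With an odd number of quotes the logic flips: spaces before the stray
# quote are no longer split points, while the space after it is.
p unbalanced_parts
# => ["Presentation about \"Test", "Driven"]
```

Note that $ in Ruby matches at the end of a line, so this sketch assumes single-line input.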
An alternative solution uses scan to read the parts of the string (everything besides the spaces):
p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
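One thing to watch with the scan version is that \w only covers letters, digits and underscores, so punctuation outside the quotes is silently dropped. The sketch below compares it with a variant that accepts any non-space, non-quote character. The variant pattern and the sample string are my own assumptions, not part of the answer above.

```ruby
sample = 'Slides on "Test Driven Development" (part-1)' # illustrative input

# Pattern from the answer: word characters or complete "..." runs.
word_tokens = sample.scan(/(?:\w|"[^"]*")+/)

# Assumed variant: any character that is neither whitespace nor a quote,
# or a complete "..." run.
loose_tokens = sample.scan(/(?:[^\s"]|"[^"]*")+/)

p word_tokens
# => ["Slides", "on", "\"Test Driven Development\"", "part", "1"]

p loose_tokens
# => ["Slides", "on", "\"Test Driven Development\"", "(part-1)"]
```

Which of the two is preferable depends on whether punctuation should survive as part of a token.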
#2
2
Just to extend the previous answer from Howard, you can add this method:
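class String
  # Split on whitespace outside single- or double-quoted runs, then strip
  # surrounding spaces and quote characters from each token.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/)
      .reject(&:empty?)
      .map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end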
And the result:
> 'Presentation about "Test Driven Development" '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
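If reopening String feels too invasive, the same idea might be packaged as a standalone helper. This is only a sketch: the module name QuotedSplit, the constant SEPARATOR and the keep_quotes option are hypothetical names chosen for illustration. Because single quotes count as delimiters here, an apostrophe as in don't would unbalance the pattern.

```ruby
# Hypothetical helper module; not part of Ruby or any library.
module QuotedSplit
  # Whitespace followed only by balanced '...' and "..." runs up to end of line.
  SEPARATOR = /\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/

  # Returns the tokens of text. Surrounding quotes are removed unless
  # keep_quotes is true (which is what the question originally asked for).
  def self.tokenize(text, keep_quotes: false)
    tokens = text.split(SEPARATOR).reject(&:empty?)
    return tokens if keep_quotes

    # Strip one matching pair of quotes that wraps the whole token.
    tokens.map { |t| t.sub(/\A(["'])(.*)\1\z/m, '\2') }
  end
end

input = 'Presentation about "Test Driven Development" '

p QuotedSplit.tokenize(input)
# => ["Presentation", "about", "Test Driven Development"]

p QuotedSplit.tokenize(input, keep_quotes: true)
# => ["Presentation", "about", "\"Test Driven Development\""]
```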
#3
0
Here:
"Presentation about \"Test Driven Development\"".scan(/\s?\w+\s?|"[\w\s]*"/).map(&:strip)
# => ["Presentation", "about", "\"Test Driven Development\""]