Splitting a string into words and punctuation with Ruby

I'm working in Ruby and I want to split a string into an array of words and punctuation, treating apostrophes and hyphens as parts of words. For example,

s = "here...is a     happy-go-lucky string that I'm writing"

should become

["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"].

The closest I've gotten so far is still inadequate, because it doesn't treat hyphens and apostrophes as part of the word:

s.scan(/\w+|\W+/).select {|x| x.match(/\S/)}

which yields

["here", "...", "is", "a", "happy", "-", "go", "-", "lucky", "string", "that", "I", "'", "m", "writing"]

4 Answers

#1

You can try the following:

s.scan(/[\w'-]+|[[:punct:]]+/)
#=> ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
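To make it easier to see why this works, here is a minimal sketch that wraps the same pattern in a helper. The names `TOKEN_PATTERN` and `tokenize` are hypothetical and only used for illustration; the regular expression itself is the one from the answer above.

```ruby
# Hypothetical names; the pattern is the one shown in the answer.
TOKEN_PATTERN = /[\w'-]+|[[:punct:]]+/

def tokenize(text)
  # First alternative: runs of word characters, apostrophes, and hyphens.
  # Second alternative: runs of any other punctuation.
  # Whitespace matches neither alternative, so scan simply skips it.
  text.scan(TOKEN_PATTERN)
end

s = "here...is a     happy-go-lucky string that I'm writing"
p tokenize(s)
# => ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
```

Because the word alternative comes first, an apostrophe or hyphen that starts a token is treated as the beginning of a word rather than as punctuation.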

#2

You were close:

s.scan(/[\w'-]+|[.,!?]+/)

The idea is to match either words, possibly with ' or - in them, or runs of punctuation characters.
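One thing to keep in mind with an explicit character list such as `[.,!?]` is that any punctuation not in the list is silently dropped. The following sketch compares it with the POSIX class `[[:punct:]]`; the sample text and variable names are made up for illustration.

```ruby
narrow = /[\w'-]+|[.,!?]+/          # only the listed punctuation is kept
broad  = /[\w'-]+|[[:punct:]]+/     # any punctuation character is kept

text = "Wait; really?"

narrow_tokens = text.scan(narrow)
broad_tokens  = text.scan(broad)

p narrow_tokens  # => ["Wait", "really", "?"]
p broad_tokens   # => ["Wait", ";", "really", "?"]
```

Which variant fits depends on whether unlisted punctuation should be kept as tokens or discarded.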

#3

After nearly giving up and then tinkering some more, I appear to have solved the puzzle. This seems to work:

s.scan(/[\w'-]+|\W+/).select {|x| x.match(/\S/)}

It yields

["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]

Is there an even cleaner way to do it though, without having to use #select?

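One possible way to avoid #select, sketched here under the assumption that "punctuation" means any character that is not a word character, whitespace, an apostrophe, or a hyphen: exclude whitespace from the second alternative so that scan never returns whitespace in the first place. The names `NO_SELECT_PATTERN`, `tokens`, `with_select`, and `without_select` are hypothetical.

```ruby
# Assumption: punctuation = anything that is not a word character,
# whitespace, an apostrophe, or a hyphen.
NO_SELECT_PATTERN = /[\w'-]+|[^\w\s'-]+/

s = "here...is a     happy-go-lucky string that I'm writing"
tokens = s.scan(NO_SELECT_PATTERN)
p tokens
# => ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]

# With \W+ the punctuation token can swallow adjacent whitespace,
# and select keeps it because it still contains a non-space character.
t = "wait... what"
with_select    = t.scan(/[\w'-]+|\W+/).select { |x| x.match(/\S/) }
without_select = t.scan(NO_SELECT_PATTERN)
p with_select     # => ["wait", "... ", "what"]
p without_select  # => ["wait", "...", "what"]
```

Besides dropping the extra pass, this variant keeps trailing spaces out of punctuation tokens.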

#4

Use the split method.

Example:

str = "word, anotherWord, foo"
puts str.split(", ")

It prints

word
anotherWord
foo
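As a rough illustration of how split behaves, the sketch below shows the effect of the delimiter on the example string, and what a plain whitespace split does to the string from the question. The variable names are made up for illustration.

```ruby
str = "word, anotherWord, foo"

comma_only      = str.split(",")   # the space after each comma stays in the item
comma_and_space = str.split(", ")  # delimiter includes the space

p comma_only       # => ["word", " anotherWord", " foo"]
p comma_and_space  # => ["word", "anotherWord", "foo"]

# With no argument, split breaks on runs of whitespace only,
# so punctuation stays attached to neighbouring words.
s = "here...is a     happy-go-lucky string that I'm writing"
whitespace_split = s.split
p whitespace_split
# => ["here...is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
```

Since split only removes delimiters, separating "..." into its own token takes something like the scan-based patterns from the other answers.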

Hope it works for you!

You can also check this: http://ruby.about.com/od/advancedruby/a/split.htm
