I am looking for a way to split this array of strings:
["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can", "parse", "this",
"text", "?", "Without", "any", "errors", "!"]
into groups, each terminated by a punctuation mark:
[
["this", "is", "a", "test", "."],
["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
["Without", "any", "errors", "!"]
]
Is there a simple method to do this? Or is the sanest approach to iterate over the array, adding each element to a temporary array, and appending that temporary array to the container array whenever punctuation is found?
I was thinking of using slice or map, but I can't figure out whether that is possible.
2 Answers
#1
11
Check out Enumerable#slice_after:
x.slice_after { |e| '.?!'.include?(e) }.to_a
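As a fuller sketch of the one-liner above, here it is run against the array from the question, next to the manual loop the question describes. The names words, terminators, groups and manual are mine, chosen for illustration. Enumerable#slice_after needs Ruby 2.2 or later. One deliberate difference: the terminators are kept in an array rather than a string, because String#include? does substring matching, so '.?!'.include?('') is true and an empty-string element would also end a group.

```ruby
# Sample input from the question; `words` is an illustrative name.
words = ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
         "parse", "this", "text", "?", "Without", "any", "errors", "!"]

# Elements that end a group (assumption: exactly these three tokens).
terminators = [".", "?", "!"]

# slice_after closes the current slice after each element
# for which the block returns a truthy value.
groups = words.slice_after { |w| terminators.include?(w) }.to_a

# The manual loop described in the question, for comparison.
manual = []
current = []
words.each do |w|
  current << w
  if terminators.include?(w)
    manual << current
    current = []
  end
end
manual << current unless current.empty? # keep trailing words with no terminator

p groups == manual
#=> true
```

Both give the three groups shown in the question, so the loop is not wrong, just longer than it needs to be.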
#2
2
@ndn has given the best answer to this question, but I will suggest another approach that may have application to other problems.
Arrays such as the one you have given are generally obtained by splitting strings on whitespace or punctuation. For example:
s = "this is a test. I wonder if I can parse this text? Without any errors!"
s.scan(/\w+|[.?!]/)
#=> ["this", "is", "a", "test", ".", "I", "wonder", "if", "I", "can",
# "parse", "this", "text", "?", "Without", "any", "errors", "!"]
When this is the case, you may find it more convenient to manipulate the string directly in some other way. Here, for example, you could first use String#split with a regex to break the string s into sentences:
r1 = /
     (?<=[.?!])  # positive lookbehind: the preceding character is one of . ? !
     \s*         # match >= 0 whitespace characters, which split then discards
     /x          # extended/free-spacing regex definition mode
a = s.split(r1)
#=> ["this is a test.", "I wonder if I can parse this text?",
# "Without any errors!"]
and then split up the sentences:
r2 = /
     \s+         # match >= 1 whitespace characters
     |           # or
     (?=[.?!])   # use a positive lookahead to match a zero-width string
                 # followed by one of the punctuation characters
     /x
b = a.map { |sentence| sentence.split(r2) }
#=> [["this", "is", "a", "test", "."],
# ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
# ["Without", "any", "errors", "!"]]
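The two steps above can be combined into one method if you start from the string. This is only a sketch: the method name split_into_token_groups is hypothetical, and it assumes the text uses only . ? ! as sentence terminators, as in the example.

```ruby
# Hypothetical helper; the name is chosen for illustration only.
def split_into_token_groups(text)
  text
    .split(/(?<=[.?!])\s*/) # break after each terminator, dropping the spaces that follow it
    .map { |sentence| sentence.split(/\s+|(?=[.?!])/) } # split on whitespace and just before a terminator
end

s = "this is a test. I wonder if I can parse this text? Without any errors!"
p split_into_token_groups(s)
#=> [["this", "is", "a", "test", "."],
#    ["I", "wonder", "if", "I", "can", "parse", "this", "text", "?"],
#    ["Without", "any", "errors", "!"]]
```

The regexes are the same as r1 and r2, written inline without the free-spacing comments.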