How can I extract repeated character sequences from a string with a Ruby regex?

Time: 2022-04-08 22:39:11

I have a string like "++++001------zx.......?????????xxxxxxx". I would like to extract every run of two or more identical consecutive characters into a flat array using a Ruby regex:

["++++",
"00",
"------",
".......",
"?????????",
"xxxxxxx"]

I can achieve this with a nested loop:

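s = "++++001------zx.......?????????xxxxxxx"
t = s.split(//)   # array of single characters
i = 0
f = []            # collected runs of length >= 2
while i <= t.length - 1 do
  j = i
  part = ""
  # extend the run while the character at j matches the one at i
  while t[i] == t[j] do
    part = part + t[j]
    j = j + 1
  end
  i = j           # continue right after the run
  if part.length >= 2 then f.push(part) end
end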

But I am unable to find an appropriate regex to feed into the scan method. I tried s.scan(/(.)\1++/x), but it only captures the first character of each repeating sequence. Is this possible at all?

2 solutions

#1


2  

This is a bit tricky.

You want to capture every run of two or more of the same character. A good way to do this is with backreferences, and your attempt is close to being correct.

/((.)\2+)/ should do the trick.

Note that when the pattern contains groups, scan returns the captured groups rather than the whole match, so each match yields two values: the first is the full sequence and the second is the repeated character.
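
As a rough sketch of what that looks like in practice (the variable names pairs, runs and whole are only illustrative), you can keep just the first element of each pair. Alternatively, calling gsub with a pattern but no replacement and no block returns an enumerator over the whole matches, so a single group is enough:

```ruby
s = "++++001------zx.......?????????xxxxxxx"

# scan returns [sequence, character] pairs because the pattern has two groups
pairs = s.scan(/((.)\2+)/)

# keep only the full sequence from each pair
runs = pairs.map(&:first)

# alternative: gsub with no replacement and no block gives an Enumerator
# that yields each whole match, so the outer group is not needed
whole = s.gsub(/(.)\1+/).to_a

p runs
p whole
```

Both runs and whole should hold the flat array asked for in the question. Keep in mind that . does not match a newline by default, which does not matter for this input.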

#2


1  

str =  "++++001------zx.......?????????xxxxxxx" 
str.chars.chunk{|e| e}.map{|e| e[1].join if e[1].size >1 }.compact
# => ["++++", "00", "------", ".......", "?????????", "xxxxxxx"]
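
A possible variant of the same non-regex idea, assuming Ruby 2.3 or later (where chunk_while is available), filters the chunks first so that no compact step is needed. The name runs is only illustrative:

```ruby
str = "++++001------zx.......?????????xxxxxxx"

# chunk_while groups consecutive equal characters into sub-arrays
runs = str.each_char
          .chunk_while { |a, b| a == b }
          .select { |chunk| chunk.size > 1 }  # drop single characters
          .map(&:join)

p runs
```

This is a sketch rather than a replacement for the line above; both walk the characters once and differ mainly in readability.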
