"aaabbcde".scan(/((\w)\2*)/)
This line of code produces the result below:
[["aaa", "a"], ["bb", "b"], ["c", "c"], ["d", "d"], ["e", "e"]]
The part I don't understand is what \2* does. And why does this produce a two-dimensional array?
Edited:
Just a summary of what I understand after getting help and doing some research. I hope it helps anyone searching for a similar topic.
You can create capture groups using regex, and a later group can refer to an earlier group. Each pair of parentheses is a capture group. So if you write /(\w)/, you create one group. The pattern matches every word character in the string, one at a time, and each match holds that single character in group 1.
So with the string "rubyy" you get something like this:
Match 1
1. r
Match 2
1. u
Match 3
1. b
Match 4
1. y
Match 5
1. y
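The match listing above can be reproduced in Ruby with String#scan. This is a minimal sketch; the variable name `matches` is only for illustration. When the pattern contains a capture group, scan returns one inner array per match, holding that match's groups.

```ruby
# One capture group: each match yields a one-element array (group 1).
matches = "rubyy".scan(/(\w)/)
p matches
# => [["r"], ["u"], ["b"], ["y"], ["y"]]
```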
To create a second capture group, just add another pair of parentheses, like this: /((\w))/. Note that the outer pair of parentheses is the first group and the inner pair is the second, because groups are numbered by the position of their opening parenthesis, from left to right. You can keep nesting groups this way as deep as you like.
Given the same string "rubyy", this gives a result like this:
Match 1
1. r
2. r
Match 2
1. u
2. u
Match 3
1. b
2. b
Match 4
1. y
2. y
Match 5
1. y
2. y
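The same listing, sketched in Ruby (the variable name `nested` is illustrative). Each inner array is [group 1, group 2], so the outer group comes first and the inner group second.

```ruby
# Two nested groups: the outer pair is group 1, the inner pair is group 2.
nested = "rubyy".scan(/((\w))/)
p nested
# => [["r", "r"], ["u", "u"], ["b", "b"], ["y", "y"], ["y", "y"]]
```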
You can try changing the regex to /(()\w)/ or /(\w)()/ and see what happens (remember I just said the inner pair of parentheses is the second group?). http://www.rubular.com is a good place to experiment with regex in Ruby.
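If you would rather try those two variants in irb than on Rubular, a small sketch could look like this (variable names are illustrative). In both patterns the second group is an empty pair of parentheses, so it captures an empty string, while the first group still contains the \w character.

```ruby
# Group 2 is the empty pair of parentheses in both patterns.
inner_empty    = "rubyy".scan(/(()\w)/)
trailing_empty = "rubyy".scan(/(\w)()/)

p inner_empty.first    # => ["r", ""]
p trailing_empty.first # => ["r", ""]
```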
A pointer that refers to another capture group: in the regex I originally asked about, /((\w)\2*)/, the \2 part means "match again the same text that group #2 captured" (the inner group is group #2). Because \2 sits inside group #1 (the outer one), the repeated text becomes part of what group #1 captures. The * is just the regular regex quantifier that means zero or more; in this case, zero or more repetitions of what group #2 captured.
Given the above, you can also try /(\w)(\1*)/. This achieves something similar, but you should experiment to see the difference. And remember, /(\2*)(\w)/ doesn't work, because, I guess, Ruby evaluates the pattern sequentially from left to right, so \2 points to a capture group that hasn't captured anything yet.
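To see the difference between the nested and the side-by-side form, here is a short comparison sketch (variable names are illustrative). In the nested form group 1 holds the whole run; in the side-by-side form group 1 holds only the first character and group 2 holds the rest of the run.

```ruby
input = "aaabbcde"

nested       = input.scan(/((\w)\2*)/)
side_by_side = input.scan(/(\w)(\1*)/)

p nested       # => [["aaa", "a"], ["bb", "b"], ["c", "c"], ["d", "d"], ["e", "e"]]
p side_by_side # => [["a", "aa"], ["b", "b"], ["c", ""], ["d", ""], ["e", ""]]

# Joining the two groups of each side-by-side match rebuilds the full run.
runs = side_by_side.map(&:join)
p runs         # => ["aaa", "bb", "c", "d", "e"]
```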
2 Solutions
#1
3
You have two capture groups. The first, ((\w)\2*), is the first one encountered when the pattern is parsed left to right; the second is (\w). \2* matches the result of capture group #2, zero or more times.
For "aaa", the inner capture group (#2) matches the first "a", then \2* becomes a*, which matches the next two a's. Hence, the first capture group matches 'aaa'.
Notice that capture group #2 always matches just one character.
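The walkthrough for "aaa" can be checked with Regexp#match, which returns the first match only. The sketch below (variable names are illustrative) also shows where the two-dimensional array comes from: when the pattern contains capture groups, String#scan returns one inner array of groups per match, and when it contains none, it returns a flat array of matched strings.

```ruby
pattern = /((\w)\2*)/

# First match only: index 0 is the whole match, 1 and 2 are the groups.
m = pattern.match("aaabbcde")
p m[0] # => "aaa"
p m[1] # => "aaa"
p m[2] # => "a"

# With groups, scan yields [group 1, group 2] for every match.
rows = "aaabbcde".scan(pattern)
p rows.length # => 5

# Without groups, scan yields a flat array of whole matches.
flat = "aaabbcde".scan(/\w/)
p flat # => ["a", "a", "a", "b", "b", "c", "d", "e"]

# Taking only group 1 from each row gives the runs as a flat array.
runs = rows.map(&:first)
p runs # => ["aaa", "bb", "c", "d", "e"]
```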
#2
0
Consider the following:
- anything inside the // is a regular expression pattern to match
- \2 is a variable (called a backreference) that points to whatever was matched in the second set of parentheses, in this case the \w. If it had been matched in the third set of parentheses, you'd use \3; these unescaped parentheses are known as capture groups
- * is a zero-or-more match
For a better explanation, refer to any of the plethora of guides about regex. For example: http://www.regular-expressions.info/refcapture.html