Why doesn't a regular expression "non-capturing" group work

Time: 2022-06-10 12:15:14

In my snippet below, the non-capturing group "(?:aaa)" should be ignored in the matching result, so the result should be "_bbb" only.
However, I get "aaa_bbb" as the matching result; only when I specify group(2) does it show "_bbb".

import re

string1 = "aaa_bbb"
print(re.match(r"(?:aaa)(_bbb)", string1).group())

>>> aaa_bbb

6 answers

#1


25  

group() and group(0) will return the entire match. Subsequent groups are actual capture groups.

>>> print (re.match(r"(?:aaa)(_bbb)", string1).group(0))
aaa_bbb
>>> print (re.match(r"(?:aaa)(_bbb)", string1).group(1))
_bbb
>>> print (re.match(r"(?:aaa)(_bbb)", string1).group(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: no such group

#2


59  

I think you're misunderstanding the concept of a "non-capturing group". The text matched by a non-capturing group still becomes part of the overall regex match.

Both the regex (?:aaa)(_bbb) and the regex (aaa)(_bbb) return aaa_bbb as the overall match. The difference is that the first regex has one capturing group which returns _bbb as its match, while the second regex has two capturing groups that return aaa and _bbb as their respective matches. In your Python code, to get _bbb, you'd need to use group(1) with the first regex, and group(2) with the second regex.
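
The comparison above can be checked directly. This is a minimal sketch that reuses the question's string1; the variable names non_capturing and capturing are only for illustration.

```python
import re

string1 = "aaa_bbb"

# One capturing group: (?:aaa) is matched but gets no group number.
non_capturing = re.match(r"(?:aaa)(_bbb)", string1)

# Two capturing groups: aaa is group 1 and _bbb is group 2.
capturing = re.match(r"(aaa)(_bbb)", string1)

# The overall match (group 0) is identical in both cases.
print(non_capturing.group(0), capturing.group(0))

# _bbb sits at a different group number in each pattern.
print(non_capturing.group(1), capturing.group(2))
```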

The main benefit of non-capturing groups is that you can add them to a regex without upsetting the numbering of the capturing groups in the regex. They also offer (slightly) better performance as the regex engine doesn't have to keep track of the text matched by non-capturing groups.
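
To illustrate the numbering point, here is a small sketch with a made-up date string (not from the question). Wrapping the month in a non-capturing group leaves the day at group 2, whereas wrapping it in a capturing group pushes the day to group 3.

```python
import re

text = "2022-06-10"  # illustrative input, not from the original question

# Baseline: year is group 1, day is group 2.
baseline = re.match(r"(\d{4})-\d{2}-(\d{2})", text)

# Grouping the month without capturing keeps the numbering unchanged.
with_non_capturing = re.match(r"(\d{4})-(?:\d{2})-(\d{2})", text)

# Grouping the month with a capturing group shifts the day to group 3.
with_capturing = re.match(r"(\d{4})-(\d{2})-(\d{2})", text)

print(baseline.group(2))            # day
print(with_non_capturing.group(2))  # still the day
print(with_capturing.group(2))      # now the month
print(with_capturing.group(3))      # the day moved here
```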

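If you really want to exclude aaa from the overall regex match then you need to use lookaround. In this case, positive lookbehind does the trick: (?<=aaa)_bbb. With this regex, group() returns _bbb in Python, provided you use re.search rather than re.match: re.match only attempts a match at the start of the string, where the lookbehind cannot succeed. No capturing groups needed.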
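
A short sketch of the lookbehind approach, reusing the question's string1. Note the choice of re.search here: re.match is anchored at index 0, so the same pattern returns None with it.

```python
import re

string1 = "aaa_bbb"

# re.search scans forward, so the match can start at index 3,
# where the three preceding characters are "aaa".
found = re.search(r"(?<=aaa)_bbb", string1)
print(found.group())  # _bbb

# re.match only tries index 0. Nothing precedes that position,
# so the lookbehind fails and there is no match.
anchored = re.match(r"(?<=aaa)_bbb", string1)
print(anchored)  # None
```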

My recommendation is that if you have the ability to use capturing groups to get part of the regex match, use that method instead of lookaround.

#3


2  

TFM:

class re.MatchObject

group([group1, ...])

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string.
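
The documented behaviour can be seen in a small sketch. The pattern below is a variant of the question's regex with two capturing groups, chosen only so that a multi-argument call has something to return.

```python
import re

# Variant of the question's pattern: (_) is group 1, (bbb) is group 2.
m = re.match(r"(?:aaa)(_)(bbb)", "aaa_bbb")

whole = m.group()      # no argument: same as group(0), the whole match
single = m.group(2)    # one argument: a single string
pair = m.group(1, 2)   # several arguments: a tuple, one item per argument
mixed = m.group(0, 1)  # a zero argument yields the entire match

print(whole, single, pair, mixed)
```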

#4


1  

Try:

print(re.match(r"(?:aaa)(_bbb)", string1).group(1))

group() is the same as group(0). Group 0 is always present, and it is the whole RE match.

#5


0  

You have to specify group(1) to get just the part captured by the parentheses (_bbb in this case).

group() without arguments returns the whole string that the complete regular expression matched, regardless of whether parts of it were also captured by parentheses.

#6


0  

Use the groups method on the match object instead of group. It returns a tuple of all captured groups (the whole match is not included). The group method with no argument returns the entire match of the regular expression.
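
A minimal sketch of groups() on the question's pattern. The second pattern, with an optional group, is an added illustration of how an unmatched group is reported.

```python
import re

m = re.match(r"(?:aaa)(_bbb)", "aaa_bbb")

# Only capturing groups appear; the non-capturing aaa is absent.
captured = m.groups()
print(captured)

# Illustrative extra: a group that did not participate in the match
# is reported as None, or as the supplied default.
optional = re.match(r"(a)(b)?", "a")
print(optional.groups())
print(optional.groups(default=""))
```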
