When do we need non-capturing groups? [duplicate]

This question already has an answer here:

Regex: Question mark and colon [duplicate] 2 answers

In this example, I understand that it matches strings that start with three letters or three dashes and whose final three characters are digits. But I do not understand what ?: does in this example:

re.match(r"(?:(?:\w{3})|(?:\-{3}))\d\d\d$", v)

Could someone please explain when we need non-capturing groups? Thanks.

3 answers

#1

Non-capturing groups help keep unwanted data out of your capturing groups.

For instance, suppose your strings look like this:

abc and bcd
def or cef

Here you want to capture the data in the first and third columns, which are separated by "and" or "or". So you write the regex as follows:

(\w+)\s+(and|or)\s+(\w+)

Here $1 contains the first column:

abc
def

Then $3 contains:

bcd
cef

The separator word ("and" or "or") is unnecessary data, yet it ends up stored in $2. Since you don't want to store it, use a non-capturing group:

(\w+)\s+(?:and|or)\s+(\w+)

Here $1 contains:

abc 
def

and $2 contains:

bcd
cef
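
As a rough sketch, the two patterns above can be compared in Python's re module (an assumption: the answer's $1/$2 notation comes from another regex flavor, and Python exposes the same groups through group(1), group(2) or groups()). The list name lines is only illustrative.

```python
import re

# The two sample rows from the answer above.
lines = ["abc and bcd", "def or cef"]

capturing = re.compile(r"(\w+)\s+(and|or)\s+(\w+)")
non_capturing = re.compile(r"(\w+)\s+(?:and|or)\s+(\w+)")

# With a capturing separator, "and"/"or" occupies group 2.
with_separator = [capturing.match(s).groups() for s in lines]

# With (?:...), only the two columns are captured, as groups 1 and 2.
columns_only = [non_capturing.match(s).groups() for s in lines]

print(with_separator)  # [('abc', 'and', 'bcd'), ('def', 'or', 'cef')]
print(columns_only)    # [('abc', 'bcd'), ('def', 'cef')]
```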

You can also capture exactly the data you want from inside a non-capturing group.

For example:

(?:don't (want))

Now $1 contains the text "want".

It also lets you use | alternation inside a group. For example:

(?:don't(want)|some(what))

In the above example, $1 contains "want" when the first alternative matches, and $2 contains "what" when the second one does.
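
A minimal sketch of both patterns in Python's re module, assuming the inputs shown here (they are made up for illustration). A group that did not take part in the match is reported as None.

```python
import re

# A capturing group nested inside a non-capturing one: only "want" is numbered.
nested = re.compile(r"(?:don't (want))")
m = nested.search("I don't want it")
whole_match = m.group(0)
first_group = m.group(1)
print(whole_match)  # don't want
print(first_group)  # want

# Alternation inside a non-capturing group: each branch has its own capture.
alternation = re.compile(r"(?:don't(want)|some(what))")
first_branch = alternation.search("don'twant").groups()
second_branch = alternation.search("somewhat").groups()
print(first_branch)   # ('want', None)
print(second_branch)  # (None, 'what')
```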

#2

You never absolutely need non-capturing groups, but they have a few advantages:

Capturing groups are numbered from left to right. You use those numbers to refer to a group in backreferences and when extracting the text matched by the group. Groups marked as non-capturing do not contribute to the numbering, so the numbering of the groups you do care about stays simple (1, 2, 3, ... with no gaps), and you can later insert or remove non-capturing groups without changing the number of any capturing group.

Not capturing a group can make matching more efficient (depending on the particular regex API), since the engine does not need to store or return the string matched by that group.

Documentation: Marking which groups are capturing and which are non-capturing makes their individual purposes clearer.
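
The numbering point can be sketched in Python as follows. The date string and the three pattern names are hypothetical, chosen only to show how adding a group does or does not shift the existing group numbers.

```python
import re

# Hypothetical input, used only for illustration.
text = "2024-01-15"

base = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Accept "-" or "/" as the separator using non-capturing groups:
# year, month and day keep the numbers 1, 2 and 3.
extended = re.compile(r"(\d{4})(?:-|/)(\d{2})(?:-|/)(\d{2})")

# The same change with capturing groups shifts month and day to 3 and 5.
shifted = re.compile(r"(\d{4})(-|/)(\d{2})(-|/)(\d{2})")

base_groups = base.match(text).groups()
extended_groups = extended.match(text).groups()
shifted_groups = shifted.match(text).groups()

print(base_groups)      # ('2024', '01', '15')
print(extended_groups)  # ('2024', '01', '15')
print(shifted_groups)   # ('2024', '-', '01', '-', '15')
```

Code that reads group(2) for the month keeps working with extended, but would return "-" with shifted.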

In your specific example, the two inner groups are totally unnecessary, since they are not used for capturing, nor alternation, nor any other feature. It could be shortened to: (?:\w{3}|-{3})\d\d\d$
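
A quick way to check this is to run both patterns over a few inputs. This is only a sketch: the sample strings are made up, and a handful of cases does not prove equivalence in general. Note that \w also matches digits and underscores, so "ab1234" is accepted by both.

```python
import re

original = re.compile(r"(?:(?:\w{3})|(?:\-{3}))\d\d\d$")
shortened = re.compile(r"(?:\w{3}|-{3})\d\d\d$")

# Made-up sample inputs, for illustration only.
samples = ["abc123", "---456", "ab1234", "abcd123", "--123", "abc12x"]

# (input, matched by original, matched by shortened)
results = [(s, bool(original.match(s)), bool(shortened.match(s))) for s in samples]

for row in results:
    print(row)

# Neither pattern defines any capturing group.
print(original.groups, shortened.groups)  # 0 0
```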

#3

I've used non-capturing groups with preg_match() in PHP where the pattern needed an optional group but I didn't want it included in the results, e.g.:

Apr(?:il)? ([0-9]{1,2})

This would match the date in both "Apr 10" and "April 10" while capturing only the day, "10". If the "il" portion were captured, I'd have no easy way of knowing which group to reference in the result set.
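
The answer refers to PHP's preg_match(). As a rough equivalent, here is the same pattern in Python's re module (a swapped-in API, not the author's code), next to a capturing variant for comparison.

```python
import re

pattern = re.compile(r"Apr(?:il)? ([0-9]{1,2})")

# The day is group 1 whether the month is abbreviated or not.
days = [pattern.search(s).group(1) for s in ("Apr 10", "April 10")]
print(days)  # ['10', '10']

# If "il" is captured, it takes group 1 and the day moves to group 2.
capturing = re.compile(r"Apr(il)? ([0-9]{1,2})")
short_form = capturing.search("Apr 10").groups()
long_form = capturing.search("April 10").groups()
print(short_form)  # (None, '10')
print(long_form)   # ('il', '10')
```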
