Non-capturing groups inside named groups

Time: 2022-11-09 22:33:44

I'm working on a Python regex that extracts time durations in the '2h30m' format. I've run into an issue where non-capturing groups ((?:...)) are getting captured inside named groups.

For example, matching 2h30m against:

(?P<hours>\d+(?:h))?(?P<minutes>\d+(?:m))?

yields {'hours': '2h', 'minutes': '30m'} rather than {'hours': '2', 'minutes': '30'}.
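
A minimal reproduction of the behavior above, using Python's standard `re` module. Using `re.fullmatch` (so the whole string must match) is an assumption made here for illustration:

```python
import re

# Pattern from the question: (?:h) and (?:m) sit *inside* the named groups,
# so the suffix characters are still part of what each named group captures.
pattern = re.compile(r"(?P<hours>\d+(?:h))?(?P<minutes>\d+(?:m))?")

match = pattern.fullmatch("2h30m")
print(match.groupdict())  # {'hours': '2h', 'minutes': '30m'}
```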

The workaround is to use positive lookahead assertions ((?=...)), but a lookahead only checks the next characters without consuming them (it doesn't advance the state of the regex FSM), so the h and m suffixes have to be matched again afterward:

(?P<hours>\d+(?=h))?h?(?P<minutes>\d+(?=m))?m?
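
A quick check of the lookahead workaround with Python's `re` module (again assuming `re.fullmatch`). It does yield bare numbers, but because h? and m? are matched independently of the digit groups, it also accepts a stray suffix with no number in front of it:

```python
import re

# Lookahead version from the question: (?=h) and (?=m) only check the next
# character, so the trailing h? and m? are what actually consume the suffixes.
pattern = re.compile(r"(?P<hours>\d+(?=h))?h?(?P<minutes>\d+(?=m))?m?")

print(pattern.fullmatch("2h30m").groupdict())  # {'hours': '2', 'minutes': '30'}

# Side effect: the standalone h? lets a bare 'h' through with hours=None.
print(pattern.fullmatch("h30m").groupdict())   # {'hours': None, 'minutes': '30'}
```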

Is there a better way to do this?

1 solution

#1

Non-capturing groups don't "anti-capture" what they match: they still consume that text, and it is not removed from any enclosing group. They're just a way to group things together so you can apply quantifiers to them.
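
A small illustration of both points using Python's `re` module; the patterns and the group name `word` are made up for this example:

```python
import re

# A non-capturing group still consumes text, so that text appears in any
# enclosing capturing group.
outer = re.fullmatch(r"(?P<word>a(?:b)c)", "abc")
print(outer.group("word"))  # abc  (not 'ac')

# What (?:...) is for: applying a quantifier to a sequence without
# creating a numbered group.
repeated = re.fullmatch(r"(?:ab)+", "ababab")
print(repeated.group(0))    # ababab
print(repeated.groups())    # ()  (no capturing groups were created)
```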

To get the effect you want, you can rearrange the groups to put the non-capturing groups outside the capturing groups:

(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?
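
A sketch of the rearranged pattern in use. The helper `parse_duration` and the conversion to `datetime.timedelta` are hypothetical additions, not part of the answer. Since both parts are optional, the pattern also matches the empty string, so the helper rejects that case explicitly as a design choice:

```python
import re
from datetime import timedelta

# Answer's pattern: each non-capturing group wraps the digits *and* the suffix,
# while the named group inside it captures only the digits.
DURATION = re.compile(r"(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?")

def parse_duration(text: str) -> timedelta:
    """Hypothetical helper: turn '2h30m', '2h' or '45m' into a timedelta."""
    match = DURATION.fullmatch(text)
    # Both parts are optional, so "" would match too; treat it as invalid.
    if match is None or not text:
        raise ValueError(f"not a duration: {text!r}")
    hours = int(match.group("hours") or 0)      # missing part -> 0
    minutes = int(match.group("minutes") or 0)
    return timedelta(hours=hours, minutes=minutes)

print(DURATION.fullmatch("2h30m").groupdict())  # {'hours': '2', 'minutes': '30'}
print(parse_duration("2h30m"))                  # 2:30:00
print(parse_duration("45m"))                    # 0:45:00
```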