Python regex question mark operator not working?

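Date: 2021-01-25 22:25:28

import re

text = 'abc defg'  # renamed from str to avoid shadowing the built-in
m1 = re.match(r".*(def)?", text)  # optional capture group
m2 = re.match(r".*(def)", text)   # required capture group
print((m1.group(1), m2.group(1)))  # print a tuple so the output is the same on Python 2 and 3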

The output of the above is:

(None, 'def')

What is going on? Even with a non-greedy repetition operator, the optional capture group (def)? is not matched.

1 Answer

#1


Here's what happens when the regex engine tries to match .*(def) against abc defg:

  • First, the engine starts trying to match the regex at the beginning of the string.
  • The greedy subpattern .* initially tries to match as many characters as it can, which means the entire string.
  • Since this leaves nothing for (def) and so makes the rest of the match fail, the regex engine backtracks, giving back one character at a time, until it finds a way to match (def). That happens when .* matches only "abc " (including the trailing space).
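
The backtracking described above can be observed from the match object. This is a minimal sketch using only the standard re module and the same input string as the question; the variable names are chosen here for illustration.

```python
import re

text = "abc defg"
m = re.match(r".*(def)", text)

# group(0) is the whole match: .* gave characters back until "def" could match.
print(repr(m.group(0)))         # 'abc def'
# span(1) is where the capture group ended up.
print(m.span(1))                # (4, 7)
# Everything before the group is what .* kept after backtracking.
print(repr(text[:m.start(1)]))  # 'abc '
```

Note that the trailing "g" is not part of the match at all, because re.match only anchors at the start of the string.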

However, if we change the regex to .*(def)?, the following happens instead:

  • First, the regex engine again starts at the beginning of the string.
  • Next, it again tries to match .* as many times as possible, which means the entire string.
  • But at that point, since the rest of the regex is optional, it has already found a match for the entire regex! Since (def)? is greedy, the engine would prefer to match it if it could, but it is not going to backtrack into earlier subpatterns just to see whether it can. Instead, it lets the .* gobble up the entire string, leaving nothing for (def)?.
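
A small check of this case, again a sketch with only the standard re module: the overall match covers the whole string, and the optional group reports that it did not participate.

```python
import re

text = "abc defg"
m = re.match(r".*(def)?", text)

# .* consumed everything, and the optional group matched nothing.
print(repr(m.group(0)))  # 'abc defg'
print(m.group(1))        # None
# A group that did not take part in the match reports span (-1, -1).
print(m.span(1))         # (-1, -1)
```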

Something similar happens with .*?(def) and .*?(def)?:

  • Again, the engine starts at the beginning of the string.
  • The non-greedy subpattern .*? tries to match as few times as it can, i.e. not at all.
  • At that point, (def) cannot match, but (def)? can, by matching nothing. Thus, for (def) the regex engine has to go back and consider longer matches for .*? until it finds one that lets the full pattern match, whereas for (def)? it doesn't have to do that, and so it doesn't.
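
The two non-greedy variants can be compared side by side. In this sketch the names lazy_required and lazy_optional are just labels picked for the example.

```python
import re

text = "abc defg"
lazy_required = re.match(r".*?(def)", text)
lazy_optional = re.match(r".*?(def)?", text)

# With a required group, .*? is forced to grow until "def" fits.
print(repr(lazy_required.group(0)), lazy_required.group(1))  # 'abc def' def
# With an optional group, the empty match at position 0 already succeeds.
print(repr(lazy_optional.group(0)), lazy_optional.group(1))  # '' None
```

So the non-greedy form with an optional group matches even less than the greedy one: the whole match is the empty string.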

For more information, see the "Combining RE Pieces" section of the Perl regular expressions manual (which matches the behavior of Python's "Perl-compatible" regular expressions).
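
The answer above explains the behavior but does not propose a rewrite. One possible workaround, offered here as an assumption rather than as part of the original answer, is to stop the prefix from running past "def" by using a negative lookahead, so the optional group gets a chance to match. The helper name find_optional_def is hypothetical.

```python
import re

# Assumed workaround (not from the original answer): each prefix character
# is consumed only if "def" does not start at that position.
PATTERN = re.compile(r"(?:(?!def).)*(def)?")

def find_optional_def(text):
    """Return 'def' if it occurs in text, otherwise None."""
    return PATTERN.match(text).group(1)

print(find_optional_def("abc defg"))  # def
print(find_optional_def("abc xyz"))   # None
```

The prefix stops on its own just before "def", so no backtracking into it is needed for the optional group to capture.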
