Why doesn't this regex pattern work as expected?

Time: 2022-05-02 21:46:20

I need a regex pattern that matches any 16-digit number, optionally split into four groups of four digits by hyphens, in which no digit is repeated more than 3 times in a row, whether or not a hyphen falls in between.

So the pattern I wrote is

a = re.compile(r'(?!(\d)\-?\1\-?\1\-?\1)(^\d{4}\-?\d{4}\-?\d{4}\-?\d{4}$)')

But the example "5133-3367-8912-3456" still matches, even though 3 is repeated 4 times in a row. What is wrong with the negative lookahead?

1 solution

#1


A lookahead only checks at the position where it sits, which in your pattern is the start of the string. If you want a lookahead to check the whole string for whether a certain pattern can or cannot be matched, add .* in front of it so it can reach deeper into the string.
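To make the position dependence concrete, here is a small sketch that isolates just the lookahead from the question. The names sample, at_position and anywhere are only illustrative, and the hyphens are left unescaped for readability.

```python
import re

sample = "5133-3367-8912-3456"  # "33-33" is a run of four 3s across a hyphen

# The bare lookahead only inspects the text starting at the current position.
at_position = re.compile(r'(?!(\d)-?\1-?\1-?\1)')
# Prefixing .* lets the lookahead skip ahead to any later position.
anywhere = re.compile(r'(?!.*(\d)-?\1-?\1-?\1)')

# At index 0 the text starts with "5133", so the bare lookahead sees no run.
print(at_position.match(sample) is not None)     # True
# Tried at index 2, where the run starts, the same lookahead rejects.
print(at_position.match(sample, 2) is not None)  # False
# With .* the run is found even when checking from index 0.
print(anywhere.match(sample) is not None)        # False
```

The first call succeeds because "5133" contains no run of four identical digits, which is why the original pattern lets the sample through.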

In your case, you could change it to r'(?!.*(\d)\-?\1\-?\1\-?\1)(^\d{4}\-?\d{4}\-?\d{4}\-?\d{4}$)'.

There is also no need to escape the hyphens where they appear (outside a character class a hyphen is just a literal), and I would move the lookahead to right after the ^. I don't know how well Python regexes are optimized, but that way the start-of-string anchor is matched first (only 1 valid position), instead of the lookahead being checked at every position only for the match to fail at ^. This would give r'^(?!.*(\d)-?\1-?\1-?\1)(\d{4}-?\d{4}-?\d{4}-?\d{4}$)'.
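As a quick check of that final pattern, here is a minimal sketch. CARD and is_valid are hypothetical names, and every candidate other than the one from the question is made up for illustration.

```python
import re

# Final pattern from the answer: anchor first, then the whole-string lookahead.
CARD = re.compile(r'^(?!.*(\d)-?\1-?\1-?\1)(\d{4}-?\d{4}-?\d{4}-?\d{4}$)')

def is_valid(number: str) -> bool:
    """Hypothetical helper: True if the string passes the pattern."""
    return CARD.match(number) is not None

candidates = [
    "5133-3367-8912-3456",  # four 3s in a row across a hyphen -> False
    "5123-4567-8912-3456",  # no long run, hyphenated          -> True
    "5123456789123456",     # no long run, no hyphens          -> True
    "5123-4567-8912-345",   # only 15 digits                   -> False
]

for candidate in candidates:
    print(candidate, is_valid(candidate))
```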
