PCRE negative lookahead produces unexpected match

Time: 2022-04-06 15:20:00

I want a Perl regular expression to match std::foo but not match std::foo::bar. This is what I have so far:

/((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]+))(?!:)/

This matches std::foo::bar as far as std::fo, but I want the whole match to fail for this input, not give a partial match.
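
To see the partial match concretely, here is a minimal sketch that runs the pattern from the question against both inputs. The helper name first_match is made up for illustration; nothing else is assumed beyond a stock Perl 5.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $re = qr/((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]+))(?!:)/;

# Hypothetical helper: return the text of capture group 1, or undef.
sub first_match {
    my ($input) = @_;
    return $input =~ $re ? $1 : undef;
}

for my $input ('std::foo', 'std::foo::bar') {
    my $m = first_match($input);
    print "$input => ", (defined $m ? "'$m'" : 'no match'), "\n";
}
# std::foo => 'std::foo'
# std::foo::bar => 'std::fo'
```

The second result comes from backtracking: the engine first takes foo in full, the lookahead fails on the colon, so it gives back one character. After fo the next character is o, not a colon, and (?!:) succeeds.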

What regex do I actually want?

4 answers

#1


4  

This solution uses a possessive quantifier, \w++, on the foo part of the pattern. That means it will refuse to backtrack after finding a series of "word" characters, even if the rest of the pattern (the negative look-ahead) then fails. I've also had to change the negative look-behind to reject word characters or colons, to prevent things like baz::std::foo from matching.

It is mostly a tidy-up of the answer from Sebastian Proske. It uses \w instead of the literal character class, adds layout using the /x modifier, and removes unnecessary parentheses. It also provides a working example:

use strict;
use warnings 'all';
use feature 'say';

my $s = 'match std::foo but not match std::foo::bar.';

say $1 while $s =~ / (?<![\w:]) ( \w+::\w++) (?!:) /gx;

output

std::foo

#2


1  

Other than \b you could also:

  • use possessive matching to avoid backtracking into foo: ((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]++))(?!:)
  • add the word character class to the lookahead, so it can't backtrack into foo: ((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]+))(?![:\w])
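
A small sketch comparing the two patterns above on the inputs from the question. The hash keys, the helper first_match, and the third entry (the possessive pattern with a colon added to the look-behind, as the other answers suggest) are my own illustrative additions, not part of this answer.

```perl
use strict;
use warnings;

my %pattern = (
    # The two patterns from the bullets above, unchanged.
    possessive => qr/((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]++))(?!:)/,
    lookahead  => qr/((?<!\w)([A-Za-z0-9_]+)::([A-Za-z0-9_]+))(?![:\w])/,
    # Assumed variant: the possessive pattern with ':' added to the look-behind.
    strict     => qr/((?<![\w:])([A-Za-z0-9_]+)::([A-Za-z0-9_]++))(?!:)/,
);

# Hypothetical helper: return capture group 1 for the named pattern, or undef.
sub first_match {
    my ($name, $input) = @_;
    return $input =~ $pattern{$name} ? $1 : undef;
}

for my $name (sort keys %pattern) {
    for my $input ('std::foo', 'std::foo::bar') {
        my $m = first_match($name, $input);
        printf "%-10s %-14s => %s\n", $name, $input,
            defined $m ? "'$m'" : 'no match';
    }
}
```

Both original patterns stop the std::fo partial match, but in an unanchored search they still find foo::bar inside std::foo::bar, because (?<!\w) accepts a preceding colon. Only the variant with (?<![\w:]) reports no match for that input.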

#3


1  

Just add \b before the negative lookahead, which ensures that there is a word boundary at that point, and don't forget to add : to the negative lookbehind as well. Otherwise it would match the second part, foo::bar.

((?<![:\w])([A-Za-z0-9_]+)::([A-Za-z0-9_]+))\b(?!:)

OR

This one matches the string only if it is not preceded by a non-space character, that is, only at the start of the input or after whitespace.

(?<!\S)([A-Za-z0-9_]+)::([A-Za-z0-9_]+)\b(?!:)
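
The two variants differ in what may come before the name. Here is a rough sketch that runs both on a few inputs; the variable names, the helper all_matches, and the test string call(std::foo) are assumptions of mine for illustration.

```perl
use strict;
use warnings;

# The two patterns from this answer, unchanged.
my $with_colon = qr/((?<![:\w])([A-Za-z0-9_]+)::([A-Za-z0-9_]+))\b(?!:)/;
my $with_space = qr/(?<!\S)([A-Za-z0-9_]+)::([A-Za-z0-9_]+)\b(?!:)/;

# Hypothetical helper: return the full text of every match of $re in $text.
sub all_matches {
    my ($re, $text) = @_;
    my @found;
    push @found, $& while $text =~ /$re/g;
    return @found;
}

for my $text ('std::foo', 'std::foo::bar', 'baz::std::foo', 'call(std::foo)') {
    printf "%-16s colon: [%s]  space: [%s]\n", $text,
        join(', ', all_matches($with_colon, $text)),
        join(', ', all_matches($with_space, $text));
}
```

Both patterns accept std::foo and reject std::foo::bar and baz::std::foo. They disagree on call(std::foo): the (?<![:\w]) version matches std::foo there, while the (?<!\S) version does not, because the opening parenthesis is a non-space character.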

#4


-1  

To match the whole string std::foo::bar instead of partial values, we can use this regex:

((\b(.+)\b)+?[::]{2}.+)

It will give you two matches:

  • std::foo
  • std::foo::bar
