Match difference caused by the position of a negative lookahead?

Time: 2021-05-28 15:19:25

I am confused about several things in regular expressions and am trying to work through them. Here I have the following string:

{start}do or die{end}extended string

Here are my two regexes. The only difference between them is the position of the dot:

(.(?!{end}))* //returns: {start}do or di
                                      //^ See here
((?!{end}).)* //returns: {start}do or die
                                      //^ See here

Why does the first regex eat the last "e"?

Also, how does this negative lookahead make the * quantifier non-greedy? I mean, why can't it consume characters beyond {end}?

2 Answers

#1


2  

A negative lookahead asserts that the pattern inside it, which in your case is {end}, must not match at the current position. The dot matches any character except a newline.

So with your first regex:

(.(?!{end}))*

It leaves out the e because the lookahead is checked after the dot has consumed a character: once the dot takes the e, the text that follows is {end}, so the negative lookahead fails and that e cannot be part of the match. In your second regex the lookahead is checked before the dot, so the dot can keep consuming characters until the position where {end} begins, and the e is included.
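To see the difference directly, here is a minimal Python sketch that runs both patterns against the string from the question. The use of Python's re module, the escaped braces, and the names TEXT, DOT_FIRST, LOOKAHEAD_FIRST and leading_match are assumptions made for illustration; the question does not say which regex engine was used.

```python
import re

TEXT = "{start}do or die{end}extended string"

# The two patterns from the question. The braces are escaped here
# only to make it explicit that they are literal characters.
DOT_FIRST = re.compile(r"(.(?!\{end\}))*")        # lookahead checked after the dot
LOOKAHEAD_FIRST = re.compile(r"((?!\{end\}).)*")  # lookahead checked before the dot


def leading_match(pattern, text):
    """Return the text matched at the start of the string."""
    return pattern.match(text).group(0)


if __name__ == "__main__":
    print(leading_match(DOT_FIRST, TEXT))        # {start}do or di
    print(leading_match(LOOKAHEAD_FIRST, TEXT))  # {start}do or die
```

Neither pattern can step past {end}: in both cases the repetition stops at the first character for which the lookahead rejects the iteration, so a greedy * has nothing more it is allowed to take.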

#2


1  

I have worked out how the regex engine processes each of the two regexes.

First, for (.(?!{end}))* the engine proceeds as follows:

"{start}do or die{end}extended string"
 ^  The dot matches "{"; the lookahead then checks for {end} right after it and finds none, so "{" is included.
"{start}do or die{end}extended string"
  ^  The dot matches "s"; the lookahead again finds no {end} right after it, so "s" is included.

....
.... and so on ...
"{start}do or die{end}extended string"
                ^  The dot matches "e", but {end} follows immediately, so the negative lookahead fails and "e" is excluded.
So the match we get is "{start}do or di"

For the second regex, ((?!{end}).)*:

"{start}do or die{end}extended string"
 ^  The lookahead checks for {end} here and finds none, so the dot consumes "{".

"{start}do or die{end}extended string"
  ^  The lookahead checks for {end} here and again finds none, so the dot consumes "s".

....
.. and so on ..
"{start}do or die{end}extended string"
                ^  The lookahead checks for {end} here and finds none, so the dot consumes "e".
"{start}do or die{end}extended string"
                 ^  The lookahead finds {end} here, so the negative lookahead fails and the repetition stops.

So we end up with the match "{start}do or die"
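The two walkthroughs above can be imitated with a small hand-written loop. This is only a sketch of the idea, not how a real regex engine is implemented: it assumes a single-line input, ignores backtracking (which is not needed here because each iteration consumes exactly one character), and the function name trace and its parameters are hypothetical.

```python
def trace(text, stop, lookahead_first):
    """Imitate ((?!stop).)* when lookahead_first is True,
    and (.(?!stop))* when it is False.

    Returns the matched prefix and a list of (index, char, outcome) steps.
    """
    pos = 0
    steps = []
    while pos < len(text):
        ch = text[pos]
        if ch == "\n":  # the dot does not match a newline
            break
        # Where the negative lookahead looks: at the current character
        # (checked before the dot) or just past it (checked after the dot).
        check_at = pos if lookahead_first else pos + 1
        if text.startswith(stop, check_at):
            steps.append((pos, ch, "rejected"))
            break
        steps.append((pos, ch, "consumed"))
        pos += 1
    return text[:pos], steps


if __name__ == "__main__":
    s = "{start}do or die{end}extended string"
    for flag in (False, True):
        matched, steps = trace(s, "{end}", lookahead_first=flag)
        print(flag, repr(matched), steps[-1])
```

With the dot first, the last step rejects the "e" at index 15; with the lookahead first, the last step rejects the "{" at index 16, which is why only the second form keeps the "e".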
