Regex lookbehind without quantifiers ('+' or '*')

Time: 2022-06-11 21:45:06

I am trying to use lookbehinds in a regular expression, and they don't seem to work as I expected. This is not my real use case, but here is a simplified example. Suppose I want to match "example" in a string that says "this is an example". According to my understanding of lookbehinds, this should work:

(?<=this\sis\san\s*?)example

This should find "this is an", then any whitespace characters, and finally match the word "example". It doesn't work, and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?

I also tried these two, and they work correctly, but they don't fulfill my needs:

(?<=this\sis\san\s)example
this\sis\san\s*?example

I am using this site to test my regular expressions: http://gskinner.com/RegExr/

5 solutions

#1


18  

Many regular expression libraries allow only restricted expressions inside lookbehind assertions, such as:

  • only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
  • only match strings of fixed lengths: (?<=foobar|\r\n) (each branch with a fixed length)
  • only match strings with an upper bound on their length: (?<=\s{0,4}) (up to four repetitions)
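
Which category a given engine falls into varies. As a quick check, the sketch below asks Python's built-in re module, which as far as I know belongs to the first category, whether it accepts a few of these patterns. The helper name compiles and the trailing x in the test patterns are only illustrative.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Branches of the same fixed length: accepted.
same_length = compiles(r"(?<=foo|bar|\s,\s)x")
# Branches of different fixed lengths: rejected by re.
mixed_length = compiles(r"(?<=foobar|\r\n)x")
# Unbounded quantifier, as in the question: rejected by re.
unbounded = compiles(r"(?<=this\sis\san\s*?)example")

print(same_length, mixed_length, unbounded)
```

Other engines may accept the second or third pattern, so the same check is worth repeating in whichever engine you actually target.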

These limitations exist mainly because those libraries cannot process regular expressions backwards at all, or can do so only for a limited subset.

Another reason could be to keep authors from building overly complex regular expressions that are expensive to process because of so-called pathological behavior (see also ReDoS).

See also the section on the limitations of lookbehind assertions on Regular-Expressions.info.

#2


11  

If your engine does not support variable-length lookbehind assertions, you can trick it with \K, which resets the start of the reported match. (Python's built-in re module does not support \K; PCRE-based engines do.)

This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k

In short, when part of the expression has already matched and you only want what comes after it, \K discards everything matched so far, so the reported match starts at that point.

Example:

string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'

Matching /(\<a).+?(\<div).+?(\>)\K.+?(?=\<\/div)/ makes the engine reset the match start after the > that closes the opening div tag, so nothing before that point is included in the result. The lookahead (?=\<\/div) then makes the engine take everything up to, but not including, the closing div tag.
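
Python's built-in re module has no \K, so the following is only a rough sketch of the same idea using a capturing group instead: the prefix is matched but left outside the group, and the group holds what \K would have reported. The variable names html and pattern are illustrative, and the lookahead is written to target the closing tag </div.

```python
import re

html = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'

# Everything before the group plays the role of the part dropped by \K;
# the group captures the text up to (not including) the closing div tag.
pattern = re.compile(r'<a.+?<div.+?>(.+?)(?=</div)')

match = pattern.search(html)
wanted = match.group(1) if match else None
print(repr(wanted))  # ' LOOK FOR ME '
```

The surrounding spaces are part of the captured text; call strip() on the result if they are not wanted.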

#3


4  

What Amber said is true, but you can work around it with another approach: a non-capturing group.

(?<=this\sis\san)(?:\s*)example

That makes the lookbehind fixed-length, so it should work.
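
One thing to keep in mind with this approach is that the whitespace is now consumed outside the lookbehind, so it becomes part of the match. A minimal sketch in Python (the sample text and variable names are illustrative):

```python
import re

text = "this is an   example"

# The lookbehind is fixed-width; the optional whitespace is matched
# by the non-capturing group and therefore ends up in the result.
match = re.search(r"(?<=this\sis\san)(?:\s*)example", text)

whole = match.group(0)
print(repr(whole))           # '   example'
print(repr(whole.lstrip()))  # 'example'
```

If the leading whitespace must not appear in the result, strip it afterwards or capture the word in its own group.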

#4


0  

Most regex engines don't support variable-length expressions for lookbehind assertions.

#5


0  

You can use sub-expressions.

(this\sis\san\s*?)(example)

To retrieve group 2 ("example"), use $2 in engines that use that replacement syntax, or \2 in a replacement string such as the one passed to Python's re.sub.
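
As a small sketch of how that might look in Python (the sample text and the bracketed replacement are only for illustration):

```python
import re

text = "this is an    example"
pattern = re.compile(r"(this\sis\san\s*?)(example)")

# Group 2 holds just the word, without the prefix or the whitespace.
match = pattern.search(text)
word = match.group(2)
print(word)  # example

# In a replacement string, \1 and \2 refer back to the groups.
replaced = pattern.sub(r"\1[\2]", text)
print(replaced)  # this is an    [example]
```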
