I am trying to use a lookbehind in a regular expression, and it doesn't work as I expected. This is not my real use case, but here is a simplified example. Suppose I want to match "example" in the string "this is an example". According to my understanding of lookbehinds, this should work:
(?<=this\sis\san\s*?)example
This should find "this is an", then any whitespace characters, and finally match the word "example". It doesn't work, and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?
I also tried these two, and they work correctly, but they don't meet my needs:
(?<=this\sis\san\s)example
this\sis\san\s*?example
I am using this site to test my regular expressions: http://gskinner.com/RegExr/
5 Answers
#1
18
Many regular expression libraries allow only restricted expressions in lookbehind assertions, such as:
- only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
- only match strings of fixed lengths: (?<=foobar|\r\n) (each branch has its own fixed length)
- only match strings with an upper bound on length: (?<=\s{0,4}) (up to four repetitions)
The reason for these limitations is mainly that those libraries can't process regular expressions backwards at all, or can do so only for a limited subset.
Another reason could be to keep authors from building overly complex regular expressions that are expensive to process because of so-called pathological behavior (see also ReDoS).
See also the section on the limitations of lookbehind assertions on Regular-Expressions.info.
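The restrictions listed above can be probed directly. As a minimal sketch, assuming Python's built-in re module (one of the stricter engines), the following checks which of the lookbehinds from this thread compile. The helper name lookbehind_compiles and the trailing x in the test patterns are made up for illustration.

```python
import re

def lookbehind_compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# All branches have the same fixed length (three characters each).
same_length = lookbehind_compiles(r"(?<=foo|bar|\s,\s)x")

# Each branch is fixed length, but the lengths differ (6 vs. 2).
mixed_length = lookbehind_compiles(r"(?<=foobar|\r\n)x")

# Bounded, but still variable length.
bounded = lookbehind_compiles(r"(?<=\s{0,4})x")

# The pattern from the question: unbounded variable length.
unbounded = lookbehind_compiles(r"(?<=this\sis\san\s*?)example")

print(same_length, mixed_length, bounded, unbounded)
```

With Python's re, only the first form is expected to compile. Other engines draw the line elsewhere, so the same check may give different results in a different library.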
#2
11
Hey, if you're not using Python, you can get the effect of a variable-length lookbehind assertion by tricking the regex engine: \K discards what has been matched so far and starts the reported match over from that point.
This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k
Basically, when part of your expression has already matched and you only want what comes after it, \K forces the reported match to start over from that point.
Example:
string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'
Matching /(<a).+?(<div).+?(>)\K.+?(?=<\/div)/ causes the regex to restart after the closing > of the opening div tag, so the result won't include anything before that point. The (?=<\/div) makes the engine take everything up to the closing div tag.
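Python's built-in re module does not support \K, so the idea cannot be run there as written. A rough equivalent, sketched here under the assumption that a capture group is an acceptable substitute, is to treat everything before the group as the part \K would discard. The variable names and the simplified pattern are illustrative, not taken from the answer.

```python
import re

text = ("<a this is a tag> with some information "
        "<div this is another tag > LOOK FOR ME </div>")

# Everything before the capture group plays the role of the part that \K
# would discard; group 1 is what \K would leave as the reported match.
pattern = re.compile(r"<a.+?<div.+?>(.+?)(?=</div)")

match = pattern.search(text)
kept = match.group(1) if match else None
print(repr(kept))
```

The lookahead keeps the closing tag out of the captured text, just as in the \K version.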
#3
4
What Amber said is true, but you can work around it with another approach: a non-capturing group
(?<=this\sis\san)(?:\s*)example
That makes the lookbehind fixed length, so it should work.
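One consequence worth checking is what the overall match contains once the whitespace sits outside the lookbehind. A small sketch, assuming Python's re module and a sample string with three spaces chosen for illustration:

```python
import re

text = "this is an   example"

# Fixed-length lookbehind, then the variable whitespace outside it.
pattern = re.compile(r"(?<=this\sis\san)(?:\s*)example")

match = pattern.search(text)
matched = match.group(0) if match else None
print(repr(matched))

# Strip the whitespace afterwards if only the word is wanted.
word = matched.lstrip() if matched else None
print(repr(word))
```

Because the non-capturing group consumes the whitespace, the spaces become part of the match, so this approach suits cases where trimming afterwards is acceptable.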
#4
0
Most regex engines don't support variable-length expressions for lookbehind assertions.
#5
0
You can use sub-expressions.
(this\sis\san\s*?)(example)
To retrieve group 2, "example", use $2 for regex, or \2 if you're using a format string (as with Python's re.sub).
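As a short sketch of the capture-group approach in Python's re module, the following reads group 2 directly and also refers to both groups from a replacement string. The bracketed replacement is only an illustration.

```python
import re

text = "this is an example"
pattern = re.compile(r"(this\sis\san\s*?)(example)")

match = pattern.search(text)
word = match.group(2) if match else None

# \1 and \2 refer back to the groups in the replacement string.
replaced = pattern.sub(r"\1[\2]", text)
print(word, "|", replaced)
```

Group 1 keeps the prefix available, so a replacement can put it back unchanged while altering only the second group.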