Why doesn't this regex lookahead work?

Time: 2022-06-11 21:45:12

I am designing a regex to use in some IIS URL rewrites. The intent is to capture URLs which:

  1. Are not just a file (as identified by containing a period) in the root directory, and
  2. Do not contain a query string, and
  3. Do not belong to a specific set of subdirectories, specifically "Account" and "Public"

My current regex looks like:

^(?!(Account)|(Public))([^./]+)(/[^?]*)?$

Using RegexPal with the test set of:

file.aspx
Account/otherfile.aspx
Public/otherfile.aspx
otherfolder1/otherfile.aspx?stuff=otherstuff
otherfolder2/otherfolder/otherfile.aspx
otherfolder3/
otherfolder4

My regex correctly ignores the first two cases, but it is still matching on the third case. What is wrong with the lookahead here?

4 Answers

#1


3  

I couldn't resist trying to come up with something that would work in RegExPal (edit: I have since verified that this does work there). I thought I would offer it as another way to do what you need, one that may be a little easier to understand:

^(?!Account|Public|[a-zA-Z_0-9]+\.)[a-zA-Z_0-9/.]+$

Explained:

^                   # start
(?!                 # open a negative lookahead
Account|Public|     # ignore both Account and Public
[a-zA-Z_0-9]+\.     # ignore files in root (i.e., letters/numbers, followed by period)
)                   # close negative lookahead
[a-zA-Z_0-9/.]+     # now match anything with letters/numbers, periods and slashes, but no '?' (ignores URLs with query string)
$                   # end
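
As a quick check, the pattern can be run against each entry of the question's test set separately. This is a minimal sketch using Python's re module rather than RegExPal or IIS, so engine differences are an assumption to keep in mind; the names TEST_URLS and PATTERN are illustrative only.

```python
import re

# Test set from the question, checked one URL at a time (no multi-line mode).
TEST_URLS = [
    "file.aspx",
    "Account/otherfile.aspx",
    "Public/otherfile.aspx",
    "otherfolder1/otherfile.aspx?stuff=otherstuff",
    "otherfolder2/otherfolder/otherfile.aspx",
    "otherfolder3/",
    "otherfolder4",
]

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^(?!Account|Public|[a-zA-Z_0-9]+\.)[a-zA-Z_0-9/.]+$")

# Collect the URLs the pattern accepts.
matched = [url for url in TEST_URLS if PATTERN.search(url)]

for url in TEST_URLS:
    print(f"{url!r}: {'match' if url in matched else 'no match'}")
```

Only the last three entries are accepted. Note that the lookahead tests a prefix, so a folder whose name merely starts with "Account" or "Public" would be rejected as well.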

#2


1  

RegexPal is confused, but the real problem is that the regex isn't designed correctly.

Not sure what you are trying to do, but when using multi-line mode and the anchors ^ and $
within a regex, care must be taken NOT to let a match run past the anchors, unless you
specifically design it that way. This applies to both greedy and non-greedy quantifiers.
It's made even worse when negative lookahead conditions are thrown into the mix.

In this case, it caused RegexPal to go bonkers and apparently backtrack before ^
without reevaluating ^ again. This is probably not a JavaScript problem though.

Excluding the newline (\n) in your consuming character classes fixes all the problems.
It must be added to both classes.

^(?!Account|Public)[^./\n]+(?:/[^?\n]*)?$
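
The effect of the negated classes consuming newlines can be sketched as follows. This uses Python's re module with re.MULTILINE as a stand-in for RegexPal's multi-line test (an assumption; RegexPal runs on JavaScript's engine and its exact behaviour may differ), and the helper name matched_text is hypothetical.

```python
import re

# The question's test set, joined into one multi-line string as in RegexPal.
TEST_LINES = [
    "file.aspx",
    "Account/otherfile.aspx",
    "Public/otherfile.aspx",
    "otherfolder1/otherfile.aspx?stuff=otherstuff",
    "otherfolder2/otherfolder/otherfile.aspx",
    "otherfolder3/",
    "otherfolder4",
]
TEXT = "\n".join(TEST_LINES)

# Original pattern: [^./] and [^?] also accept "\n".
ORIGINAL = re.compile(r"^(?!(Account)|(Public))([^./]+)(/[^?]*)?$", re.MULTILINE)
# Pattern from this answer: both classes exclude "\n".
FIXED = re.compile(r"^(?!Account|Public)[^./\n]+(?:/[^?\n]*)?$", re.MULTILINE)


def matched_text(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


original_matches = matched_text(ORIGINAL, TEXT)
fixed_matches = matched_text(FIXED, TEXT)

print("original:", original_matches)
print("fixed:   ", fixed_matches)
```

In this sketch the original pattern yields a single match that runs across several lines, while the fixed pattern yields one match per qualifying line.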

#3


1  

As reported by sln, the problem with these tests in RegexPal is that running a multi-line test enables multiple lines to group together to create a single match when they otherwise shouldn't.

The regex is fine for the purposes that it is designed to fulfill. It's actually overkill. For IIS Rewrites and Redirects, if you are using the IIS URL Rewrite Module, you have the option of specifying conditions on which it will or will not accept matches. Some of those options include:

  • Item is not a physical file
  • Item is not a physical directory
  • Item does (or does not) match a secondary pattern

These will achieve the desired effect more completely than the negative-lookahead.

#4


0  

Maybe you wanted to use the regex ^(?!Account|Public)([^\.\/]+\/[^\?]*)$

Take a look here: http://ideone.com/q3lAv

The correct RegExPal pattern would then be ^(?!Account|Public)([^\.\/]+\/[^\?\n]*)$


[UPDATE]

A filename doesn't have to include a dot in its name, and conversely a folder/directory name may contain one. But if you also want a positive match on the 7th line, then you should go with the pattern ^(?!Account|Public)([^\.\/]+(?:\/[^\?]*|[^\.\?]*))$, which should also work as the RegExPal pattern.

Take a look here: http://ideone.com/VcmEP
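
The difference between the first pattern and the updated one can be checked with a short sketch. It uses Python's re module on each test URL individually (an assumption, since the linked snippets are not reproduced here), and the names FIRST and UPDATED are illustrative.

```python
import re

# Test set from the question; the 7th entry has no slash and no dot.
TEST_URLS = [
    "file.aspx",
    "Account/otherfile.aspx",
    "Public/otherfile.aspx",
    "otherfolder1/otherfile.aspx?stuff=otherstuff",
    "otherfolder2/otherfolder/otherfile.aspx",
    "otherfolder3/",
    "otherfolder4",
]

# First pattern: requires a slash after the leading folder name.
FIRST = re.compile(r"^(?!Account|Public)([^\.\/]+\/[^\?]*)$")
# Updated pattern: also accepts a bare name without slash, dot, or query string.
UPDATED = re.compile(r"^(?!Account|Public)([^\.\/]+(?:\/[^\?]*|[^\.\?]*))$")

first_matches = [url for url in TEST_URLS if FIRST.search(url)]
updated_matches = [url for url in TEST_URLS if UPDATED.search(url)]

print("first:  ", first_matches)
print("updated:", updated_matches)
```

Under these assumptions, only the updated pattern accepts the 7th entry, otherfolder4.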
