How does the regular expression '(?

Date: 2022-03-12 13:22:27

I have the following regex in a C# program, and have difficulties understanding it:

(?<=#)[^#]+(?=#)

I'll break it down to what I think I understood:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So the problem I have is the ?<= and ?< part. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.

3 Answers

#1


31  

They are called lookarounds; they allow you to assert if a pattern matches or not, without actually making the match. There are 4 basic lookarounds:

  • Positive lookarounds: see if we CAN match the pattern...
    • (?=pattern) - ... to the right of current position (look ahead)
    • (?<=pattern) - ... to the left of current position (look behind)
  • Negative lookarounds: see if we can NOT match the pattern...
    • (?!pattern) - ... to the right
    • (?<!pattern) - ... to the left
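
The four constructs listed above can be tried side by side. The sketch below uses Python's re module purely for illustration (the question is about C#, whose regex engine accepts the same four lookaround forms); the sample string and variable names are made up for this example.

```python
import re

text = "a1 b2 c3"  # hypothetical sample input

# (?=2): positive lookahead, a letter only if "2" is right after it
positive_ahead = re.findall(r"[a-z](?=2)", text)

# (?!2): negative lookahead, letters NOT followed by "2"
negative_ahead = re.findall(r"[a-z](?!2)", text)

# (?<=b): positive lookbehind, a digit only if "b" is right before it
positive_behind = re.findall(r"(?<=b)\d", text)

# (?<!b): negative lookbehind, digits NOT preceded by "b"
negative_behind = re.findall(r"(?<!b)\d", text)

print(positive_ahead)   # ['b']
print(negative_ahead)   # ['a', 'c']
print(positive_behind)  # ['2']
print(negative_behind)  # ['1', '3']
```

In every case the returned text contains only the letter or digit itself; the character checked by the lookaround never becomes part of the match.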

As an easy reminder, for a lookaround:

  • = is positive, ! is negative
  • < is look behind, otherwise it's look ahead

But why use lookarounds?

One might argue that lookarounds in the pattern above aren't necessary, and #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-#).

Not quite. The difference is that since a lookaround doesn't match the #, it can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.

Consider the following input string:

and #one# and #two# and #three#four#

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/

Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
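
The three diagrams above can be reproduced with a short script. This is a sketch in Python's re module (chosen only for illustration, not the C# from the question); the variable names are made up here.

```python
import re

text = "and #one# and #two# and #three#four#"

# Both hashes are consumed, so the "#" after "three" cannot also
# serve as the opening "#" of "four".
consuming = re.findall(r"#([a-z]+)#", text)

# Neither hash is consumed; each "#" can delimit two neighbouring matches.
lookaround = re.findall(r"(?<=#)[a-z]+(?=#)", text)

# Only the closing hash is left unconsumed, which is already enough
# for "four" to be found.
ahead_only = re.findall(r"#([a-z]+)(?=#)", text)

print(consuming)   # ['one', 'two', 'three']
print(lookaround)  # ['one', 'two', 'three', 'four']
print(ahead_only)  # ['one', 'two', 'three', 'four']
```

Note that findall returns the captured group for the first and third patterns, so the hashes that are part of those matches do not appear in the printed lists.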

#2


4  

As another poster mentioned, these are lookarounds, special constructs for changing what gets matched and when. This says:

(?<=#)    lookbehind: assert that a `#` comes immediately before
            the next expression, without making it part of the match

[^#]+     one or more characters that are not `#`, and

(?=#)     lookahead: assert that a `#` comes immediately after
            the previous expression, without making it part of the match

So this will match all the characters in between two #s.
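
A quick check of that statement, sketched with Python's re module for illustration (the sample string is made up). One consequence worth seeing: because the hashes are only asserted and never consumed, the text between a closing hash and the next opening hash also sits "between two #s" and is returned as well.

```python
import re

text = "#one# and #two#"  # hypothetical sample input

between_hashes = re.findall(r"(?<=#)[^#]+(?=#)", text)

print(between_hashes)  # ['one', ' and ', 'two']
```

If only the hash-delimited words are wanted, a narrower character class such as [a-z]+ avoids picking up the " and " in the middle.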

Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all bs not followed by an a." Your first attempt might be something like b[^a], but that's not right: this will also match the bu in bus or the bo in boy, but you only wanted the b. And it won't match the b in cab, even though that's not followed by an a, because there are no more characters to match.

To do that correctly, you need a lookahead: b(?!a). This says "match a b but don't match an a afterwards, and don't make that part of the match". Thus it'll match just the b in bolo, which is what you want; likewise it'll match the b in cab.
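
The difference between the two attempts can be checked word by word. A minimal sketch using Python's re module for illustration; the word list adds "bat" as an extra case where the b is followed by an a.

```python
import re

words = ["bus", "boy", "cab", "bolo", "bat"]

# b[^a] consumes one extra character and needs that character to exist.
naive = {w: re.findall(r"b[^a]", w) for w in words}

# b(?!a) only checks what follows; the match is the "b" alone.
with_lookahead = {w: re.findall(r"b(?!a)", w) for w in words}

for w in words:
    print(w, naive[w], with_lookahead[w])
# bus ['bu'] ['b']
# boy ['bo'] ['b']
# cab [] ['b']
# bolo ['bo'] ['b']
# bat [] []
```

The "cab" row shows the end-of-string case: the negative lookahead succeeds when nothing follows the b, whereas the character class fails because it has no character to consume.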

#3


1  

They're called look-arounds: http://www.regular-expressions.info/lookaround.html
