How does the regular expression (?<=#)[^#]+(?=#) work?

I have the following regex in a C# program, and I'm having difficulty understanding it:

(?<=#)[^#]+(?=#)

I'll break it down into what I think I understand:

(?<=#)    a group, matching a hash. what's `?<=`?
[^#]+     one or more non-hashes (used to achieve non-greediness)
(?=#)     another group, matching a hash. what's the `?=`?

So the problem I have is with the ?<= and ?= parts. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.

I couldn't find ?= in the docs, and searching for it is really difficult, because search engines mostly ignore those special characters.

3 answers

#1

They are called lookarounds; they let you assert whether a pattern matches at the current position without consuming any characters. There are 4 basic lookarounds:

Positive lookarounds: see if we CAN match the pattern...
- (?=pattern) - ... to the right of the current position (lookahead)
- (?<=pattern) - ... to the left of the current position (lookbehind)

Negative lookarounds: see if we can NOT match the pattern...
- (?!pattern) - ... to the right (negative lookahead)
- (?<!pattern) - ... to the left (negative lookbehind)

As an easy reminder, for a lookaround:

- = is positive, ! is negative
- < is lookbehind, otherwise it's lookahead
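
To make the four forms concrete, here is a minimal sketch. It uses Python rather than C# purely so that it is easy to run; Python's re module accepts the same lookaround syntax for these simple patterns. The sample string and the variable names are made up for illustration.

```python
import re

text = "a1 b2 c3"

# Positive lookahead: letters that are followed by the digit 2.
followed_by_2 = re.findall(r"[a-z](?=2)", text)

# Negative lookahead: letters that are NOT followed by the digit 2.
not_followed_by_2 = re.findall(r"[a-z](?!2)", text)

# Positive lookbehind: digits that are preceded by the letter b.
preceded_by_b = re.findall(r"(?<=b)\d", text)

# Negative lookbehind: digits that are NOT preceded by the letter b.
not_preceded_by_b = re.findall(r"(?<!b)\d", text)

print(followed_by_2)      # ['b']
print(not_followed_by_2)  # ['a', 'c']
print(preceded_by_b)      # ['2']
print(not_preceded_by_b)  # ['1', '3']
```

In every case the character that the lookaround inspects is checked but never becomes part of the match.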

References

regular-expressions.info/Lookarounds

But why use lookarounds?

One might argue that the lookarounds in the pattern above aren't necessary, and that #([^#]+)# would do the job just fine (extracting the string captured by \1 to get the non-# part).

Not quite. The difference is that a lookaround doesn't consume the #, so the same # can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.

Consider the following input string:

and #one# and #two# and #three#four#

Now, #([a-z]+)# will give the following matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \___/     \___/     \_____/

Compare this with (?<=#)[a-z]+(?=#), which matches:

and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/

Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):

and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
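
The three diagrams above can be reproduced with a short script. This is only a sketch in Python (chosen for convenience; the question is about C#, but these patterns use the same syntax in both), and the variable names are illustrative.

```python
import re

text = "and #one# and #two# and #three#four#"

# Both hashes are consumed, so the '#' between "three" and "four"
# cannot also open a match for "four".
consuming = [m.group(1) for m in re.finditer(r"#([a-z]+)#", text)]

# Neither hash is consumed, so adjacent fields can share a '#'.
lookaround = re.findall(r"(?<=#)[a-z]+(?=#)", text)

# Only the closing hash is left unconsumed; group(0) is the whole match.
lookahead_only = [m.group(0) for m in re.finditer(r"#([a-z]+)(?=#)", text)]

print(consuming)       # ['one', 'two', 'three']
print(lookaround)      # ['one', 'two', 'three', 'four']
print(lookahead_only)  # ['#one', '#two', '#three', '#four']
```

Only the consuming pattern misses "four", because the # after "three" has already been used up by the previous match.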

References

regular-expressions.info/Flavor Comparison

#2

As another poster mentioned, these are lookarounds: special constructs that check what surrounds a match without including it in the match. This pattern says:

(?<=#)    assert, without consuming it, that a `#`
            immediately precedes the next expression

[^#]+     one or more characters that are not `#`, and

(?=#)     assert, without consuming it, that a `#`
            immediately follows the previous expression

So this will match all the characters in between two #s.
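
"All the characters in between two #s" is worth taking literally. As a small sketch (in Python for convenience, with a made-up sample string), the text between a closing # and the next opening # also qualifies, because neither hash is consumed:

```python
import re

text = "and #one# and #two#"
pattern = re.compile(r"(?<=#)[^#]+(?=#)")

# Collect each match together with its (start, end) position.
matches = [(m.group(), m.span()) for m in pattern.finditer(text)]

print(matches)
# [('one', (5, 8)), (' and ', (9, 14)), ('two', (15, 18))]
```

The spans show that no match includes a #. If the " and " in the middle is unwanted, a narrower character class such as [a-z]+ avoids it for this input.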

Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all bs not followed by an a." Your first attempt might be something like b[^a], but that's not right: this will also match the bu in bus or the bo in boy, but you only wanted the b. And it won't match the b in cab, even though that's not followed by an a, because there are no more characters to match.

To do that correctly, you need a negative lookahead: b(?!a). This says "match a b only if it is not followed by an a, and don't make whatever follows part of the match". Thus it'll match just the b in bolo, which is what you want; likewise it'll match the b in cab.
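
The difference between the two attempts is easy to check. The following is a minimal sketch in Python (the syntax of both patterns is the same in C#); the word list is just the examples from this answer plus "bat" as a case where nothing should match.

```python
import re

words = ["bus", "boy", "cab", "bat", "bolo"]

# First attempt: a 'b' plus one more character that is not 'a'.
naive = {w: re.findall(r"b[^a]", w) for w in words}

# Negative lookahead: a 'b' that is not followed by 'a'.
lookahead = {w: re.findall(r"b(?!a)", w) for w in words}

for w in words:
    print(f"{w}: naive={naive[w]} lookahead={lookahead[w]}")
# bus: naive=['bu'] lookahead=['b']
# boy: naive=['bo'] lookahead=['b']
# cab: naive=[] lookahead=['b']
# bat: naive=[] lookahead=[]
# bolo: naive=['bo'] lookahead=['b']
```

The naive pattern drags an extra character into the match and misses the b at the end of cab; the lookahead version does neither.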

#3

They're called look-arounds: http://www.regular-expressions.info/lookaround.html
