I have the following regex in a C# program, and have difficulties understanding it:
(?<=#)[^#]+(?=#)
I'll break it down to what I think I understood:
(?<=#) a group, matching a hash. what's `?<=`?
[^#]+ one or more non-hashes (used to achieve non-greediness)
(?=#) another group, matching a hash. what's the `?=`?
So the problem I have is the `?<=` and `?<` part. From reading MSDN, `?<name>` is used for naming groups, but in this case the angle bracket is never closed.
I couldn't find `?=` in the docs, and searching for it is really difficult, because search engines mostly ignore those special characters.
3 Solutions
#1
31
They are called lookarounds; they allow you to assert if a pattern matches or not, without actually making the match. There are 4 basic lookarounds:
- Positive lookarounds: see if we CAN match the pattern ...
  - `(?=pattern)` ... to the right of the current position (look ahead)
  - `(?<=pattern)` ... to the left of the current position (look behind)
- Negative lookarounds: see if we can NOT match the pattern ...
  - `(?!pattern)` ... to the right
  - `(?<!pattern)` ... to the left
As an easy reminder, for a lookaround:
- `=` is positive, `!` is negative
- `<` is look behind, otherwise it's look ahead
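To make the four constructs concrete, here is a minimal sketch in Python, whose `re` module uses the same syntax as .NET for these simple patterns. The sample string and variable names are invented for illustration.

```python
import re

# Hypothetical sample input: letter/digit pairs.
text = "a1 b2 c3"

# (?=...)  positive lookahead: a letter that IS followed by "2"
pos_ahead = re.findall(r"[a-z](?=2)", text)

# (?<=...) positive lookbehind: a digit that IS preceded by "b"
pos_behind = re.findall(r"(?<=b)\d", text)

# (?!...)  negative lookahead: a letter that is NOT followed by "2"
neg_ahead = re.findall(r"[a-z](?!2)", text)

# (?<!...) negative lookbehind: a digit that is NOT preceded by "b"
neg_behind = re.findall(r"(?<!b)\d", text)

print(pos_ahead, pos_behind, neg_ahead, neg_behind)
# ['b'] ['2'] ['a', 'c'] ['1', '3']
```

In every case only the letter or digit itself ends up in the match; the character tested by the lookaround is checked but not consumed.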
References
- regular-expressions.info/Lookarounds
But why use lookarounds?
One might argue that lookarounds in the pattern above aren't necessary, and `#([^#]+)#` will do the job just fine (extracting the string captured by `\1` to get the non-`#` part).
Not quite. The difference is that since a lookaround doesn't consume the `#`, it can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
Consider the following input string:
and #one# and #two# and #three#four#
Now, `#([a-z]+)#` will give the following matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \___/     \___/     \_____/
Compare this with `(?<=#)[a-z]+(?=#)`, which matches:
and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
Unfortunately this can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with `#([a-z]+)(?=#)`, which matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
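The three diagrams above can be reproduced with a short script. This is a sketch in Python rather than C#; the patterns are unchanged, and the variable names are chosen only for this example.

```python
import re

text = "and #one# and #two# and #three#four#"

# Both hashes are consumed, so the "#" after "three" cannot also
# serve as the opening hash for "four".
consuming = [m.group(0) for m in re.finditer(r"#([a-z]+)#", text)]

# Neither hash is consumed: the same "#" can close "three" and open "four".
lookaround = [m.group(0) for m in re.finditer(r"(?<=#)[a-z]+(?=#)", text)]

# Only the closing hash is left unconsumed (lookahead only).
lookahead_only = [m.group(0) for m in re.finditer(r"#([a-z]+)(?=#)", text)]

print(consuming)       # ['#one#', '#two#', '#three#']
print(lookaround)      # ['one', 'two', 'three', 'four']
print(lookahead_only)  # ['#one', '#two', '#three', '#four']
```

`m.group(0)` is the whole match, which is what the underlines in the diagrams mark; `m.group(1)` would give the captured word for the first and third patterns.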
References
- regular-expressions.info/Flavor Comparison
#2
4
As another poster mentioned, these are lookarounds, special constructs for changing what gets matched and when. This says:
(?<=#)  require, but don't include in the match, the string `#`
        immediately before the next expression
[^#]+   one or more characters that are not `#`, and
(?=#)   require, but don't include in the match, the string `#`
        immediately after the last expression
So this will match all the characters in between two `#`s.
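One consequence of "in between two `#`s" is worth seeing in practice: because the hashes are not consumed, the text between a closing hash and the next opening hash also qualifies. The sketch below uses Python with a made-up input string; the pattern is the one from the question.

```python
import re

# Hypothetical input with two hash-delimited words.
text = "and #one# and #two#"

# The pattern from the question: any run of non-hash characters
# that has a "#" on both sides.
between_hashes = re.findall(r"(?<=#)[^#]+(?=#)", text)

# The capturing-group alternative consumes both hashes, so the gap
# between "#one#" and "#two#" is never considered.
captured = re.findall(r"#([^#]+)#", text)

print(between_hashes)  # ['one', ' and ', 'two']
print(captured)        # ['one', 'two']
```

Which behaviour is wanted depends on the data; neither pattern is wrong in itself.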
Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all `b`s not followed by an `a`." Your first attempt might be something like `b[^a]`, but that's not right: this will also match the `bu` in `bus` or the `bo` in `boy`, but you only wanted the `b`. And it won't match the `b` in `cab`, even though that's not followed by an `a`, because there are no more characters to match.
To do that correctly, you need a lookahead: `b(?!a)`. This says "match a `b` but don't match an `a` afterwards, and don't make that part of the match". Thus it'll match just the `b` in `bolo`, which is what you want; likewise it'll match the `b` in `cab`.
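The difference between the two attempts is easy to check. This is a minimal Python sketch using the words from the explanation above, plus `bat` as an extra case where the `b` is followed by an `a`; the helper dictionaries are just for the example.

```python
import re

words = ["bus", "boy", "cab", "bolo", "bat"]

# First attempt: "b" plus one character that is not "a".
# The extra character becomes part of the match, and a "b" at the
# end of the string has nothing left to match against.
naive = {w: re.findall(r"b[^a]", w) for w in words}

# Negative lookahead: "b" not followed by "a"; nothing after the
# "b" is consumed, and end-of-string is fine.
lookahead = {w: re.findall(r"b(?!a)", w) for w in words}

for w in words:
    print(w, naive[w], lookahead[w])
# bus ['bu'] ['b']
# boy ['bo'] ['b']
# cab [] ['b']
# bolo ['bo'] ['b']
# bat [] []
```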
#3
1
They're called look-arounds: http://www.regular-expressions.info/lookaround.html