I'm comfortable with regex. I have the following regex, which matches everything I want:
#(?<nonCapturing>\?\:)?(?:\(\?(?!(\)))(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?
Very long - sorry. It is used to parse a format string for arbitrary objects. It enables me to specify a property (e.g. IntValue) and forward an optional child format to it.
It matches a # followed by an optional non-capturing marker ?:, followed by an optional options group such as (?r) or (?a-r), then the property name, optionally followed by a pair of brackets [].
For the following input:
Int: #IntValue Bool: #BoolValue[]Word Str: '#StrValue' Double: #DoubleValue[#.00] #(?r)Bar[#(?r)StrValue[#Length]]
it matches:
- #IntValue
- #BoolValue[]
- #StrValue
- #DoubleValue[#.00]
- #(?r)Bar[#(?r)StrValue[#Length]]
Fine.
But now I need all the other stuff too. I want it in the same regex so that I can foreach over all matches (I can decide which case I have by checking whether id or plain has a capture).
The default pattern to do that is: ((?!<regex that matches what you want>).)*
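To make the idea concrete, here is a minimal Python sketch of that pattern. It is only an illustration: REG is a deliberately simplified, hypothetical stand-in for the real placeholder regex (a #, an identifier, and an optional non-nested [...] suffix), and the group names token and plain as well as the helper split_format are my own assumptions.

```python
import re

# Hypothetical, simplified stand-in for the real placeholder regex:
# '#', an identifier, and an optional non-nested [...] suffix.
REG = r"#\w+(?:\[[^\[\]]*\])?"

# Either a placeholder, or a run of characters at none of which
# a placeholder could start (the negative lookahead is checked
# before every single character).
pattern = re.compile(
    rf"(?P<token>{REG})|(?P<plain>(?:(?!{REG}).)+)",
    re.DOTALL,
)


def split_format(text):
    """Return (kind, value) pairs that together cover the whole input."""
    parts = []
    for m in pattern.finditer(text):
        kind = "plain" if m.group("plain") is not None else "token"
        parts.append((kind, m.group()))
    return parts


parts = split_format("Int: #IntValue Double: #DoubleValue[#.00]!")
for kind, value in parts:
    print(kind, repr(value))
```

The drawback is visible even in this small version: REG appears twice, and the lookahead is re-evaluated at every character of the plain text.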
In my case that looks like this (pattern: <REG>|(?<plain>(?:(?!<REG>).)+)), which results in the following huge regex (which matches perfectly):
(?:#(?<nonCapturing>\?\:)?(?:\(\?(?!(\)))(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?)|(?<plain>(?:(?!#(?<nonCapturing>\?\:)?(?:\(\?(?!(\)))(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?).)+)
Phew. It does what it should, but...
Is there any other way to match all that's not matched by a previous regex part?
Is that clear?
1 Solution
You're lucky: your regex starts with an anchor character, which is #. We can take advantage of that.
Add an alternative to the pattern: |[^#]+. This will consume everything but # characters, leaving the special cases starting with # to the first part of the pattern. A # character will therefore always start a new match.
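As a rough illustration of that alternative, here is a Python sketch. REG is again a simplified, hypothetical stand-in for the real placeholder regex (no options group, no nested brackets, since Python's re module has no balancing groups), and tokenize is a helper name I made up.

```python
import re

# Hypothetical, simplified stand-in for the real placeholder regex.
REG = r"#\b(?P<id>\w+)\b(?:\[[^\[\]]*\])?"

# First alternative: a placeholder. Second alternative: any run of non-'#'
# characters, so a '#' always begins a new match attempt.
pattern = re.compile(rf"{REG}|(?P<plain>[^#]+)")


def tokenize(text):
    """Classify each match by checking which named group captured."""
    tokens = []
    for m in pattern.finditer(text):
        if m.group("id") is not None:
            tokens.append(("id", m.group("id")))
        else:
            tokens.append(("plain", m.group("plain")))
    return tokens


tokens = tokenize("Int: #IntValue Bool: #BoolValue[]Word")
for kind, value in tokens:
    print(kind, repr(value))
```

Compared with the negative-lookahead version, the placeholder regex appears only once and the plain text is consumed in a single step per run.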
There's still a minor catch: the first part has a non-optional id group surrounded by two \b anchors. If the input contains a # that is not followed by a word character (say, something like foo#!bar), the first part fails at that #, and [^#]+ cannot consume it either, so that # is not matched by any part of the pattern.
A simple solution to this problem is to append |[^#]+|# to the pattern instead, to account for this edge case. The third alternative, a lone #, will be matched only if the first alternative fails at that position.
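The effect of the extra |# alternative can be checked with a small Python sketch. As before, REG is a simplified, hypothetical stand-in for the real placeholder regex, and folding both fallbacks into a single plain group is my own choice, not something the original pattern prescribes.

```python
import re

# Hypothetical, simplified stand-in for the real placeholder regex.
REG = r"#\b(?P<id>\w+)\b(?:\[[^\[\]]*\])?"

without_fallback = re.compile(rf"{REG}|(?P<plain>[^#]+)")
with_fallback = re.compile(rf"{REG}|(?P<plain>[^#]+|#)")


def matched_text(pattern, text):
    """Concatenate all matches; equals the input only if nothing was skipped."""
    return "".join(m.group() for m in pattern.finditer(text))


sample = "foo#!bar #Id"

# The stray '#' before '!' is silently skipped without the fallback...
print(repr(matched_text(without_fallback, sample)))
# ...and is matched as plain text once the lone '#' alternative is added.
print(repr(matched_text(with_fallback, sample)))
```

Because alternatives are tried from left to right, a # that does start a valid placeholder is still matched by the first part; the lone # only applies where that part fails.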