Regex to match anything not matched by another regex

Time: 2021-07-25 20:13:23

I'm comfortable using regex. I have the following regex that matches everything I want:

#(?<nonCapturing>\?\:)?(?:\(\?(?![\)])(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?:(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?

It's very long, sorry. It is used to parse a format string for arbitrary objects: it lets me specify a property (e.g. IntValue) and forward an optional child format to it.

It matches a # followed by an optional non-capturing marker ?:, then an optional 'options' pattern such as (?r) or (?a-r), then the property name, optionally followed by a pair of [] that can hold a child format.

For the following input:

Int: #IntValue Bool: #BoolValue[]Word Str: '#StrValue' Double: #DoubleValue[#.00] #(?r)Bar[#(?r)StrValue[#Length]]

it matches:

  • #IntValue
  • #BoolValue[]
  • #StrValue
  • #DoubleValue[#.00]
  • #(?r)Bar[#(?r)StrValue[#Length]]
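As a rough cross-check, the sketch below approximates the pattern above in Python and runs it over the sample input. It is not the original .NET pattern: Python's `re` has no balancing groups, so this version assumes the `[...]` child format nests at most two levels deep (enough for the sample), and its `subFormat` group simply captures the text between the outer brackets. The group names are borrowed from the question.

```python
import re

# Rough Python approximation of the question's .NET pattern.
# Assumption: [...] nests at most two levels (no balancing groups in Python's re).
FORMAT_ITEM = re.compile(r"""
    \#(?P<nonCapturing>\?:)?                                   # '#' and optional '?:'
    (?:\(\?(?!\))(?P<addOpt>[ar]*)(?:-(?P<remOpt>[ar]+))?\))?  # optional (?r) or (?a-r)
    \b(?P<id>\w+)\b                                            # property name
    (?:\[(?P<subFormat>(?:[^\[\]]|\[[^\[\]]*\])*)\])?          # optional [...] child format
""", re.VERBOSE)

sample = ("Int: #IntValue Bool: #BoolValue[]Word Str: '#StrValue' "
          "Double: #DoubleValue[#.00] #(?r)Bar[#(?r)StrValue[#Length]]")

items = [m.group(0) for m in FORMAT_ITEM.finditer(sample)]
for m in FORMAT_ITEM.finditer(sample):
    print(repr(m.group(0)), "id =", m.group("id"), "subFormat =", m.group("subFormat"))
```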

Fine.

But now I also need everything else. I want it in the same regex so that I can foreach over all matches (I can tell which case I have by checking whether id or plain has a capture).

The usual pattern for that is: ((?!<regex that matches what you want>).)*

In my case that looks like (pattern: <REG>|(?<plain>(?:(?!<REG>).)+)), which results in this huge regex (which matches perfectly):

(?:#(?<nonCapturing>\?\:)?(?:\(\?(?![\)])(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?:(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?)|(?<plain>(?:(?!(?:#(?<nonCapturing>\?\:)?(?:\(\?(?![\)])(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?:(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?)|(?<plain>(?:(#(?<nonCapturing>\?\:)?(?:\(\?(?![\)])(?<addOpt>[ar]*)(?:\-(?<remOpt>[ar]+))?\))?\b(?<id>\w+)\b(?:\[\]|(?:(?=\[)(?:[^\[\]]|(?<open>\[)|(?<subFormat-open>\]))+?(?(open)(?!))))?).)+)).)+)
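For comparison, here is a hypothetical Python sketch of the `<REG>|(?<plain>(?:(?!<REG>).)+)` construction, using the same kind of simplified stand-in for `<REG>` (at most two levels of `[...]` nesting, an assumption). Unlike .NET, Python refuses to reuse a group name, so the copy of `<REG>` inside the lookahead has its named groups turned into non-capturing ones. This is also why the pattern has to contain `<REG>` twice, and it means `<REG>` is re-tried before every single character of plain text.

```python
import re

# Simplified stand-in for the question's <REG> (assumption: at most two
# levels of nested [...]).
REG = r"""
    \#(?P<nonCapturing>\?:)?
    (?:\(\?(?!\))(?P<addOpt>[ar]*)(?:-(?P<remOpt>[ar]+))?\))?
    \b(?P<id>\w+)\b
    (?:\[(?P<subFormat>(?:[^\[\]]|\[[^\[\]]*\])*)\])?
"""

# Python forbids duplicate group names, so strip the names from the copy
# used inside the lookahead: "(?P<id>" becomes "(?:".
REG_UNNAMED = re.sub(r"\?P<\w+>", "?:", REG)

# <REG>|(?<plain>(?:(?!<REG>).)+) from the question, in Python syntax.
TEMPERED = re.compile(REG + r"| (?P<plain>(?:(?!" + REG_UNNAMED + r").)+)", re.VERBOSE)

sample = "Int: #IntValue Bool: #BoolValue[]Word foo#!bar"
tokens = [("plain" if m.group("plain") is not None else "item", m.group(0))
          for m in TEMPERED.finditer(sample)]
print(tokens)
```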

Phew. It does what it should, but...

Is there any other way to match everything that isn't matched by a previous part of the regex?

Is that clear?

1 solution

#1

You're lucky: your regex always starts with the same literal character, #, which works as an anchor. We can take advantage of that.

Add an alternative to the pattern: |[^#]+. This will consume everything but # characters, leaving the special cases starting with # to the first part of the pattern. A # character will therefore always start a new match.
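A minimal Python sketch of this suggestion, with a simplified stand-in for the first part of the pattern (assumed: at most two levels of `[...]` nesting, since Python's `re` lacks .NET balancing groups). Every run of text without a `#` comes back as one match, so joining all matches reproduces the input.

```python
import re

# Simplified stand-in for the first part (assumption: at most two levels of [...]).
FORMAT_ITEM = r"""
    \#(?P<nonCapturing>\?:)?
    (?:\(\?(?!\))(?P<addOpt>[ar]*)(?:-(?P<remOpt>[ar]+))?\))?
    \b(?P<id>\w+)\b
    (?:\[(?P<subFormat>(?:[^\[\]]|\[[^\[\]]*\])*)\])?
"""

# First part | [^#]+ : the extra alternative eats everything up to the next '#'.
TOKEN = re.compile(FORMAT_ITEM + r"| (?P<plain>[^\#]+)", re.VERBOSE)

sample = ("Int: #IntValue Bool: #BoolValue[]Word Str: '#StrValue' "
          "Double: #DoubleValue[#.00] #(?r)Bar[#(?r)StrValue[#Length]]")

pieces = [m.group(0) for m in TOKEN.finditer(sample)]
print(pieces)
```

Because `[^#]+` never consumes a `#`, each format item is attempted exactly at its own `#`, with no lookahead needed.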

There's still a minor catch: in that first part, the id group is not optional and is surrounded by two \b anchors. So if the input contains a # that is not followed by a property name (say, something like foo#!bar), the first part of the pattern can't match at that #, and the second part won't match it either.

A simple solution to this problem is to use |[^#]+|# at the end of the pattern to account for this edge case. That third # case will be matched only if the first case fails.
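Continuing the same kind of Python sketch (the simplified first part is still an assumption), the code below compares the pattern with and without the trailing `#` alternative, and walks the matches the way the question describes, telling the cases apart by whether `plain` captured anything. The helper name `tokenize` is made up for illustration.

```python
import re

# Simplified stand-in for the first part (assumption: at most two levels of [...]).
FORMAT_ITEM = r"""
    \#(?P<nonCapturing>\?:)?
    (?:\(\?(?!\))(?P<addOpt>[ar]*)(?:-(?P<remOpt>[ar]+))?\))?
    \b(?P<id>\w+)\b
    (?:\[(?P<subFormat>(?:[^\[\]]|\[[^\[\]]*\])*)\])?
"""

# Without the trailing '#' branch, a '#' that can't start a format item is skipped.
WITHOUT_HASH = re.compile(FORMAT_ITEM + r"| (?P<plain>[^\#]+)", re.VERBOSE)
# The lone '#' branch is only reached when the first part fails at a '#'.
WITH_HASH = re.compile(FORMAT_ITEM + r"| (?P<plain>[^\#]+ | \#)", re.VERBOSE)

def tokenize(pattern, text):
    """Hypothetical helper: split text into ('item', id, subFormat) and ('plain', text)."""
    tokens = []
    for m in pattern.finditer(text):
        if m.group("plain") is not None:
            tokens.append(("plain", m.group("plain")))
        else:
            tokens.append(("item", m.group("id"), m.group("subFormat")))
    return tokens

text = "foo#!bar #Name[#Length]"
print(tokenize(WITHOUT_HASH, text))
print(tokenize(WITH_HASH, text))
```

With only `|[^#]+`, `finditer` finds no match at the `#` in `foo#!bar` and silently moves on to the next position, so that character is lost; the lone `#` branch keeps it as a one-character `plain` match.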
