Regex: conditionally capture lines that do not contain a substring

Time: 2021-06-18 20:40:08

My code parses some lines in a log file.

I do many things with this, but a particular need has come up: finding a line which does not contain a certain substring, under a certain condition.

I have a pretty good understanding of regular expressions, but I can't seem to figure this one out.

The problem: I want to capture any line which does not contain the word error or warn, unless that word is the first part of the log entry and surrounded by square brackets.

So far, I have tried something like this:

(((?:abc|cba)\s+.*(?!\[?(?!error|warn)\]?).*)|((abc|cba)\s+\[(error|warn)\]\s+(.*)))
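A quick check of this attempt, assuming Python's re module (the question does not say which engine is in use), suggests the nested lookahead works against the intent. The inner part \[?(?!error|warn)\]? can match the empty string at any position that is not directly followed by error or warn, so the outer negative lookahead only succeeds immediately before one of those words. The name ATTEMPT is just a label for this sketch.

```python
import re

# The pattern from the question, copied verbatim and split over two literals.
ATTEMPT = re.compile(
    r"(((?:abc|cba)\s+.*(?!\[?(?!error|warn)\]?).*)"
    r"|((abc|cba)\s+\[(error|warn)\]\s+(.*)))"
)

# One line that should be captured and one that should be rejected.
for line in ["abc something random", "abc some [error] message"]:
    m = ATTEMPT.match(line)
    print(repr(line), "->", "match" if m else "no match")
```

In this sketch the plain line is rejected and the line with [error] in the middle is accepted, which is the reverse of what is wanted.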

The lines in the log can look like some of these examples:

Capture group 2:

abc [error] message
cba [error] message
cba [warn] message

Capture group 1:

abc something random
cba i dont know

Don't capture:

abc some [error] message
cba some [warn] message

The problem in simpler English: I want to get any line which starts with abc or cba. Capture group 1 should grab the line if it doesn't have [error] or [warn] anywhere in it, and capture group 2 should get it only if [error] or [warn] is the first part of the entry (after the abc or cba).

1 solution

#1


This should do the trick:

^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\])).*$

Note that I anchor the regex with ^ and $, so it has to match the whole line.

I first check that the line starts with abc or cba.

Then there are 2 cases:

  • Neither [error] nor [warn] appears anywhere in the line: (?!.*(?:\[error\]|\[warn\])) (The ?: is not essential; it just makes the group non-capturing).
  • Or [error] or [warn] follows right after abc or cba: \s*(?:\[error\]|\[warn\]). Note that you may want to change \s* to \s+, since the current regex will match abc[error].
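The two cases can be checked against the sample lines from the question. This is a minimal sketch assuming Python's re module; the names ANSWER, STRICT, and accepted are made up for illustration, and STRICT is the \s+ variant suggested above.

```python
import re

# The regex from the answer, verbatim.
ANSWER = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\])).*$"
)
# Same regex with \s* changed to \s+, so "abc[error]" is no longer accepted.
STRICT = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s+(?:\[error\]|\[warn\])).*$"
)

def accepted(pattern, line):
    """Return True if the whole line matches the given pattern."""
    return pattern.match(line) is not None

samples = [
    "abc [error] message",       # tag right after the prefix
    "cba [warn] message",
    "abc something random",      # no tag anywhere
    "cba i dont know",
    "abc some [error] message",  # tag in the middle: should be rejected
    "cba some [warn] message",
    "abc[error]",                # accepted by \s*, rejected by \s+
]

for line in samples:
    print(f"{line!r}: answer={accepted(ANSWER, line)} strict={accepted(STRICT, line)}")
```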

I don't care about the rest of the line, which .* consumes, but it needs to be there since I used $. I'm not totally sure about Python: check whether you can remove the .*$ part of the regex.
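On the Python question, here is a small sketch of what happens when the trailing .*$ is dropped. It assumes the pattern is applied with re.match to one line at a time; FULL, TRIMMED, and same_decisions are illustrative names. Since re.match only has to match at the start of the string, the accept/reject decision should be unchanged, but the matched text gets shorter.

```python
import re

# The answer's regex, and the same regex without the trailing ".*$".
FULL = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\])).*$"
)
TRIMMED = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\]))"
)

lines = [
    "abc [error] message",
    "abc something random",
    "abc some [error] message",
    "cba some [warn] message",
]

# Both patterns should accept and reject exactly the same lines.
same_decisions = all(
    (FULL.match(line) is None) == (TRIMMED.match(line) is None) for line in lines
)
print("same decisions:", same_decisions)

# The matched span differs: the trimmed pattern stops after the prefix or tag.
print(repr(FULL.match("abc something random").group(0)))
print(repr(TRIMMED.match("abc something random").group(0)))
```

So the tail only matters if the matched text itself is used afterwards, or if the pattern is run with something other than re.match.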

I made all groups non-capturing, since you seem to be asserting that the line follows a certain format. If you need to extract some data from the line at the same time, let me know.
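If the two capture groups from the question are needed, one possible variant is sketched below in Python. This is not part of the answer's regex: the group names plain and tagged, the name TWO_GROUPS, and the helper classify are all hypothetical, and the tagged branch uses \s+ as in the question's own attempt.

```python
import re

# "plain" plays the role of capture group 1 (no [error]/[warn] anywhere),
# "tagged" plays the role of capture group 2 (tag right after abc/cba).
TWO_GROUPS = re.compile(
    r"^(?:"
    r"(?P<plain>(?:abc|cba)(?!.*\[(?:error|warn)\]).*)"
    r"|"
    r"(?P<tagged>(?:abc|cba)\s+\[(?:error|warn)\].*)"
    r")$"
)

def classify(line):
    """Return 'plain', 'tagged', or None when the line should not be captured."""
    m = TWO_GROUPS.match(line)
    if m is None:
        return None
    return "plain" if m.group("plain") is not None else "tagged"

for line in [
    "abc [error] message",
    "cba i dont know",
    "abc some [error] message",
]:
    print(repr(line), "->", classify(line))
```

Named groups are used here only so the result does not depend on counting parentheses; numbered groups would work the same way.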
