My code parses lines in a log file.
I do many things with this, but a particular need has come up: finding a line that does not contain a certain substring, except under a certain condition.
I have a pretty good understanding of regular expressions, but I can't seem to figure this one out.
The problem: I want to capture any line that does not contain the word error or warn, unless it is the first part of the log entry and surrounded with square brackets.
So far, I have tried something like this:
(((?:abc|cba)\s+.*(?!\[?(?!error|warn)\]?).*)|((abc|cba)\s+\[(error|warn)\]\s+(.*)))
The lines in the log can look like these examples:
Capture group 2:
abc [error] message
cba [error] message
cba [warn] message
Capture group 1:
abc something random
cba i dont know
Don't capture:
abc some [error] message
cba some [warn] message
The problem in simpler English: I want to get any line that starts with abc or cba. Capture group 1 should grab the line if it doesn't have [error] or [warn] anywhere in it, and capture group 2 should grab it only if [error] or [warn] is the first part of the entry (after the abc or cba).
1 solution
#1
This should do the trick:
^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\])).*$
Note that I anchor the regex to the whole line with ^ and $.
I first check for abc or cba at the start of the line.
Then there are 2 cases:
- Neither [error] nor [warn] appears anywhere in the line: (?!.*(?:\[error\]|\[warn\])) (the ?: is not very important; it only makes the group non-capturing).
- Or [error] or [warn] follows right after abc or cba: \s*(?:\[error\]|\[warn\]). Note that you may want to change \s* to \s+, since the current regex will match abc[error].
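The two cases can be checked against the sample lines from the question. This is only a minimal sketch, assuming Python's standard re module; the names LINE_RE, STRICT_RE and accepts are made up for illustration and are not part of the answer.

```python
import re

# The regex from the answer, split over several string pieces for readability.
LINE_RE = re.compile(
    r"^(?:abc|cba)"
    r"(?:(?!.*(?:\[error\]|\[warn\]))"  # case 1: no [error]/[warn] anywhere after the prefix
    r"|\s*(?:\[error\]|\[warn\]))"      # case 2: [error]/[warn] right after the prefix
    r".*$"
)

# Same regex with \s* changed to \s+, as suggested in the second case.
STRICT_RE = re.compile(
    r"^(?:abc|cba)"
    r"(?:(?!.*(?:\[error\]|\[warn\]))"
    r"|\s+(?:\[error\]|\[warn\]))"
    r".*$"
)


def accepts(line, pattern=LINE_RE):
    """Return True if the line matches the given pattern (hypothetical helper)."""
    return pattern.match(line) is not None


if __name__ == "__main__":
    samples = [
        "abc [error] message",
        "cba [error] message",
        "cba [warn] message",
        "abc something random",
        "cba i dont know",
        "abc some [error] message",
        "cba some [warn] message",
        "abc[error]",
    ]
    for line in samples:
        print(f"{line!r}: \\s* -> {accepts(line)}, \\s+ -> {accepts(line, STRICT_RE)}")
```

With the sample lines, only abc[error] should differ between the two variants: the \s* version accepts it and the \s+ version rejects it.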
Then I don't care about the rest, hence .*, but it needs to be there since I used $. I'm not totally sure about Python: check whether you can remove the .*$ part of the regex.
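One way to run that check is sketched below. It assumes Python's standard re module, re.match (which already starts at the beginning of the string), and strings that each hold a single log line with no embedded newline. FULL and SHORT are hypothetical names introduced here.

```python
import re

# The regex as given, and the same regex with the trailing .*$ removed.
FULL = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\])).*$"
)
SHORT = re.compile(
    r"^(?:abc|cba)(?:(?!.*(?:\[error\]|\[warn\]))|\s*(?:\[error\]|\[warn\]))"
)

LINES = [
    "abc [error] message",
    "cba [warn] message",
    "abc something random",
    "cba i dont know",
    "abc some [error] message",
    "cba some [warn] message",
]


def same_decision(line):
    """True if both patterns agree on whether the line matches."""
    return (FULL.match(line) is None) == (SHORT.match(line) is None)


if __name__ == "__main__":
    for line in LINES:
        full, short = FULL.match(line), SHORT.match(line)
        print(
            f"{line!r}: same decision = {same_decision(line)}, "
            f"full text = {full.group(0) if full else None!r}, "
            f"short text = {short.group(0) if short else None!r}"
        )
```

Under these assumptions the two patterns agree on which lines match, so the .*$ tail looks removable when only a yes/no answer is needed. The matched text does differ: without the tail, the match stops after the prefix (or after the bracketed level), so keep .*$ if the whole line is wanted as the match.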
I made all groups non-capturing, since you seem to be asserting that the line follows a certain format. If you need to extract some data from the line at the same time, let me know.
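If data does need to be extracted, one possible variant with capturing groups is sketched below. This is not the regex from the answer: the named groups (source, level, message, plain) and the classify helper are hypothetical, and the sketch assumes at least one whitespace character after abc or cba, as in the attempt from the question.

```python
import re

# Variant with named groups (an illustration, not the answer's regex).
ENTRY_RE = re.compile(
    r"^(?P<source>abc|cba)"
    r"(?:"
    r"\s+\[(?P<level>error|warn)\]\s*(?P<message>.*)"  # bracketed level right after the prefix
    r"|"
    r"(?!.*\[(?:error|warn)\])\s+(?P<plain>.*)"        # no [error]/[warn] anywhere in the rest
    r")$"
)


def classify(line):
    """Return a tuple describing the line, or None if it should not be captured."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    if m.group("level") is not None:
        return ("level", m.group("source"), m.group("level"), m.group("message"))
    return ("plain", m.group("source"), m.group("plain"))


if __name__ == "__main__":
    for line in [
        "abc [error] message",
        "cba i dont know",
        "abc some [error] message",
    ]:
        print(f"{line!r} -> {classify(line)}")
```

Named groups avoid having to count parentheses to tell the two kinds of line apart; checking whether level is None plays the role of asking which of the two capture groups matched.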