How do I write a regex that matches strings containing the word "ERROR" but not "INFO"?

Time: 2021-10-14 20:47:00

I have two strings, a and b:

irb(main):022:0> a
=> "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:44,848 pid  10101 tid 139953357145856 INFO     env      Using"

irb(main):023:0> b
=> "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:45,712 pid  10101 tid 139953357145856 ERROR     env      Using"

I want to write a regex that ignores a and matches b.

In string a, ':error' is followed by 'INFO'.

In string b, ':error' is followed by 'ERROR'.

I have tried this:

a.match(".*error.*(?!INFO).*")  

But this regex returns a match for both a and b.

Using match is a must because I am passing the regex to a Sensu script (https://github.com/sensu/sensu-community-plugins/blob/master/plugins/logging/check-log.rb#L189).

2 Solutions

#1


2  

The preceding .* should be placed inside the lookahead assertion:

.*error(?!.*INFO).*
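
To see why this helps, here is a minimal sketch that runs both patterns against the two sample strings from the question (the variable names are only for illustration). In the original pattern, the .* in front of the lookahead is free to stop anywhere, for instance at the end of the line, where nothing follows and (?!INFO) trivially succeeds. With .* inside the lookahead, the assertion inspects everything after "error".

```ruby
a = "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:44,848 pid  10101 tid 139953357145856 INFO     env      Using"
b = "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:45,712 pid  10101 tid 139953357145856 ERROR     env      Using"

original = ".*error.*(?!INFO).*"  # the .* before the lookahead can end anywhere, e.g. at end of line
fixed    = ".*error(?!.*INFO).*"  # the lookahead scans the whole rest of the line

# String#match converts a String pattern into a Regexp before matching.
original_hits = [a, b].map { |s| !s.match(original).nil? }
fixed_hits    = [a, b].map { |s| !s.match(fixed).nil? }

p original_hits  # the original pattern matches both a and b
p fixed_hits     # the fixed pattern matches only b
```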

Rubular. Also, I would consider using word boundaries.
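
A small sketch of what word boundaries could look like here. The log line below is made up for illustration and is not from the question. Without \b, a word such as INFORMATION is enough to trigger the exclusion. Note that in a double-quoted Ruby string "\b" is a backspace character, so the sketch uses Regexp literals.

```ruby
loose  = /.*error(?!.*INFO).*/
strict = /.*\berror\b(?!.*\bINFO\b).*/   # INFO must be a whole word to exclude the line

# Hypothetical ERROR entry whose message happens to contain "INFORMATION".
line = "[:error] pid 1 tid 2 ERROR env INFORMATION schema missing"

loose_match  = !line.match(loose).nil?   # excluded because "INFORMATION" contains "INFO"
strict_match = !line.match(strict).nil?  # kept because "INFO" is not a whole word here

p loose_match
p strict_match
```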

#2


1  

You can match 'error' twice instead:

a.match(".*error.*ERROR.*")

EDIT

As pointed out by Cary Swoveland, this will also match INFO log entries that contain the string "ERROR", as you can see below:

irb(main):035:0> "error INFO ERROR".match(".*error.*ERROR.*")
=> #<MatchData "error INFO ERROR">

irb(main):036:0> "error ERROR INFO".match(".*error.*ERROR.*") # <-- HERE
=> #<MatchData "error ERROR INFO">

irb(main):037:0> "error INFO Praesent quis nisl posuere.".match(".*error.*ERROR.*")
=> nil

A similar problem affects the lookahead version: it skips ERROR entries that contain the string INFO, as you can see below:

irb(main):048:0> "error INFO ERROR".match(".*error(?!.*INFO).*")
=> nil

irb(main):049:0> "error ERROR INFO".match(".*error(?!.*INFO).*")
=> nil

irb(main):050:0> "error INFO Praesent quis nisl posuere.".match(".*error(?!.*INFO).*")
=> nil

To avoid skipping or matching the wrong log entries, I would rely on more parts of that string.

For that, going by your two initial samples, I would rely on the number that directly precedes the log level (in these samples that is the tid, which follows the timestamp). Check it out:

irb(main):055:0> "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:44,848 pid  10101 tid 139953357145856 INFO     env      Using ERROR".match(".*error(?!.*[0-9] INFO).*")
=> nil

irb(main):057:0> "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:45,712 pid  10101 tid 139953357145856 ERROR     env      Using INFO".match(".*error(?!.*[0-9] INFO).*")
=> #<MatchData "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:45,712 pid  10101 tid 139953357145856 ERROR     env      Using INFO">

So, my final version would be: ".*error(?!.*[0-9] INFO).*".
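
Taking "rely on more parts of the string" one step further, a possible alternative is to match the level field positively instead of excluding INFO. This is only a sketch: it assumes every line has the layout of the two samples, with the level name directly after "tid <digits>", and the pattern and variable names are hypothetical rather than anything taken from check-log.rb.

```ruby
# Assumed layout (from the two samples): "... tid <digits> <LEVEL> ..."
# Single quotes keep \b and \d as regex escapes instead of Ruby string escapes.
level_is_error = 'error.*\btid \d+ ERROR\b'

# The two samples from the question, with a misleading word appended to the message.
info_line  = "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:44,848 pid  10101 tid 139953357145856 INFO     env      Using ERROR"
error_line = "[:error] [pid 10101:tid 139953357145856] 2015-03-15 20:33:45,712 pid  10101 tid 139953357145856 ERROR     env      Using INFO"

info_matched  = !info_line.match(level_is_error).nil?   # INFO entry: level field is not ERROR
error_matched = !error_line.match(level_is_error).nil?  # ERROR entry: level field is ERROR

p info_matched
p error_matched
```

The trade-off is that a positive pattern like this silently misses lines whose layout differs from the samples, whereas the negative lookahead lets unknown layouts through.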
