Python regex: matching a string while excluding words

Date: 2022-09-13 09:19:12

I'm having trouble building a regex, and I've spent two days searching Google, Stack Overflow and other documentation...

I have the following lines:

2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'
2015-07-08 11:59:23.178055|INFO    |VirtualServerBase|  3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'
2015-07-08 12:40:50.591450|INFO    |VirtualServerBase|  3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0
2015-07-08 00:23:03.235312|INFO    |VirtualServerBase|  3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'

I want to match only the first line, subject to these conditions:

  1. No line with invokername=server
  2. No line with bantime

In that case only the first line should match. This is the regex I tried:

.*2015-07-08.*client disconnected.*invokername=[^server].*[^bantime=].*

I've only shown one regex here, but I've tried many different things (with ?!, etc.). I've read a lot of Stack Overflow topics about exclusion but could not find a solution. I hope someone can help me.

3 solutions

#1


4  

You can get your line with

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*2015-07-08.*client disconnected.*invokername=.*$

EXPLANATION:

  • (?m) - Multiline flag, so that ^ and $ match at the start and end of each line.
  • ^ - Start-of-line anchor.
  • (?!.*\b(?:invokername=server|bantime)\b) - A negative lookahead that makes sure neither of the whole words invokername=server or bantime appears anywhere further along the line.
  • .*2015-07-08.*client disconnected.*invokername=.* - Matches a line containing 2015-07-08, client disconnected and invokername= in that order, with anything except a line break in between.
  • $ - End of line.
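A minimal sketch of how the pattern above could be applied in Python, assuming the four log lines from the question are held in a single multi-line string. The names LOG_LINES, LOG, PATTERN and matches are illustrative and do not come from the original post.

```python
import re

# The four sample log lines from the question, copied as posted.
LOG_LINES = [
    "2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'",
    "2015-07-08 11:59:23.178055|INFO    |VirtualServerBase|  3| client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "2015-07-08 12:40:50.591450|INFO    |VirtualServerBase|  3| client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0",
    "2015-07-08 00:23:03.235312|INFO    |VirtualServerBase|  3| client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'",
]
LOG = "\n".join(LOG_LINES)

# The lookahead runs once at the start of each line and rejects the line
# if a forbidden word appears anywhere before the next line break.
PATTERN = re.compile(
    r"(?m)^(?!.*\b(?:invokername=server|bantime)\b)"
    r".*2015-07-08.*client disconnected.*invokername=.*$"
)

# The pattern has no capturing groups, so findall returns whole lines.
matches = PATTERN.findall(LOG)
for line in matches:
    print(line)
```

With these four lines, only the first one (invokername=Alphonse, no bantime) is printed.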

Alternatively, you can just match any line that has no disallowed substrings:

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$

This is a much better alternative if it does not "overmatch" for you.
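To illustrate what "overmatching" could look like, here is a small sketch comparing the two patterns from this answer. The `unrelated` line is hypothetical (it is not in the question's log), and the variable names are illustrative.

```python
import re

# Shorter pattern: accepts ANY line without the forbidden words.
GENERAL = re.compile(r"(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$")

# Longer pattern: additionally requires the date, event and field name.
SPECIFIC = re.compile(
    r"(?m)^(?!.*\b(?:invokername=server|bantime)\b)"
    r".*2015-07-08.*client disconnected.*invokername=.*$"
)

# First line from the question.
kicked = "2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'"

# Hypothetical line: no forbidden word, but not a disconnect event either.
unrelated = "2015-07-08 12:50:00.000000|INFO    |VirtualServerBase|  3| client connected 'Ròem'(id:6336)"

general_hits = [bool(GENERAL.search(s)) for s in (kicked, unrelated)]
specific_hits = [bool(SPECIFIC.search(s)) for s in (kicked, unrelated)]

print(general_hits)   # [True, True]
print(specific_hits)  # [True, False]
```

If the log only ever contains disconnect lines, the shorter pattern is enough; otherwise the extra required substrings keep unrelated lines out.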

#2


3  

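You seem to confuse [^...] with (?!...). The former is a negated character class: [^server] matches one single character that is not s, e, r or v. The latter is a negative lookahead: (?!server) consumes nothing and only asserts that the text "server" does not start at the current position.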
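The difference can be seen in a short sketch. The sample strings and variable names below are made up for illustration; "invokername=steve" is a hypothetical name chosen because it starts with a letter that also occurs in "server".

```python
import re

# Negated character class: the character after "=" must not be s, e, r or v.
char_class = re.compile(r"invokername=[^server]")

# Negative lookahead: the text after "=" must not start with "server".
lookahead = re.compile(r"invokername=(?!server)")

samples = ["invokername=server", "invokername=steve", "invokername=Alphonse"]

class_results = [bool(char_class.search(s)) for s in samples]
lookahead_results = [bool(lookahead.search(s)) for s in samples]

print(class_results)      # [False, False, True]
print(lookahead_results)  # [False, True, True]
```

The character class wrongly rejects "steve" because it only looks at one letter, while the lookahead rejects only the name that actually begins with "server".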

Keeping in mind that a negative lookahead is applied at the current position, we need:

.*?2015-07-08.*?client disconnected.*?(invokername=(?!server))((?!.*?bantime=).*)
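A small sketch of why the position of the lookahead matters: a lookahead only inspects text to the right of where it sits. The `line` below is hypothetical and uses a field order that does not occur in the question's log (bantime before invokername), purely to make the effect visible.

```python
import re

# Lookahead placed right after "invokername=": it only sees what follows.
after_name = re.compile(r"invokername=(?!.*bantime=)")

# Lookahead placed at the start of the line: it sees the whole line.
at_start = re.compile(r"^(?!.*bantime=).*invokername=")

# Hypothetical field order: bantime appears BEFORE invokername.
line = "reason 'bantime=0 invokername=Charles reasonmsg=bye'"

seen_after_name = bool(after_name.search(line))
seen_at_start = bool(at_start.search(line))

print(seen_after_name)  # True: bantime is to the left, so it goes unnoticed
print(seen_at_start)    # False: the line is rejected as intended
```

In the question's log the bantime field always comes after invokername, so both placements behave the same there; anchoring the lookahead at the start of the line simply removes the dependence on field order.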

Edit: Credit where credit is due: @stribizhev's solution is better than mine:

(?m)^(?!.*\b(?:invokername=server|bantime)\b).*$

#3


2  

In addition to @llogiq's answer, which explains the difference between a negated character class and a negative lookahead, you can also use just the following regex, which relies on a negative lookahead:

^((?!bantime|(?:invokername=server)).)*$

See demo https://regex101.com/r/hI5dR0/1
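As a rough Python 3 sketch of the same idea: the pattern checks the lookahead before every single character, so a line only matches if no position in it starts a forbidden word. The log lines are abbreviated here (timestamp prefix dropped) to keep the example short, and the names TEMPERED, lines and kept are illustrative.

```python
import re

# Same approach as the regex above, written with a non-capturing group:
# consume one character at a time, but only where neither forbidden
# word starts at that position.
TEMPERED = re.compile(r"^(?:(?!bantime|invokername=server).)*$", re.M)

# Abbreviated versions of the question's four lines.
lines = [
    "client disconnected 'Ròem'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'",
    "client disconnected 'Trakiyen'(id:20460) reason 'invokerid=0 invokername=server reasonmsg=idle time exceeded'",
    "client disconnected 'kalash'(id:20464) reason 'invokerid=136 invokername=Charles invokeruid=loremipsum= reasonmsg=Aller, Bisous! bantime=0",
    "client disconnected 'Brigata FTW'(id:20451) reason 'invokerid=103 invokername=Bob invokeruid=loremipsum3= reasonmsg=En vous souhaitant une bonne soirée <3 bantime=28800'",
]
s = "\n".join(lines)

# group(0) is the whole matched line.
kept = [m.group(0) for m in TEMPERED.finditer(s)]
print(kept)
```

Only the first line is kept. Compared with a single lookahead at the start of the line, this form repeats the check at every character, which is usually slower but gives the same result here.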

>>> re.search(r'^((?!bantime|(invokername=server)).)*$', s, re.M).group()
"2015-07-08 12:49:07.183852|INFO    |VirtualServerBase|  3| client disconnected 'R\xc3\xb2em'(id:6336) reason 'invokerid=20 invokername=Alphonse invokeruid=loremipsum2= reasonmsg=test'"
