Regular expression to match everything except given strings

Date: 2021-09-07 21:39:33

I need to find all the strings matching a pattern with the exception of two given strings.

For example, find all groups of letters with the exception of aa and bb. Starting from this string:

-a-bc-aa-def-bb-ghij-

Should return:

('a', 'bc', 'def', 'ghij')

I tried with this regular expression that captures 4 strings. I thought I was getting close, but (1) it doesn't work in Python and (2) I can't figure out how to exclude a few strings from the search. (Yes, I could remove them later, but my real regular expression does everything in one shot and I would like to include this last step in it.)

I said it doesn't work in Python because I tried this, expecting the exact same result, but instead I get only the first group:

>>> import re
>>> re.search('-(\w.*?)(?=-)', '-a-bc-def-ghij-').groups()
('a',)

I tried with negative look ahead, but I couldn't find a working solution for this case.
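
On point (1): re.search stops at the first match, and .groups() reports the capture groups of that single match only, which is why just ('a',) comes back. A minimal sketch of the difference, reusing the pattern and test string from the snippet above (the variable names are only illustrative):

```python
import re

pattern = r'-(\w.*?)(?=-)'
text = '-a-bc-def-ghij-'

# re.search stops at the first match; groups() belongs to that one match.
first = re.search(pattern, text).groups()

# re.findall scans the whole string and returns group 1 of every match.
every = re.findall(pattern, text)

print(first)  # ('a',)
print(every)  # ['a', 'bc', 'def', 'ghij']
```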

3 answers

#1


6  

You can make use of negative lookaheads.

For example,

>>> string = '-a-bc-aa-def-bb-ghij-'
>>> re.findall(r'-(?!aa|bb)([^-]+)', string)
['a', 'bc', 'def', 'ghij']

  • - Matches -

  • (?!aa|bb) Negative lookahead, checks that - is not followed by aa or bb

  • ([^-]+) Matches one or more characters other than -


Edit

The above regex also skips tokens that merely start with aa or bb, such as -aabc-. To handle that, add - to each alternative in the lookahead:

>>> re.findall(r'-(?!aa-|bb-)([^-]+)', string)
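
To see what the extra hyphen changes, here is a small sketch on a made-up input that contains -aabc- (the sample string and variable names are illustrative, not taken from the question):

```python
import re

sample = '-a-aabc-aa-def-'

# Without the hyphen, the lookahead rejects anything that merely starts with aa or bb.
loose = re.findall(r'-(?!aa|bb)([^-]+)', sample)

# With the hyphen, only the exact tokens aa and bb are rejected.
strict = re.findall(r'-(?!aa-|bb-)([^-]+)', sample)

print(loose)   # ['a', 'def']
print(strict)  # ['a', 'aabc', 'def']
```

Note that this version relies on every token being followed by a hyphen; a final aa with no closing hyphen would still be returned.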

#2


2  

You need a negative lookahead to restrict a more generic pattern, and re.findall to find all matches.

Use

res = re.findall(r'-(?!(?:aa|bb)-)(\w+)(?=-)', s)

or, if the values between hyphens can contain any character except a hyphen, use a negated character class [^-]:

res = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)

Details:

  • - - a hyphen

  • (?!(?:aa|bb)-) - if there is aa- or bb- right after the hyphen, no match is returned

  • (\w+) - Group 1 (the value returned by the re.findall call), capturing 1 or more word chars; in the second variant, [^-]+ matches 1 or more characters other than -

  • (?=-) - there must be a - after the captured chars. A lookahead is used here because it does not consume the hyphen, which leaves it available as the starting point of the next match.

Python demo:

import re
p = re.compile(r'-(?!(?:aa|bb)-)([^-]+)(?=-)')
s = "-a-bc-aa-def-bb-ghij-"
print(p.findall(s)) # => ['a', 'bc', 'def', 'ghij']
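
As a rough illustration of why the trailing hyphen is asserted rather than consumed, the sketch below compares the pattern above with a hypothetical variant that ends in a literal - (that variant is not part of this answer; it is only here for contrast):

```python
import re

s = "-a-bc-aa-def-bb-ghij-"

# The trailing hyphen is consumed, so the next match cannot start on it.
consuming = re.findall(r'-(?!(?:aa|bb)-)([^-]+)-', s)

# The trailing hyphen is only asserted, so it stays available for the next match.
asserting = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)

print(consuming)  # ['a', 'def', 'ghij']
print(asserting)  # ['a', 'bc', 'def', 'ghij']
```

In the consuming variant, bc is lost because the hyphen in front of it was already used up by the match for a.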

#3


0  

Although a regex solution was asked for, I would argue that this problem can be solved more easily with plain Python string operations, namely splitting and filtering:

input_string = "-a-bc-aa-def-bb-ghij-"
exclude = {"aa", "bb"}
# [1:-1] drops the empty strings produced by the leading and trailing hyphens
result = [s for s in input_string.split('-')[1:-1] if s not in exclude]

This solution has the additional advantage that result can be turned into a generator expression, so the result list does not need to be built explicitly.
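
A minimal sketch of that generator variant, assuming the same sample string (the names text, tokens, first_token and remaining are illustrative):

```python
text = "-a-bc-aa-def-bb-ghij-"
exclude = {"aa", "bb"}

# Parentheses instead of square brackets give a lazy generator expression.
tokens = (s for s in text.split('-')[1:-1] if s not in exclude)

first_token = next(tokens)   # pulls only the first surviving token
remaining = list(tokens)     # consumes whatever is left

print(first_token)  # a
print(remaining)    # ['bc', 'def', 'ghij']
```

Keep in mind that split still builds the full list of pieces up front; only the filtering step is lazy.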
