I need to find all the strings matching a pattern with the exception of two given strings.
For example, find all groups of letters with the exception of aa and bb. Starting from this string:
-a-bc-aa-def-bb-ghij-
Should return:
('a', 'bc', 'def', 'ghij')
I tried with this regular expression that captures 4 strings. I thought I was getting close, but (1) it doesn't work in Python and (2) I can't figure out how to exclude a few strings from the search. (Yes, I could remove them later, but my real regular expression does everything in one shot and I would like to include this last step in it.)
I said it doesn't work in Python because I tried this, expecting the exact same result, but instead I get only the first group:
>>> import re
>>> re.search('-(\w.*?)(?=-)', '-a-bc-def-ghij-').groups()
('a',)
I tried with negative look ahead, but I couldn't find a working solution for this case.
3 Solutions
#1
6
You can use a negative lookahead. For example:
>>> re.findall(r'-(?!aa|bb)([^-]+)', string)
['a', 'bc', 'def', 'ghij']
  -           matches a hyphen
  (?!aa|bb)   negative lookahead: checks that the hyphen is not followed by aa or bb
  ([^-]+)     matches one or more characters other than a hyphen
Edit
The regex above also rejects items that merely start with aa or bb, such as -aabc-. To exclude only the exact items aa and bb, add a - to each alternative in the lookahead:
>>> re.findall(r'-(?!aa-|bb-)([^-]+)', string)
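The difference between the two lookaheads is easiest to see on input that contains items starting with aa or bb. The following is a minimal sketch; the sample string s is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample: contains the exact items aa and bb,
# plus items that merely start with them (aabc, bbq).
s = "-a-aabc-aa-bb-bbq-"

# Prefix-style lookahead: rejects every item that starts with aa or bb.
loose = re.findall(r'-(?!aa|bb)([^-]+)', s)

# Lookahead including the closing hyphen: rejects only the exact items aa and bb.
strict = re.findall(r'-(?!aa-|bb-)([^-]+)', s)

print(loose)   # ['a']
print(strict)  # ['a', 'aabc', 'bbq']
```

Note that the second pattern relies on every item being followed by a hyphen, as in the question's input.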
#2
2
You need a negative lookahead to restrict a more generic pattern, and re.findall to find all matches.
Use
res = re.findall(r'-(?!(?:aa|bb)-)(\w+)(?=-)', s)
or, if the values between hyphens can contain any character except a hyphen, use a negated character class [^-]:
res = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)
Details:
  -                 a hyphen
  (?!(?:aa|bb)-)    if aa- or bb- follows the first hyphen, no match is returned
  (\w+)             Group 1 (the value returned by the re.findall call), capturing one or more word characters; alternatively [^-]+, one or more characters other than -
  (?=-)             a - must follow the captured characters. A lookahead is required here so that the hyphen is not consumed and can serve as the starting point of the next match.
import re
p = re.compile(r'-(?!(?:aa|bb)-)([^-]+)(?=-)')
s = "-a-bc-aa-def-bb-ghij-"
print(p.findall(s)) # => ['a', 'bc', 'def', 'ghij']
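To see why the trailing hyphen is asserted with a lookahead rather than matched, it may help to compare both variants on the question's input. This is only an illustrative sketch; the variable names consuming and asserting are introduced here.

```python
import re

s = "-a-bc-aa-def-bb-ghij-"

# Trailing hyphen is part of the match, so it is consumed
# and cannot start the next match: 'bc' is skipped.
consuming = re.findall(r'-(?!(?:aa|bb)-)([^-]+)-', s)

# Trailing hyphen is only asserted, so the next match can begin on it.
asserting = re.findall(r'-(?!(?:aa|bb)-)([^-]+)(?=-)', s)

print(consuming)  # ['a', 'def', 'ghij']
print(asserting)  # ['a', 'bc', 'def', 'ghij']
```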
#3
0
Although a regex solution was asked for, I would argue that this problem is easier to solve with plain Python, namely string splitting and filtering:
input_list = "-a-bc-aa-def-bb-ghij-"
exclude = set(["aa", "bb"])
result = [s for s in input_list.split('-')[1:-1] if s not in exclude]
This solution has the additional advantage that result could also be turned into a generator, so the result list does not need to be constructed explicitly.
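A generator version might look like the sketch below. The helper name kept_items is hypothetical, and note that split still builds the full list of pieces; only the filtering step is lazy.

```python
text = "-a-bc-aa-def-bb-ghij-"
exclude = {"aa", "bb"}

def kept_items(text, exclude):
    """Return a lazy generator over the hyphen-separated items not in `exclude`."""
    # [1:-1] drops the empty strings produced by the leading and trailing hyphens.
    return (item for item in text.split('-')[1:-1] if item not in exclude)

items = kept_items(text, exclude)
print(next(items))  # a
print(list(items))  # ['bc', 'def', 'ghij']
```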