I would like to replace every run of 3 or more "=" with an equal number of "-".
def f(a, b):
    '''
    Example
    =======
    >>> from x import y
    '''
    return a == b
becomes
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b  # don't touch
My working but hacky solution is to pass a lambda as the repl argument of re.sub() that grabs the length of each match:
>>> import re
>>> s = """
... def f(a, b):
...     '''
...     Example
...     =======
...     >>> from x import y
...     '''
...     return a == b"""
>>> eq = r'(={3,})'
>>> print(re.sub(eq, lambda x: '-' * (x.end() - x.start()), s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b
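For reference, the same callback idea can be wrapped in a small helper. This is only a sketch: the names EQ_RUN and dashes_for_equals are made up for illustration, and len(m.group()) is used as an equivalent way of getting the match length.

```python
import re

# Hypothetical names; the pattern and the callback idea come from the snippet above.
EQ_RUN = re.compile(r'={3,}')

def dashes_for_equals(text):
    """Replace each run of 3 or more '=' with the same number of '-'."""
    # len(m.group()) is the length of the whole match, same as m.end() - m.start().
    return EQ_RUN.sub(lambda m: '-' * len(m.group()), text)

print(dashes_for_equals("Example\n=======\nreturn a == b"))
```

The two-character "==" is left alone because the pattern requires at least three "=" in a row.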
Can I do this without needing to pass a function to re.sub()?
My thinking is that I'd need r'(=){3,}' (a variable-length capturing group), but re.sub(r'(=){3,}', '-', s) has a problem with greediness, I believe.
Can I modify the regex eq above so that the lambda isn't needed?
4 Answers
#1
2
Using re.sub, this uses some deceptive lookahead trickery and works assuming the run you want to replace is always followed by a newline '\n'.
print(re.sub('=(?=={2}|=?\n)', '-', s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b
Details
"Replace an equal sign if it is followed by two more equal signs, or by an optional equal sign and a newline."
=            # an equal sign, if it is followed by
(?=          # lookahead:
    ={2}     #   two more equal signs
    |        #   OR
    =?       #   an optional equal sign
    \n       #   and a newline
)
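The newline assumption matters in both directions. The sketch below uses made-up input strings (not from the original post) to show where the pattern behaves as intended and where it does not:

```python
import re

pattern = '=(?=={2}|=?\n)'  # the pattern from the answer above

# A heading underline followed by a newline is converted as intended.
heading = re.sub(pattern, '-', "Example\n=======\n")

# Caveat: a '==' that sits directly before a newline is rewritten too.
comparison = re.sub(pattern, '-', "x = a ==\n    b\n")

# Caveat: a run at the very end of the string (no trailing newline)
# keeps its last two '=' characters.
tail = re.sub(pattern, '-', "=====")

print(repr(heading))
print(repr(comparison))
print(repr(tail))
```

So this pattern is only safe when every run to be replaced is followed by a newline and no shorter run of "=" ever ends a line.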
#2
3
With some help from lookahead/lookbehind it is possible to replace character by character:
>>> re.sub("(=(?===)|(?<===)=|(?<==)=(?==))", "-", "=== == ======= asdlkfj")
'--- == ------- asdlkfj'
#3
2
It's possible, but not advisable.
The way re.sub works is that it finds a complete match and then replaces it. It doesn't replace each capture group separately, so things like re.sub(r'(=){3,}', '-', s) won't work: that replaces the entire match with a single dash, not each occurrence of the = character.
>>> re.sub(r'(=){3,}', '-', '=== ===')
'- -'
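To see why the repeated group does not help, it can be useful to inspect the match object directly. A minimal sketch with a made-up input string:

```python
import re

m = re.search(r'(=){3,}', 'Example ===== end')

whole = m.group(0)  # the complete match: all five '=' characters
last = m.group(1)   # a repeated group only keeps its last repetition: one '='

# re.sub substitutes the replacement for the complete match, once per match.
replaced = re.sub(r'(=){3,}', '-', 'Example ===== end')

print(repr(whole), repr(last), repr(replaced))
```

A plain replacement string has no way to express "repeat this as many times as the match is long", which is why a function is the natural fit here.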
So if you want to avoid a lambda, you have to write a regex that matches individual = characters, but only when they are part of a run of at least 3. This is, of course, much more difficult than matching 3 or more = characters with the simple pattern ={3,}. It requires lookarounds and looks like this:
(?<===)=|(?<==)=(?==)|=(?===)
This does what you want:
>>> re.sub(r'(?<===)=|(?<==)=(?==)|=(?===)', '-', '= == === ======')
'= == --- ------'
But it's clearly much less readable than the original lambda solution.
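If you do go the lookaround route, it is cheap to check it against the callback version by brute force. The following is a sketch only; the function names and the small test alphabet are assumptions made for illustration.

```python
import itertools
import re

LOOKAROUND = re.compile(r'(?<===)=|(?<==)=(?==)|=(?===)')
RUN = re.compile(r'={3,}')

def with_lookarounds(text):
    # Replaces one '=' at a time, but only inside a run of 3 or more.
    return LOOKAROUND.sub('-', text)

def with_callback(text):
    # Replaces a whole run at once with dashes of the same length.
    return RUN.sub(lambda m: '-' * len(m.group()), text)

def find_mismatches(alphabet='= a', max_len=7):
    """Return every string over `alphabet` (up to max_len) where the two disagree."""
    mismatches = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            text = ''.join(chars)
            if with_lookarounds(text) != with_callback(text):
                mismatches.append(text)
    return mismatches

print(find_mismatches())  # → []
```

An empty list means the two approaches agree on every string tried, which matches the reasoning: a "=" belongs to a run of 3 or more exactly when it has two "=" before it, one on each side, or two after it.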
#4
2
Using the regex module (a third-party package, not the standard-library re), you can write:
regex.sub(r'\G(?!\A)=|=(?===)', '-', s)
- \G is the position immediately after the last successful match, or the start of the string.
- (?!\A) rules out the start of the string, so the \G branch can only match right after a previous match.
The second branch =(?===) succeeds when a = is followed by two other =. The following matches then use the first branch \G(?!\A)= until there are no more consecutive =.