Variable-length replacement with `re.sub()`

Time: 2022-09-10 20:24:24

I would like to replace every run of 3 or more "=" with an equal number of "-".

def f(a, b):
    '''
    Example
    =======
    >>> from x import y
    '''
    return a == b

becomes

def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b        # don't touch

My working but hacky solution is to pass a lambda as the repl argument of re.sub(), which computes the length of each match:

>>> import re

>>> s = """
... def f(a, b):
...     '''
...     Example
...     =======
...     >>> from x import y
...     '''
...     return a == b"""

>>> eq = r'(={3,})'
>>> print(re.sub(eq, lambda x: '-' * (x.end() - x.start()), s))

def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b
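
For comparison, here is a minimal sketch of the same callback approach written with a named function instead of a lambda. The names `dash_runs` and `_to_dashes` are made up for illustration; `len(match.group())` yields the same number as `x.end() - x.start()`.

```python
import re

def _to_dashes(match):
    # Build a run of '-' exactly as long as the matched run of '='.
    return '-' * len(match.group())

def dash_runs(text):
    # Only runs of three or more '=' match, so a lone '==' is left alone.
    return re.sub(r'={3,}', _to_dashes, text)

print(dash_runs("Example\n=======\nreturn a == b"))
```

The behaviour is the same as the lambda version; the only difference is that the callback has a name and a comment.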

Can I do this without needing to pass a function to re.sub()?

My thinking would be that I'd need r'(=){3,}' (a variable-length capturing group), but re.sub(r'(=){3,}', '-', s) has a problem with greediness, I believe.

Can I modify the regex eq above so that the lambda isn't needed?

4 Answers

#1


2  

This uses re.sub with some lookahead trickery, and it works only if the run to replace is always followed by a newline '\n'. Note that it also rewrites a single = or a == that sits directly before a newline.

>>> print(re.sub('=(?=={2}|=?\n)', '-', s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b

Details
"Replace an equal sign if it is followed by two equal signs, or by an optional equal sign and a newline."

=        # an equal sign, if it is followed by (lookahead)
(?=
  ={2}   # two more equal signs
  |      # or
  =?     # an optional equal sign
  \n     # and then a newline
)
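
The newline assumption matters in practice. The following is a small sketch that applies the pattern to three inputs; the helper name `newline_trick` is hypothetical and only wraps the call shown above.

```python
import re

# Same pattern as in the answer; '\n' is a literal newline in this non-raw string.
PATTERN = '=(?=={2}|=?\n)'

def newline_trick(text):
    return re.sub(PATTERN, '-', text)

print(repr(newline_trick('=====\n')))      # run followed by a newline
print(repr(newline_trick('=====')))        # same run at the very end of the string
print(repr(newline_trick('x = y ==\n')))   # short run directly before a newline
```

The first call replaces the whole run. The second leaves the last two = untouched because no newline follows, and the third rewrites a == that should have been left alone.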

#2


3  

With some help from lookahead and lookbehind, it is possible to replace character by character:

>>> re.sub("(=(?===)|(?<===)=|(?<==)=(?==))", "-", "=== == ======= asdlkfj")
'--- == ------- asdlkfj'
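
To make the three alternatives easier to read, here is a sketch of the same pattern written with `re.VERBOSE`, one branch per line. The name `PER_CHAR` is an illustrative choice, and the comments are one reading of what each branch covers.

```python
import re

PER_CHAR = re.compile(r"""
    =(?===)        # an '=' followed by two more: start or interior of a run
  | (?<===)=       # an '=' preceded by two more: interior or end of a run
  | (?<==)=(?==)   # an '=' with one on each side: the middle of a run
""", re.VERBOSE)

print(PER_CHAR.sub('-', '=== == ======= asdlkfj'))
```

Each branch matches exactly one = character, so every substitution swaps a single = for a single -, which is what keeps the length unchanged.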

#3


2  

It's possible, but not advisable.

re.sub works by finding a complete match and replacing it as a whole. It doesn't replace each capture group separately, so something like re.sub(r'(=){3,}', '-', s) won't work: it replaces the entire match with a single dash, not each occurrence of the = character.

>>> re.sub(r'(=){3,}', '-', '=== ===')
'- -'

So if you want to avoid a lambda, you have to write a regex that matches individual = characters, but only those inside a run of at least 3. This is, of course, much more difficult than matching 3 or more = characters with the simple pattern ={3,}. It requires lookarounds and looks like this:

(?<===)=|(?<==)=(?==)|=(?===)

This does what you want:

>>> re.sub(r'(?<===)=|(?<==)=(?==)|=(?===)', '-', '= == === ======')
'= == --- ------'

But it's clearly much less readable than the original lambda solution.
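
One way to gain confidence that the lookaround pattern and the lambda agree is to compare them exhaustively on short inputs. This is only a sketch: the function names are hypothetical, and the check covers strings of up to 8 characters built from "=" and a space.

```python
import itertools
import re

LOOKAROUND = r'(?<===)=|(?<==)=(?==)|=(?===)'

def with_lookarounds(text):
    # Replaces one '=' per match, but only inside runs of three or more.
    return re.sub(LOOKAROUND, '-', text)

def with_callback(text):
    # Replaces a whole run per match, sized by the callback.
    return re.sub(r'={3,}', lambda m: '-' * len(m.group()), text)

# Collect every short string on which the two approaches disagree.
mismatches = [
    ''.join(chars)
    for length in range(1, 9)
    for chars in itertools.product('= ', repeat=length)
    if with_lookarounds(''.join(chars)) != with_callback(''.join(chars))
]
print(mismatches)
```

An empty list means the two approaches produce identical output on every tested string, so the choice between them comes down to readability.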

#4


2  

Using the third-party regex module (not the standard library re), you can write:

regex.sub(r'\G(?!\A)=|=(?===)', '-', s)
  • \G is the position immediately after the last successful match or the start of the string.
  • (?!\A) rules out the start of the string, so \G can only succeed right after a previous match.

The second branch =(?===) succeeds when an = is followed by two more =. The following matches then use the first branch \G(?!\A)= until the run of consecutive = ends.
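
A runnable sketch of this approach follows. It assumes the third-party `regex` package may or may not be installed, so the import is guarded; the function names are hypothetical, and the callback version is included only as a reference result.

```python
import re

try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

PATTERN = r'\G(?!\A)=|=(?===)'

def underline_with_regex(text):
    # Chain single-character matches with the search anchor \G.
    if regex is None:
        raise RuntimeError("the third-party 'regex' module is not installed")
    return regex.sub(PATTERN, '-', text)

def underline_with_callback(text):
    # Standard-library reference: whole runs, sized by a callback.
    return re.sub(r'={3,}', lambda m: '-' * len(m.group()), text)

sample = '= == === ======'
print(underline_with_callback(sample))
if regex is not None:
    print(underline_with_regex(sample))
```

The standard `re` module does not support \G, which is why this pattern needs the `regex` package.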
