
Time: 2022-01-10 22:30:01

Does Python have a function that I can use to escape special characters in a regular expression?


For example, I'm "stuck" :\ should become I\'m \"stuck\" :\\.


6 answers



Use re.escape

>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.*.com')
'www\\.\\*\\.com'
>>> print(re.escape('www.*.com'))
www\.\*\.com

See: http://docs.python.org/library/re.html#module-contents


Repeating it here:



Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
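To illustrate why the escaping matters when matching a literal string, here is a minimal sketch. It assumes Python 3, and the sample strings are made up for illustration.

```python
import re

# Hypothetical literal that happens to contain regex metacharacters.
literal = "www.*.com"

# Unescaped, '.' and '*' act as operators, so unrelated text matches too.
unescaped_match = re.search(literal, "wwwXcom")
print(unescaped_match is not None)  # True

# Escaped, the pattern matches only the exact literal text.
pattern = re.compile(re.escape(literal))
print(pattern.search("wwwXcom"))                                # None
print(pattern.search("see www.*.com for details") is not None)  # True
```
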




I'm surprised no one has mentioned using regular expressions via re.sub():


import re
print(re.sub(r'([\"])',    r'\\\1', 'it\'s "this"'))  # it's \"this\"
print(re.sub(r"([\'])",    r'\\\1', 'it\'s "this"'))  # it\'s "this"
print(re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"'))  # it\'s\ \"this\"

Important things to note:


  • In the search pattern, include \ as well as the character(s) you're looking for. You're going to be using \ to escape your characters, so you need to escape that as well.

  • Put parentheses around the search pattern, e.g. ([\"]), so that the substitution pattern can use the found character when it adds \ in front of it. (That's what \1 does: uses the value of the first parenthesized group.)

  • The r in front of r'([\"])' means it's a raw string. Raw strings use different rules for escaping backslashes. To write ([\"]) as a plain string, you'd need to double all the backslashes and write '([\\"])'. Raw strings are friendlier when you're writing regular expressions.

  • In the substitution pattern, you need to escape \ to distinguish it from a backslash that precedes a substitution group, e.g. \1, hence r'\\\1'. To write that as a plain string, you'd need '\\\\\\1' — and nobody wants that.
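The notes above could be folded into a small helper, sketched here under some assumptions: Python 3, and a hypothetical function name backslash_escape that is not part of the re module. It builds the character class with re.escape and uses a function as the replacement, which avoids the r'\\\1' backslash counting.

```python
import re

def backslash_escape(text, chars):
    """Prefix each character of `chars` found in `text` with a backslash.

    Hypothetical helper for illustration; `chars` must not be empty.
    """
    # re.escape keeps characters such as ']' or '^' from being read as
    # character-class syntax when the class is built from arbitrary input.
    pattern = "[" + re.escape(chars) + "]"
    # A function replacement returns the text to insert verbatim, so the
    # replacement itself needs no extra escaping.
    return re.sub(pattern, lambda m: "\\" + m.group(0), text)

print(backslash_escape('it\'s "this"', '"'))    # it's \"this\"
print(backslash_escape('it\'s "this"', '"\''))  # it\'s \"this\"
```
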



Use repr()[1:-1]. In this case, the double quotes don't need to be escaped. The [1:-1] slice removes the single quotes from the beginning and the end.


>>> x = raw_input()  # Python 2; use input() in Python 3
I'm "stuck" :\
>>> print(x)
I'm "stuck" :\
>>> print(repr(x)[1:-1])
I\'m "stuck" :\\

Or maybe you just want to escape a phrase to paste into your program? If so, do this:


>>> raw_input()
I'm "stuck" :\
'I\'m "stuck" :\\'
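One caveat with the repr()[1:-1] trick is that repr() picks its quote style from the string's content, so single quotes are only escaped when the string also contains a double quote. A short sketch, assuming Python 3 and made-up sample strings:

```python
both = 'I\'m "stuck" :\\'    # contains ' and "
only_single = "I'm stuck"    # contains ' only

# repr() uses single quotes here, so the ' inside is escaped.
print(repr(both)[1:-1])         # I\'m "stuck" :\\

# repr() switches to double quotes here, so the ' is left alone.
print(repr(only_single)[1:-1])  # I'm stuck
```
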



As mentioned above, the answer depends on your use case. If you want to escape a string for use in a regular expression, use re.escape(). If you want to escape a specific set of characters, use this lambda function:


>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = raw_input()  # Python 2; use input() in Python 3
I'm "stuck" :\
>>> print(s)
I'm "stuck" :\
>>> print(escape(s, "\\", ['"']))
I'm \"stuck\" :\\
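The same idea can also be written with str.translate, which applies a character mapping in a single pass. This is only a sketch: it assumes Python 3, the name escape_chars is hypothetical, and escapechar has to be a single character.

```python
def escape_chars(s, escapechar="\\", specialchars='"'):
    # Map every special character, and the escape character itself, to its
    # escaped form; str.translate then applies the whole mapping at once.
    to_escape = set(specialchars) | {escapechar}
    table = str.maketrans({c: escapechar + c for c in to_escape})
    return s.translate(table)

print(escape_chars('I\'m "stuck" :\\'))  # I'm \"stuck\" :\\
```

Because each input character is looked up exactly once, a backslash that the mapping inserts is never escaped a second time.
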



It's not that hard:


def escapeSpecialCharacters(text, characters):
    # Escape each listed character by prefixing it with a backslash.
    for character in characters:
        text = text.replace(character, '\\' + character)
    return text

>>> escapeSpecialCharacters( 'I\'m "stuck" :\\', '\'"' )
'I\\\'m \\"stuck\\" :\\'
>>> print( _ )
I\'m \"stuck\" :\
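Note that the output above leaves the trailing backslash unescaped, whereas the question asks for :\\. With this replace-in-a-loop approach the backslash can be added to the character list, but its position matters. A sketch assuming Python 3, with the function repeated so the block runs on its own:

```python
def escapeSpecialCharacters(text, characters):
    for character in characters:
        text = text.replace(character, '\\' + character)
    return text

text = 'I\'m "stuck" :\\'

# Backslash listed first: existing backslashes are doubled before any new
# ones are added, which gives the result asked for in the question.
first = escapeSpecialCharacters(text, '\\\'"')
print(first)  # I\'m \"stuck\" :\\

# Backslash listed last: the backslashes added for ' and " get doubled too.
last = escapeSpecialCharacters(text, '\'"\\')
print(last)   # I\\'m \\"stuck\\" :\\
```
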



If you only want to escape some characters, you could use this:


import re

# Prefix each character in the class with a backslash.
print(re.sub(r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string."))  # example string\.


