Escaping regex special characters in a Python string

Time: 2022-01-10 22:30:01

Does Python have a function that I can use to escape special characters in a regular expression?

For example, I'm "stuck" :\ should become I\'m \"stuck\" :\\.

6 solutions

#1


143  

Use re.escape

re.escape(string)
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.*.com')
'www\\.\\*\\.com'
>>> print(re.escape('www.*.com'))
www\.\*\.com

See : http://docs.python.org/library/re.html#module-contents

Repeating it here:

re.escape(string)

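Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

Note that this is the older (Python 2) wording. Since Python 3.7, re.escape() only escapes characters that have a special meaning in a regular expression, so quotes and : are left unchanged.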
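
A minimal sketch of the usual use case: escaping text so it can be dropped into a pattern and matched literally. The variable names needle and haystack are made up for illustration, and the last line's output assumes Python 3.7 or later.

```python
import re

needle = 'a.*$'                  # literal text that happens to contain metacharacters
haystack = 'price: a.*$ today'

# After escaping, '.', '*' and '$' lose their special meaning and match literally.
pattern = re.escape(needle)
print(pattern)                               # a\.\*\$
print(re.search(pattern, haystack).group())  # a.*$

# Python 3.7+: the space and the backslash are escaped, the quotes are not.
print(re.escape('I\'m "stuck" :\\'))         # I'm\ "stuck"\ :\\
```

So re.escape() is the right tool for building regular expressions, but not for producing string-literal style escaping of quotes.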

#2


15  

I'm surprised no one has mentioned using regular expressions via re.sub():

import re
print(re.sub(r'([\"])',    r'\\\1', 'it\'s "this"'))  # it's \"this\"
print(re.sub(r"([\'])",    r'\\\1', 'it\'s "this"'))  # it\'s "this"
print(re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"'))  # it\'s\ \"this\"

Important things to note:

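  • In the search pattern, include \ as well as the character(s) you're looking for, e.g. [\\"] rather than [\"]. You're going to be using \ to escape your characters, so any backslash already in the input needs escaping as well. The examples above only match quotes, so they leave existing backslashes untouched.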

  • Put parentheses around the search pattern, e.g. ([\"]), so that the substitution pattern can use the found character when it adds \ in front of it. (That's what \1 does: uses the value of the first parenthesized group.)

  • The r in front of r'([\"])' means it's a raw string. In a raw string, a backslash is kept as a literal character instead of starting an escape sequence. To write ([\"]) as a plain string, you'd need to double the backslash and write '([\\"])'. Raw strings are friendlier when you're writing regular expressions.

  • In the substitution pattern, you need to escape \ to distinguish it from a backslash that precedes a substitution group, e.g. \1, hence r'\\\1'. To write that as a plain string, you'd need '\\\\\\1' — and nobody wants that.
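
Following the first note, here is a sketch that puts the backslash into the character class together with both quote characters and applies it to the question's input. The pattern is an adaptation of the examples above, not the answerer's own code.

```python
import re

text = 'I\'m "stuck" :\\'     # the question's input: I'm "stuck" :\

# The class holds a backslash (\\), a double quote and a single quote (\').
# Every match is written back with a backslash in front of it, in one pass.
escaped = re.sub(r'([\\"\'])', r'\\\1', text)
print(escaped)                # I\'m \"stuck\" :\\
```

Because re.sub() scans the input once, the backslashes it inserts are never escaped a second time.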

#3


9  

Use repr()[1:-1]. In this case the result is wrapped in single quotes, so the double quotes don't need to be escaped. The [1:-1] slice removes the quote character from the beginning and the end.

>>> x = input()
I'm "stuck" :\
>>> print(x)
I'm "stuck" :\
>>> print(repr(x)[1:-1])
I\'m "stuck" :\\

Or maybe you just want to escape a phrase to paste into your program? If so, do this:

>>> input()
I'm "stuck" :\
'I\'m "stuck" :\\'
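
One caveat worth checking: repr() chooses the quote character itself, and it switches to double quotes when the text contains a single quote but no double quote. In that case the [1:-1] slice strips double quotes and the single quote stays unescaped. A quick check, assuming CPython's usual repr() rules for str:

```python
both = 'I\'m "stuck" :\\'
only_single = "I'm stuck :\\"

# Contains ' and ": repr() keeps single quotes and escapes the '.
print(repr(both)[1:-1])          # I\'m "stuck" :\\
# Contains only ': repr() uses double quotes, so the ' is left as-is.
print(repr(only_single)[1:-1])   # I'm stuck :\\
```

This trick is therefore fine for pasting text into source code, but not as a general "escape these characters" function.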

#4


3  

As mentioned above, the answer depends on your case. If you want to escape a string for use in a regular expression, use re.escape(). But if you want to escape a specific set of characters, use this lambda function:

>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = input()
I'm "stuck" :\
>>> print(s)
I'm "stuck" :\
>>> print(escape(s, "\\", ['"']))
I'm \"stuck\" :\\

#5


2  

It's not that hard:

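def escapeSpecialCharacters(text, characters):
    # Each replace() rescans the whole string, so put the backslash first;
    # otherwise the backslashes added for the other characters get doubled.
    for character in characters:
        text = text.replace(character, '\\' + character)
    return text

>>> escapeSpecialCharacters('I\'m "stuck" :\\', '\\\'"')
'I\\\'m \\"stuck\\" :\\\\'
>>> print(_)
I\'m \"stuck\" :\\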
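
Since the loop above depends on the order of characters, a single-pass alternative is str.translate() with a table built by str.maketrans(). This is a swapped-in technique rather than the answerer's code, and escape_chars is a hypothetical name.

```python
def escape_chars(text, characters):
    """Prefix every character in `characters` with a backslash, in one pass."""
    # Map each special character to its escaped form. translate() never
    # revisits its own output, so the order of `characters` does not matter.
    table = str.maketrans({c: '\\' + c for c in characters})
    return text.translate(table)

print(escape_chars('I\'m "stuck" :\\', '\'"\\'))   # I\'m \"stuck\" :\\
```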

#6


2  

If you only want to escape some characters, you could use this:

import re

print(re.sub(r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string."))
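
A simple way to sanity-check a hand-written escaper like this is to use its output as a pattern and confirm it matches the original text literally. The sample strings are made up for illustration.

```python
import re

# Same character class as in the answer above.
SPECIAL = r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])'

for sample in ["example string.", "a+b(c)*", "1.5 [approx]"]:
    escaped = re.sub(SPECIAL, r'\\\1', sample)
    # The escaped text, used as a regex, should match the original exactly.
    assert re.fullmatch(escaped, sample) is not None
    print(escaped)
```

Unlike re.escape(), this class does not cover whitespace or #, which become significant in re.VERBOSE patterns.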
