Does Python have a function that I can use to escape special characters in a regular expression?
For example, I'm "stuck" :\ should become I\'m \"stuck\" :\\.
6 answers
#1
143
Use re.escape
re.escape(string)
>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.*.com')
'www\\.\\*\\.com'
>>> print(re.escape('www.*.com'))
www\.\*\.com
See: http://docs.python.org/library/re.html#module-contents
Repeating it here:
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
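To make the "arbitrary literal string" case concrete, here is a minimal sketch. The sample text and variable names are only illustrative assumptions; the point is the difference between using a string as a pattern directly and passing it through re.escape first.

```python
import re

# Hypothetical text that should be matched literally, metacharacters included.
literal = 'www.*.com'

# Escaped: '.' and '*' lose their special meaning and match themselves only.
pattern = re.compile(re.escape(literal))
escaped_hit = bool(pattern.search('visit www.*.com today'))
escaped_miss = bool(pattern.search('visit wwwXYZ.com today'))

# Unescaped: '.*' means "any run of characters", so unrelated text matches too.
unescaped_hit = bool(re.search(literal, 'visit wwwXYZ.com today'))

print(escaped_hit, escaped_miss, unescaped_hit)
```

Note that re.escape produces text for use inside a regular expression; it is not meant for producing Python string literals like the I\'m \"stuck\" example in the question.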
#2
15
I'm surprised no one has mentioned using regular expressions via re.sub():
import re
print(re.sub(r'([\"])', r'\\\1', 'it\'s "this"'))  # it's \"this\"
print(re.sub(r"([\'])", r'\\\1', 'it\'s "this"'))  # it\'s "this"
print(re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"'))  # it\'s\ \"this\"
Important things to note:
- In the search pattern, include \ as well as the character(s) you're looking for. You're going to be using \ to escape your characters, so any backslash already in the text needs to be escaped as well. Note that inside a character class \" matches only the quote itself; to match a literal backslash too, add \\ to the class, e.g. ([\\"]).
- Put parentheses around the search pattern, e.g. ([\"]), so that the substitution pattern can use the found character when it adds \ in front of it. (That's what \1 does: it uses the value of the first parenthesized group.)
- The r in front of r'([\"])' means it's a raw string. Raw strings use different rules for escaping backslashes. To write ([\"]) as a plain string, you'd need to double all the backslashes and write '([\\"])'. Raw strings are friendlier when you're writing regular expressions.
- In the substitution pattern, you need to escape \ to distinguish it from a backslash that precedes a substitution group, e.g. \1, hence r'\\\1'. To write that as a plain string, you'd need '\\\\\\1', and nobody wants that.
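Putting the notes above together, a minimal sketch that escapes both kinds of quote and any existing backslash in one pass might look like this. The function name escape_chars is a hypothetical helper, not something from the re module.

```python
import re

def escape_chars(text):
    """Prefix backslashes, double quotes and single quotes with a backslash."""
    # Search pattern: a character class holding a literal backslash (written \\),
    # a double quote, and a single quote (written \' so the raw string does not end).
    # Replacement: a literal backslash (\\) followed by the matched character (\1).
    return re.sub(r'([\\"\'])', r'\\\1', text)

# The string from the question: I'm "stuck" :\
result = escape_chars('I\'m "stuck" :\\')
print(result)  # I\'m \"stuck\" :\\
```

Because the backslash is part of the character class, each character of the input is inspected once and nothing gets escaped twice.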
#3
9
Use repr()[1:-1]. In this case, the double quotes don't need to be escaped. The [1:-1] slice removes the single quote from the beginning and the end.
>>> x = raw_input()
I'm "stuck" :\
>>> print x
I'm "stuck" :\
>>> print repr(x)[1:-1]
I\'m "stuck" :\\
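The "in this case" matters: which quotes repr() escapes depends on what the string contains. A small sketch of that behaviour, assuming Python 3 and a hypothetical helper name repr_escape:

```python
def repr_escape(text):
    # repr() renders the string as a Python literal; [1:-1] drops the outer quotes.
    return repr(text)[1:-1]

# Both quote kinds present: repr() picks single quotes and escapes the inner one.
both = repr_escape('I\'m "stuck" :\\')
print(both)         # I\'m "stuck" :\\

# Only a single quote present: repr() switches to double quotes, so nothing is escaped.
single_only = repr_escape("I'm stuck")
print(single_only)  # I'm stuck
```

So this approach is handy for pasting text into source code, but it does not guarantee that every quote ends up backslashed.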
Or maybe you just want to escape a phrase to paste into your program? If so, do this:
>>> raw_input()
I'm "stuck" :\
'I\'m "stuck" :\\'
#4
3
As mentioned above, the answer depends on your case. If you want to escape a string for use in a regular expression, use re.escape(). But if you want to escape a specific set of characters, use this lambda function:
>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = raw_input()
I'm "stuck" :\
>>> print s
I'm "stuck" :\
>>> print escape(s, "\\", ['"'])
I'm \"stuck\" :\\
#5
2
It's not that hard:
def escapeSpecialCharacters ( text, characters ):
    for character in characters:
        text = text.replace( character, '\\' + character )
    return text
>>> escapeSpecialCharacters( 'I\'m "stuck" :\\', '\'"' )
'I\\\'m \\"stuck\\" :\\'
>>> print( _ )
I\'m \"stuck\" :\
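Note that the trailing backslash in the output above is left as it was, because '\\' was not in the list of characters. If the backslash itself should be escaped too, the order of the replacements matters. A minimal sketch, with a hypothetical escape_char parameter and assuming every entry in characters is a single character:

```python
def escape_special_characters(text, characters, escape_char='\\'):
    # Escape the escape character first; otherwise the backslashes added
    # in front of other characters would be doubled by a later replacement.
    text = text.replace(escape_char, escape_char + escape_char)
    for character in characters:
        if character != escape_char:
            text = text.replace(character, escape_char + character)
    return text

escaped = escape_special_characters('I\'m "stuck" :\\', '\'"')
print(escaped)  # I\'m \"stuck\" :\\
```

This gives the result asked for in the question while keeping the simple str.replace approach.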
#6
2
If you only want to replace some characters you could use this:
import re
print(re.sub(r'([\.\\\+\*\?\[\^\]\$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string."))  # example string\.