What is the best practice for escaping special characters in a Python string? (duplicate)

Time: 2021-05-30 00:23:47

This question already has an answer here:

I have a search term that I pass to re.search(), and I would like to know the best way to escape special characters in the string (such as (), [], \/, {}) so that my regex interprets it correctly.

Currently I am doing the following:

searchString.replace('\\', '\\\\').replace(')', '\\)').replace('(', '\\(')

Is there anything built in to do this, or is there a better way than explicitly calling replace on every special character I need to escape?

3 answers

#1


18  

The re.escape function does that for you.

>>> import re
>>> re.escape('escape this. /')   # Python 3.7+ leaves '/' unescaped: 'escape\\ this\\.\\ /'
'escape\\ this\\.\\ \\/'
>>> 
>>> re.escape('www.*.com')
'www\\.\\*\\.com'

As the documentation says:

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
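
As a minimal sketch of how this applies to the question, the search term can be passed through re.escape before it reaches re.search. The helper name find_literal and the sample strings are assumptions made up for illustration; they are not part of the original answer.

```python
import re


def find_literal(term, text):
    """Search text for term as literal text, not as a regex pattern.

    find_literal is a hypothetical helper name used only for this sketch.
    """
    # re.escape puts a backslash in front of every regex metacharacter in term,
    # so characters such as ( ) [ ] { } . * lose their special meaning.
    return re.search(re.escape(term), text)


text = "value = f(x)[0] + 1"

match = find_literal("f(x)[0]", text)
print(match.group(0) if match else None)   # prints f(x)[0]

# Without escaping, (x) is read as a group and [0] as a character class,
# so the pattern looks for "fx0" and does not match its own literal text.
print(re.search("f(x)[0]", text))          # prints None
```

This replaces the chain of replace() calls with a single call and covers metacharacters the manual chain would miss.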

#2


0  

Use raw strings.

From the docs on raw strings:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
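
The rules quoted above can be checked directly. This is only a small sketch with made-up variable names; it also shows that the prefix affects how a literal in source code is parsed, not the value of a string that already exists at run time (such as a search term read from user input).

```python
raw = r"\n"      # two characters: a backslash followed by 'n'
cooked = "\n"    # one character: a newline
quote = r"\""    # two characters: a backslash followed by a double quote

print(len(raw), len(cooked), len(quote))   # prints 2 1 2

# A raw literal and a doubled-backslash literal produce the same string value.
print(raw == "\\n")                        # prints True

# The prefix changes nothing when the literal contains no backslashes,
# so regex metacharacters such as '.' or '(' are not escaped by it.
print(r"a.b(c)" == "a.b(c)")               # prints True
```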

#3


0  

Use Python's raw string notation. From http://docs.python.org/library/re.html:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
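
A short sketch of the literal-backslash case described in the documentation excerpt; the sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "C:\\temp"          # runtime value is C:\temp, containing one backslash

raw_pattern = r"\\"        # raw literal: the two-character regex \\
plain_pattern = "\\\\"     # regular literal: the same two-character regex

# Both spellings build the identical pattern string.
print(raw_pattern == plain_pattern)   # prints True

# The regex \\ matches a single literal backslash in the text.
match = re.search(raw_pattern, text)
print(match.start())                  # prints 2
```

Raw notation only makes the pattern easier to write in source code; a search term that arrives at run time still needs re.escape if it should be matched literally.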
