What does the "r" in Python's re.compile(r' pattern flags') mean?

Time: 2021-03-13 22:29:30

I am reading through http://docs.python.org/2/library/re.html. According to it, the "r" in Python's re.compile(r' pattern flags') refers to raw string notation:

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.

Would it be fair to say then that:

re.compile(r'pattern') means that "pattern" is a regex, while re.compile('pattern') means that "pattern" is an exact match?

3 answers

#1


26  

As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings in Python generally.

Normal strings use the backslash character as an escape character for special characters (like newlines):

>>> print('this is \n a test')
this is 
 a test

The r prefix tells the interpreter not to do this:

>>> print(r'this is \n a test')
this is \n a test

This is important in regular expressions, because the backslash needs to reach the re module intact. In particular, \b matches the empty string at the start or end of a word. re expects the two-character string \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either escape the backslash explicitly ('\\b') or tell Python it is a raw string (r'\b').

>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
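
The same point can be checked without the interactive prompt. The following is a minimal sketch, assuming Python 3; the variable names and the sample text are illustrative and not part of the original answer. It shows that '\b' is a single backspace character, while '\\b' and r'\b' are the same two-character string that re reads as a word boundary.

```python
import re

plain = '\b'      # one character: ASCII backspace (code point 8)
escaped = '\\b'   # two characters: a backslash followed by 'b'
raw = r'\b'       # the same two characters, written as a raw literal

assert plain == chr(8) and len(plain) == 1
assert escaped == raw and len(raw) == 2

text = 'two words'  # illustrative sample text
backspace_hits = re.findall(plain, text)  # searches for a literal backspace
boundary_hits = re.findall(raw, text)     # searches for word boundaries

print(backspace_hits)      # empty: the text contains no backspace character
print(len(boundary_hits))  # one empty match at each edge of each word
```

Because escaped and raw compare equal, re cannot tell which spelling was used; the prefix only changes how the literal is typed in the source code.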

#2


6  

No. As the documentation you pasted explains, the r prefix on a string indicates that the string is a raw string.

Python string escaping and regex escaping both use the backslash character (\), so the two collide. Raw strings provide a way to tell Python that you want the backslashes in a literal left unprocessed.

Examine the following:

>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print("\n")


>>> print(r"\n")
\n

Prefixing a string literal with r merely tells Python that backslashes (\) should be treated literally and not as escape characters.
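
A short check of what the prefix does and does not change may help here. This is only a sketch, assuming Python 3, and the variable names are made up for illustration. It confirms that a raw literal produces an ordinary str whose backslashes are kept as written.

```python
newline = "\n"       # escape sequence processed: a single newline character
raw_newline = r"\n"  # backslash kept: a backslash followed by 'n'

assert len(newline) == 1
assert len(raw_newline) == 2
assert list(raw_newline) == ["\\", "n"]

# A raw literal is not a separate type; it is an ordinary str that
# could equally have been written with a doubled backslash.
assert type(raw_newline) is str
assert raw_newline == "\\n"
```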

This is helpful when, for example, you are searching on a word boundary. The regex for this is \b; however, to capture this in a normal Python string, I'd need to use "\\b" as the pattern. Instead, I can use the raw string r"\b" as the pattern.

This becomes especially handy when trying to find a literal backslash. To match a backslash, the regex pattern is \\. Written as a normal Python string, each of those backslashes must itself be escaped, so the pattern becomes "\\\\"; as a raw string it is the much simpler r"\\".
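
To make the backslash counting concrete, here is a small sketch, assuming Python 3. The sample path is hypothetical and chosen only because it contains two backslashes. Both spellings of the pattern produce the same two-character string, which re reads as "one literal backslash".

```python
import re

path = r"C:\temp\new"  # hypothetical sample text containing two backslashes

doubled = "\\\\"  # normal literal: four typed backslashes, two in the string
raw = r"\\"       # raw literal: two typed backslashes, two in the string
assert doubled == raw and len(raw) == 2

parts = re.split(raw, path)             # split on each literal backslash
count = len(re.findall(doubled, path))  # count the literal backslashes

print(parts)  # ['C:', 'temp', 'new']
print(count)  # 2
```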

As you can guess, in longer and more complex regexes the extra backslashes can get confusing, so raw strings are generally considered the way to go.

#3


2  

No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have special meaning in a pattern.

The r'' prefix is often used as a convenience for regexes that need a lot of backslashes, as it prevents the clutter of doubling up each \.
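
The question's guess (raw means regex, no prefix means exact match) can be tested directly. The sketch below assumes Python 3, and its sample text is invented for illustration. It shows that a pattern without backslashes behaves identically with or without the r prefix, and that an exact match needs the metacharacters escaped, for example with re.escape.

```python
import re

text = "a.b axb"  # illustrative sample text

# With no backslashes in the literal, the raw prefix changes nothing:
# both spellings are the same string, and '.' still matches any character.
assert r"a.b" == "a.b"
regex_hits = re.findall(r"a.b", text)

# For an exact (literal) match, escape the regex metacharacters instead.
literal_hits = re.findall(re.escape("a.b"), text)

print(regex_hits)    # ['a.b', 'axb']
print(literal_hits)  # ['a.b']
```

Either way the argument to re.compile is always treated as a regex; re.escape is one way to ask for literal matching when that is what is wanted.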
