Learning Python's re module, part 10: special sequences such as \d

Date: 2022-12-13 00:10:18

\number

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of number is 0, or number is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value number. Inside the '[' and ']' of a character class, all numeric escapes are treated as characters.

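Summary: \number matches the contents of the group with the same number; groups are numbered from 1. For example, (.+) \1 matches 'the the' or '55 55' but not 'thethe', because the pattern has a space between the group and the backreference and 'thethe' has none. Only the first 99 groups can be referenced this way. If the first digit of number is 0, or number is three octal digits long, it is read as the character with that octal value rather than as a group reference. Inside a character class ([...]), every numeric escape is treated as a character.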
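The backreference rule can be tried out with a minimal sketch. The variable names are only illustrative; the patterns are the ones from the documentation excerpt, plus an octal escape chosen for the example.

```python
import re

# (.+) \1 : a group, one space, then the same text again.
pattern = re.compile(r"(.+) \1")

repeated_word = pattern.fullmatch("the the")
repeated_number = pattern.fullmatch("55 55")
no_space = pattern.search("thethe")  # no space between the two copies

print(repeated_word.group(1))    # the
print(repeated_number.group(1))  # 55
print(no_space)                  # None

# Three octal digits form an octal escape, not a group reference.
octal_match = re.fullmatch(r"\101", "A")  # octal 101 == 65 == 'A'
print(octal_match.group())       # A
```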

\A

Matches only at the start of the string.

Summary: \A matches only at the start of the string; it asserts that the current position is the very beginning of the string.

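import re

string1 = """hello,
world."""
# \A is zero-width and matches at position 0.
print(re.match(r"\A", string1))
# Nothing can come before \A, so "h\A" never matches.
print(re.match(r"h\A", string1))

Output:
<re.Match object; span=(0, 0), match=''>
None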
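import re

string1 = """hello,
world."""

# \A placed after the newline: it cannot match at the start of line 2.
print(re.match(r"(?sm)(.*?,)\n\A(.*?\.)", string1))
# \A at the start of the pattern: the whole string matches.
print(re.match(r"(?sm)\A(.*?,)\n(.*?\.)", string1))
print(re.match(r"h\A", string1))

Output:
None
<re.Match object; span=(0, 13), match='hello,\nworld.'>
None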
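Comparing the results shows that \A has no effect at the start of a new line, even in multiline mode: \A matches only at the start of the whole string.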
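For contrast, here is a small sketch comparing \A with ^ under re.MULTILINE. The sample text is the same two-line string used above; the variable names are illustrative.

```python
import re

text = "hello,\nworld."

# With re.MULTILINE, ^ matches at the start of every line...
caret_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
# ...while \A still matches only at the start of the whole string.
anchor_words = re.findall(r"\A\w+", text, flags=re.MULTILINE)

print(caret_words)   # ['hello', 'world']
print(anchor_words)  # ['hello']
```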

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string. This means that r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.

By default Unicode alphanumerics are the ones used in Unicode patterns, but this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

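Summary: \b matches the empty string, but only at the beginning or end of a word, where a word is a sequence of word characters. Formally, \b is the boundary between a \w and a \W character, or between \w and the start or end of the string. So r'\bfoo\b' matches 'foo', 'foo.', '(foo)' and 'bar foo baz', but not 'foobar' or 'foo3'. By default, Unicode alphanumerics count as word characters in Unicode patterns; this can be switched with the ASCII flag ((?A) or re.A). With the LOCALE flag, word boundaries follow the current locale. Inside a character class, \b stands for the backspace character, as in Python string literals.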
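The sketch below runs the documentation's \bfoo\b examples and also shows two places where \b means backspace instead: inside a character class, and in a non-raw Python string literal. Names such as `samples` and `results` are illustrative.

```python
import re

word = re.compile(r"\bfoo\b")

samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]
results = {s: bool(word.search(s)) for s in samples}
for s, matched in results.items():
    print(f"{s!r}: {matched}")  # True for the first four, False for the last two

# Inside a character class, \b is the backspace character (\x08).
backspace = re.search(r"[\b]", "a\x08b")
print(backspace.span())  # (1, 2)

# Without the r prefix, Python itself turns "\b" into a backspace
# before re ever sees the pattern, so this does not find the word.
non_raw = re.search("\bfoo\b", "foo")
print(non_raw)  # None
```

This is one reason to write regular expressions as raw strings.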

\B

Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so word characters in Unicode patterns are Unicode alphanumerics or the underscore, although this can be changed by using the ASCII flag. Word boundaries are determined by the current locale if the LOCALE flag is used.

Summary: \B matches the empty string, but only where it is not at the beginning or end of a word. So r'py\B' matches 'python', 'py3' and 'py2', but not 'py', 'py.' or 'py!'. \B is the exact opposite of \b.
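A quick sketch of the py\B examples from the excerpt; the dictionary is just an illustrative way to collect the results.

```python
import re

inside_word = re.compile(r"py\B")

samples = ["python", "py3", "py2", "py", "py.", "py!"]
results = {s: bool(inside_word.search(s)) for s in samples}
for s, matched in results.items():
    print(f"{s!r}: {matched}")  # True for the first three, False for the rest
```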

\d

For Unicode (str) patterns:
Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]). This includes [0-9], and also many other digit characters. If the ASCII flag is used only [0-9] is matched.
For 8-bit (bytes) patterns:
Matches any decimal digit; this is equivalent to [0-9].

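Summary: for Unicode (str) patterns, \d matches any Unicode decimal digit (category Nd), which includes 0-9 and many other digit characters. With the ASCII flag, or in 8-bit (bytes) patterns, it matches only 0-9.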

import re

string1 = "1236四"
# (?u) and re.UNICODE are already the default for str patterns in Python 3.
print(re.match(r"(?u)\d*", string1, flags=re.UNICODE))

Output:
<re.Match object; span=(0, 4), match='1236'>

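In this test (my own result), \d stops before '四'. This does not show that Unicode mode fails to match other decimal digits: '四' is a Han numeral in Unicode category Lo, not a decimal digit in category Nd, so \d is not expected to match it. Characters that are in Nd, such as full-width or Arabic-Indic digits, are matched by \d in Unicode mode.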
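A sketch that checks the Unicode categories directly may help here. The sample string is an assumption made for illustration: it mixes ASCII digits, full-width digits, Arabic-Indic digits and the Han numeral from the test above, written as escapes so the code points are unambiguous.

```python
import re
import unicodedata

# ASCII "12", full-width three and four (U+FF13, U+FF14),
# Arabic-Indic five and six (U+0665, U+0666), then the Han numeral U+56DB.
text = "12\uff13\uff14\u0665\u0666\u56db"

unicode_digits = re.match(r"\d*", text).group()
ascii_digits = re.match(r"\d*", text, flags=re.ASCII).group()

print(len(unicode_digits))             # 6
print(ascii_digits)                    # 12
print(unicodedata.category("\uff13"))  # Nd
print(unicodedata.category("\u56db"))  # Lo
```

In Unicode mode the match covers all six Nd characters and stops at the Han numeral; with re.ASCII it stops after the two ASCII digits.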

\D

Matches any character which is not a decimal digit. This is the opposite of \d. If the ASCII flag is used this becomes the equivalent of [^0-9].
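One common use of \D is stripping everything except digits. A minimal sketch follows; the phone-like string is made up for the example.

```python
import re

raw = "Tel: +86 (010) 1234-5678"  # hypothetical input

# Replace every non-digit character with nothing.
digits_only = re.sub(r"\D", "", raw)
print(digits_only)  # 8601012345678
```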

\s

For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.
For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].

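Summary: \s matches whitespace characters: space, tab, newline, \r, \f and \v, plus many other Unicode characters such as the non-breaking spaces required by typography rules in many languages. With the ASCII flag, only [ \t\n\r\f\v] is matched.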
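The difference between the Unicode and ASCII behaviour of \s can be seen with a no-break space. This is a small sketch; the sample string is an assumption chosen for the example.

```python
import re

text = "a\u00a0b c\td"  # \u00a0 is a no-break space

unicode_split = re.split(r"\s", text)
ascii_split = re.split(r"\s", text, flags=re.ASCII)

print(unicode_split)  # ['a', 'b', 'c', 'd']
print(ascii_split)    # ['a\xa0b', 'c', 'd']
```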

\S

Matches any character which is not a whitespace character. This is the opposite of \s. If the ASCII flag is used this becomes the equivalent of [^ \t\n\r\f\v].

Summary: the opposite of \s.

\w

For Unicode (str) patterns:
Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.
For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_]. If the LOCALE flag is used, matches characters considered alphanumeric in the current locale and the underscore.

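Summary: \w matches Unicode word characters, which covers most characters that can be part of a word in any language, plus digits and the underscore. With the ASCII flag, only [a-zA-Z0-9_] is matched.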
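A short sketch of how the ASCII flag narrows \w. The sample text is made up for illustration, and the accented letter is written as an escape so that it is a single precomposed code point.

```python
import re

text = "na\u00efve 日本語 snake_case x2"  # \u00ef is the letter i with diaeresis

unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['naïve', '日本語', 'snake_case', 'x2']
print(ascii_words)    # ['na', 've', 'snake_case', 'x2']
```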

\W

Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_]. If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore.

Summary: the opposite of \w.
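As a rough illustration, \W+ can serve as a separator when splitting text into words; note that the underscore counts as a word character and is kept. The sample sentence is made up.

```python
import re

sentence = "hello, world! foo_bar"

# Split on runs of non-word characters.
tokens = re.split(r"\W+", sentence)
print(tokens)  # ['hello', 'world', 'foo_bar']
```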

\Z

Matches only at the end of the string.

Summary: \Z matches only at the end of the string.

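import re

string1 = """hello,
world."""

# With \Z the lazy .*? has to expand until the end of the string.
print(re.match(r"(?sm).*?\Z", string1))
# Without it, the lazy .*? is satisfied by the empty string.
print(re.match(r"(?sm).*?", string1))

Output:
<re.Match object; span=(0, 13), match='hello,\nworld.'>
<re.Match object; span=(0, 0), match=''>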
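The first pattern has to reach the end of the string because of \Z. The second, a lazy .*? with nothing after it, matches as little as possible, which is the empty string.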
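\Z is also stricter than $ when the string ends with a newline. A minimal sketch, using a variant of the sample string with a trailing newline added for the example:

```python
import re

text = "hello,\nworld.\n"  # note the trailing newline

# $ also matches just before a newline at the end of the string.
dollar = re.search(r"world\.$", text)
# \Z requires the very end of the string, so the trailing newline blocks it.
strict = re.search(r"world\.\Z", text)

print(dollar is not None)  # True
print(strict is None)      # True
```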