Learning Python's re, part 5: * + ? *? +? ??

Date: 2021-08-30 00:12:20

*

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.

With *, the repetition count ranges over the natural numbers: 0, 1, 2, ...

+

Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.

With +, the repetition count ranges over the positive integers: 1, 2, 3, ...

?

Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either ‘a’ or ‘ab’.

With ?, the repetition count is drawn from the set {0, 1}.
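The three definitions above can be compared side by side. The following is a minimal sketch, not taken from the original post; the list name `candidates` is only an illustration, and `re.fullmatch` is used so that the whole string has to match the pattern.

```python
import re

# Hypothetical sample strings: 'a' followed by zero, one, or three 'b's.
candidates = ["a", "ab", "abbb"]

# fullmatch requires the entire string to match the pattern.
star = [s for s in candidates if re.fullmatch(r"ab*", s)]      # 0 or more 'b'
plus = [s for s in candidates if re.fullmatch(r"ab+", s)]      # 1 or more 'b'
optional = [s for s in candidates if re.fullmatch(r"ab?", s)]  # 0 or 1 'b'

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab']
```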

*?, +?, ??

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.

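In short: the qualifiers above are all greedy and consume as much text as they can, which is not always what you want. Matched against '<a> b <c>', the pattern <.*> takes the whole string rather than just '<a>'. Appending ? to a qualifier switches it to non-greedy (minimal) matching, so it consumes as little as it can while still allowing a match. With <.*?>, only '<a>' is matched.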
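The '<a> b <c>' case described above can be checked directly. This is a small sketch using only `re.match` and `re.findall`; the variable names are illustrative.

```python
import re

text = "<a> b <c>"

greedy = re.match(r"<.*>", text)   # .* grabs as much as possible
lazy = re.match(r"<.*?>", text)    # .*? grabs as little as possible

print(greedy.group())  # <a> b <c>
print(lazy.group())    # <a>

# With the non-greedy form, findall returns each tag separately.
tags = re.findall(r"<.*?>", text)
print(tags)            # ['<a>', '<c>']
```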

Example 1

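import re

# Two lines of text; the first line ends with "end," followed by a newline.
string1 = """hello this is the end,
no ,this is no the end.
"""
print("stringr",re.match("a*.*",string1))              # zero or more 'a', then the rest of the line
print("stringr",re.match("a+.+",string1))              # at least one 'a' required at the start
print("stringr",re.match("a?.+",string1))              # zero or one 'a', then the rest of the line
print("stringr",re.match(".*end",string1,re.DOTALL))   # greedy; '.' also matches newlines
print("stringr",re.match(".*?end",string1,re.DOTALL))  # non-greedy; stops at the first "end"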

Output

stringr <re.Match object; span=(0, 22), match='hello this is the end,'>
stringr None
stringr <re.Match object; span=(0, 22), match='hello this is the end,'>
stringr <re.Match object; span=(0, 45), match='hello this is the end,\nno ,this is no the end'>
stringr <re.Match object; span=(0, 21), match='hello this is the end'>

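Comparing each pattern with its output shows the differences.

In the first statement, a* allows 'a' to occur zero times, so the pattern matches.

In the second statement, a+ requires at least one 'a'. The string does not start with 'a', so nothing matches and the result is None.

In the third statement, a? allows zero or one 'a', so the pattern matches.

The fourth statement does not use the ? modifier, so matching is greedy and takes the longest possible match. Because re.DOTALL lets '.' match newlines as well, .* runs up to the last "end" in the string.

The fifth statement adds the ? modifier, so matching is non-greedy and stops at the first "end".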
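The role of re.DOTALL in the fourth statement can be isolated by running the same greedy pattern with and without the flag. This is a sketch that reuses `string1` from Example 1; the names `with_dotall` and `without_dotall` are illustrative.

```python
import re

string1 = """hello this is the end,
no ,this is no the end.
"""

# With DOTALL, '.' matches the newline, so the greedy .* reaches the last "end".
with_dotall = re.match(".*end", string1, re.DOTALL)

# Without DOTALL, '.' stops at the newline, so the match stays on the first line.
without_dotall = re.match(".*end", string1)

print(with_dotall.span())     # (0, 45)
print(without_dotall.span())  # (0, 21)
```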

The same greedy versus non-greedy distinction applies to:

{m}

Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

{m,n}

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match 'aaaab' or a thousand 'a' characters followed by a 'b', but not 'aaab'. The comma may not be omitted or the modifier would be confused with the previously described form.

{m,n}?

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.

Here too, the ? suffix selects non-greedy matching.
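The counted forms can be tried on the same strings the descriptions mention. This is a minimal sketch; the variable names are illustrative and not from the original post.

```python
import re

six_a = "aaaaaa"

exact = re.fullmatch(r"a{6}", six_a)      # exactly six 'a' characters
too_few = re.fullmatch(r"a{6}", "aaaaa")  # only five 'a' characters, so no match
greedy = re.match(r"a{3,5}", six_a)       # takes as many repetitions as allowed
lazy = re.match(r"a{3,5}?", six_a)        # takes as few repetitions as allowed

print(exact is not None)  # True
print(too_few)            # None
print(greedy.group())     # aaaaa
print(lazy.group())       # aaa

# a{4,}b needs at least four 'a' characters before the 'b'.
four_ok = bool(re.fullmatch(r"a{4,}b", "aaaab"))
three_fail = bool(re.fullmatch(r"a{4,}b", "aaab"))
print(four_ok)     # True
print(three_fail)  # False
```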