*
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible.
ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
The number of repetitions allowed by * ranges over the natural numbers, including zero: 0, 1, 2, and so on.
+
Causes the resulting RE to match 1 or more repetitions of the preceding RE.
ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.
The number of repetitions allowed by + ranges over the positive integers: 1, 2, 3, and so on.
?
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either ‘a’ or ‘ab’.
The number of repetitions allowed by ? is drawn from the set {0, 1}.
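The three qualifiers above can be compared side by side. The following is a minimal sketch, not part of the quoted documentation: the names `candidates` and `results` are illustrative, and `re.fullmatch` is used here (an assumption of this example) so that a string only counts when the whole of it matches.

```python
import re

# Candidate strings: 'a' followed by zero, one, or several 'b's.
candidates = ["a", "ab", "abbb"]

# For each pattern, collect the candidates that match in full.
# re.fullmatch is used so that a partial match does not count.
results = {
    pattern: [s for s in candidates if re.fullmatch(pattern, s)]
    for pattern in ("ab*", "ab+", "ab?")
}

for pattern, matched in results.items():
    print(pattern, matched)
# ab* ['a', 'ab', 'abbb']
# ab+ ['ab', 'abbb']
# ab? ['a', 'ab']
```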
*?, +?, ??
The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<a> b <c>', it will match the entire string, and not just '<a>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only '<a>'.
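In other words, the qualifiers described so far are all greedy: they try to match as many characters as they can. That is not always what you want. For example, matching <.*> against '<a> b <c>' matches the whole string rather than just '<a>'. Appending ? to a qualifier switches it to non-greedy (minimal) mode, where it matches as few characters as possible, so <.*?> matches only '<a>'.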
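The greedy and non-greedy behaviour can be checked directly. This is a small sketch using the '<a> b <c>' string from the text; the variable names are illustrative, and the `+?` and `??` lines are extra cases added here to show that the same suffix works on all three qualifiers.

```python
import re

text = "<a> b <c>"

greedy = re.match(r"<.*>", text)   # '*' takes as much text as it can
lazy = re.match(r"<.*?>", text)    # '*?' takes as little text as it can

print(greedy.group())  # <a> b <c>
print(lazy.group())    # <a>

# With the non-greedy form, findall picks out each tag separately.
tags = re.findall(r"<.*?>", text)
print(tags)  # ['<a>', '<c>']

# The same suffix applies to '+' and '?'.
plus_lazy = re.match(r"ab+?", "abbb").group()    # one 'b' is enough
question_lazy = re.match(r"ab??", "ab").group()  # zero 'b's is enough
print(plus_lazy)      # ab
print(question_lazy)  # a
```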
Example 1
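import re

# Two-line test string; the newline after "end," matters for '.' below.
string1 = """hello this is the end,
no ,this is no the end.
"""

# re.match only tries to match at the beginning of the string.
print("stringr", re.match("a*.*", string1))             # zero or more 'a'
print("stringr", re.match("a+.+", string1))             # one or more 'a'
print("stringr", re.match("a?.+", string1))             # zero or one 'a'
# re.DOTALL lets '.' match newlines as well.
print("stringr", re.match(".*end", string1, re.DOTALL))   # greedy
print("stringr", re.match(".*?end", string1, re.DOTALL))  # non-greedy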
Output
stringr <re.Match object; span=(0, 22), match='hello this is the end,'>
stringr None
stringr <re.Match object; span=(0, 22), match='hello this is the end,'>
stringr <re.Match object; span=(0, 45), match='hello this is the end,\nno ,this is no the end'>
stringr <re.Match object; span=(0, 21), match='hello this is the end'>
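Comparing each pattern with its output shows the differences.
In the first statement, a* allows ‘a’ to occur zero times, so the pattern matches; .* then consumes the rest of the first line, because ‘.’ does not match a newline unless re.DOTALL is set.
In the second statement, a+ requires at least one ‘a’. re.match only matches at the beginning of the string, which starts with ‘h’, so the result is None.
In the third statement, a? allows ‘a’ to occur zero times or once, so the pattern matches.
The fourth statement does not use the ? suffix, so .* is greedy and takes the longest possible match, up to the last ‘end’.
The fifth statement adds the ? suffix, so .*? takes the shortest possible match and stops at the first ‘end’.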
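The effect of re.DOTALL on the spans in Example 1 can be isolated by running the same greedy pattern with and without the flag. This is a small sketch; the string is the one from Example 1 written with explicit `\n` escapes, and the variable names are illustrative.

```python
import re

string1 = "hello this is the end,\nno ,this is no the end.\n"

# Without re.DOTALL, '.' stops at the newline, so even the greedy
# form cannot reach past the first line.
greedy_one_line = re.match(".*end", string1).span()
# With re.DOTALL, the greedy form runs to the last 'end' in the string.
greedy_dotall = re.match(".*end", string1, re.DOTALL).span()
# The non-greedy form stops at the first 'end' even with re.DOTALL.
lazy_dotall = re.match(".*?end", string1, re.DOTALL).span()

print(greedy_one_line)  # (0, 21)
print(greedy_dotall)    # (0, 45)
print(lazy_dotall)      # (0, 21)
```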
The same greedy and non-greedy distinction applies to the following qualifiers.
{m}
Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.
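A quick way to confirm the a{6} example is to try it on strings of five, six, and seven characters. This is a minimal sketch with illustrative variable names; `fullmatch` and `match` are both used to show the difference between requiring the whole string and accepting a prefix.

```python
import re

pattern = re.compile(r"a{6}")

five = pattern.fullmatch("a" * 5)   # too few 'a's: no match
six = pattern.fullmatch("a" * 6)    # exactly six: match
seven = pattern.match("a" * 7)      # match() accepts a prefix of six

print(five)           # None
print(six.group())    # aaaaaa
print(seven.group())  # aaaaaa
```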
{m,n}
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match 'aaaab' or a thousand 'a' characters followed by a 'b', but not 'aaab'. The comma may not be omitted or the modifier would be confused with the previously described form.
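The bounds described above can be exercised with a few short checks. This sketch uses `re.fullmatch` (a choice made for this example) so that each string must match in its entirety; the helper `matches` is hypothetical and exists only to keep the output readable.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None


# a{3,5}: between three and five 'a' characters.
bounded = [matches(r"a{3,5}", "a" * n) for n in range(2, 7)]
print(bounded)  # [False, True, True, True, False]

# a{4,}b: at least four 'a' characters, no upper bound, then 'b'.
print(matches(r"a{4,}b", "aaaab"))           # True
print(matches(r"a{4,}b", "a" * 1000 + "b"))  # True
print(matches(r"a{4,}b", "aaab"))            # False

# a{,2}: the lower bound is omitted, so zero repetitions are allowed.
no_lower = [matches(r"a{,2}", "a" * n) for n in range(0, 4)]
print(no_lower)  # [True, True, True, False]
```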
{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
The ? suffix marks the qualifier as non-greedy.
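The 'aaaaaa' example can be reproduced as follows. This is a small sketch with illustrative variable names; the last case is an addition of this example, included to show that a non-greedy qualifier still takes more repetitions when the rest of the pattern cannot match otherwise.

```python
import re

text = "aaaaaa"

greedy = re.match(r"a{3,5}", text).group()   # takes up to five 'a's
lazy = re.match(r"a{3,5}?", text).group()    # stops after three 'a's

print(greedy)  # aaaaa
print(lazy)    # aaa

# "As few as possible" is still subject to the overall match succeeding:
# here five 'a's are needed before the 'b' can match.
forced = re.match(r"a{3,5}?b", "aaaaab").group()
print(forced)  # aaaaab
```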