|
A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].

In short: A|B matches using one of the two sub-expressions. Several alternatives can be chained with '|', as in A|B|C, and the whole alternation can be placed inside a group, as in (A|B|C), so that the text it matched can be retrieved later with group(). The alternatives are tried from left to right and the first one that matches is accepted; once A matches, B is never tried, even if B could match a longer substring. To match the '|' character itself, escape it with a backslash or put it in a character class, as in [|].
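The left-to-right rule is easiest to see with two alternatives where one is a prefix of the other. The following is a small sketch using only the standard `re` module; the sample strings and variable names are made up for illustration.

```python
import re

# The left alternative is accepted as soon as it matches,
# even though the right one would match more of the string.
first = re.match(r"ab|abcd", "abcd").group(0)
print(first)   # ab

# Listing the longer alternative first changes the result.
second = re.match(r"abcd|ab", "abcd").group(0)
print(second)  # abcd

# Two ways to match a literal '|': a backslash escape or a character class.
pipes = re.findall(r"\|", "a|b|c")
parts = re.split(r"[|]", "a|b|c")
print(pipes)   # ['|', '|']
print(parts)   # ['a', 'b', 'c']
```

When alternatives overlap like this, putting the longer one first is a common way to get the longer match.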
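```python
import re

string1 = "<div>aaa</div><div>bbb</div>"
print(len(string1), "string1 length")

# Each group tries one or more digits first, then one or more lowercase letters.
# A raw string keeps \d from being treated as a string escape.
rs = re.match(r"<div>(\d+|[a-z]+)</div><div>(\d+|[a-z]+)</div>", string1)
print(rs.group(0))  # the entire match
print(rs.group(1))  # text captured by the first group
print(rs.group(2))  # text captured by the second group
```

Output

```
28 string1 length
<div>aaa</div><div>bbb</div>
aaa
bbb
```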
group(0) is the entire matched text (not its length); capturing groups are numbered starting from 1 (this will be covered in the next post).
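'|' applies to everything on its left and everything on its right in the pattern. To limit how far the alternation reaches, enclose it in parentheses.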
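A short sketch of how parentheses limit the reach of '|', again using only the standard `re` module. The patterns and the sample string are illustrative, not taken from the example above.

```python
import re

text = "gray cat"

# Without parentheses the pattern means: "gray" OR "grey cat".
# Neither alternative matches the whole string.
unscoped = re.fullmatch(r"gray|grey cat", text)
print(unscoped)  # None

# With parentheses only the two spellings alternate; " cat" is always required.
scoped = re.fullmatch(r"(gray|grey) cat", text)
print(scoped.group(0))  # gray cat
print(scoped.group(1))  # gray
```

If the group is needed only for scoping and not for capturing, the non-capturing form `(?:gray|grey)` limits the alternation the same way without creating a numbered group.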