Python regular expressions: re.match vs. re.search

Date: 2021-03-26 22:37:03

I once needed Python's re module, and when it came to calling it I did not know whether to use re.match or re.search. Clearly my skills needed work, so I googled it.

The match Function

Function prototype:
re.match(pattern, string, flags=0)

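pattern is the regular expression to match.
string is the string to match against.
flags is the set of matching flags; several flags can be combined with the bitwise OR operator (|). The table below, which I found online, lists the available values.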
Modifier  Description
re.I      Performs case-insensitive matching.
re.L      Interprets words according to the current locale. This affects the alphabetic group (\w and \W) as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
re.M      Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
re.S      Makes a period (dot) match any character, including a newline.
re.U      Interprets letters according to the Unicode character set. This affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns, so the flag is redundant there.
re.X      Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
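
To make the table more concrete, here is a minimal sketch of what the most common flags do. The sample text and the variable names are made up for illustration; only the re functions and flags themselves come from the standard library.

```python
import re

text = "First line\nsecond LINE"

# re.I: case-insensitive matching, so "line$" can match "LINE" at the end.
ignore_case = re.search(r"line$", text, re.I)

# re.M: ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: without it the dot stops at a newline; with it the dot crosses it.
without_dotall = re.search(r"First.*second", text)
with_dotall = re.search(r"First.*second", text, re.S)

# re.X: whitespace and # comments inside the pattern are ignored.
verbose = re.search(r"""
    (\w+)   # first word
    \s+     # separator
    (\w+)   # second word
""", text, re.X)

print(ignore_case.group())
print(line_starts)
print(without_dotall)
print(repr(with_dotall.group()))
print(verbose.groups())
```

Flags are plain integer constants, which is why the examples in this post can combine them as re.M|re.I.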

Example:
import re

line = "Cats are smarter than dogs"

# Group 1 is greedy; group 2 is non-greedy and stops at the first space.
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

Output:
matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
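
Why is group(2) only "smarter"? The pattern mixes a greedy (.*) with a non-greedy (.*?). The following sketch compares the two quantifiers on the same line; the names lazy and greedy are just labels chosen for this illustration.

```python
import re

line = "Cats are smarter than dogs"

# Non-greedy (.*?) takes as little as possible: it stops at the first
# space after which the rest of the pattern can still match.
lazy = re.match(r"(.*) are (.*?) .*", line)

# Greedy (.*) takes as much as possible while still leaving one space
# and a (possibly empty) tail for the rest of the pattern.
greedy = re.match(r"(.*) are (.*) .*", line)

print(lazy.group(2))    # smarter
print(greedy.group(2))  # smarter than
```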

The search Function


Function prototype:
re.search(pattern, string, flags=0)

Both the prototype and the meaning of the parameters are the same as for re.match, so I will not repeat them here.

Example:
import re

line = "Cats are smarter than dogs"

# Same pattern as before, this time passed to re.search.
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

Output:
searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

Matching vs Searching:

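Comparing re.match and re.search above, they look almost identical, and they produce the same result for that pattern.
So what is the difference?
re.match only tries to match at the very beginning of the string, whereas re.search scans the string and returns the first position where the pattern matches.
Example: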
import re

line = "Cats are smarter than dogs"

# "dogs" is not at the start of the string, so re.match returns None.
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# re.search scans the whole string and finds "dogs" at the end.
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

Output:
No match!!
search --> searchObj.group() :  dogs
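
One detail worth checking, since the examples here pass re.M everywhere: the flag does not let re.match start at a later line. A small sketch, with a made-up two-line string, of how the flag interacts with both functions:

```python
import re

text = "Cats are smart\ndogs are loyal"

# re.match is tried only at index 0, even when re.M is set.
at_start = re.match(r"^dogs", text, re.M)

# re.search with ^ and re.M can match at the start of any line.
at_line_start = re.search(r"^dogs", text, re.M)

# Without re.M, ^ in re.search matches only at the start of the string.
no_multiline = re.search(r"^dogs", text)

print(at_start)                                      # None
print(at_line_start.group(), at_line_start.start())  # dogs 15
print(no_multiline)                                  # None
```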

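From the example above we can conclude that re.search is the right choice when looking for a pattern somewhere inside a large body of text. In practice, most of our work is exactly this kind of searching, so my suggestion is to prefer re.search when using Python's re module, and to use re.match only when the pattern must match at the start of the string.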
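
As a rough illustration of that advice, here is a small wrapper around re.search that reports what was found and where. The helper name find_first and the sample log text are hypothetical, invented for this sketch; group(), start() and end() are the standard match object methods.

```python
import re

def find_first(pattern, text, flags=0):
    """Return (matched_text, start, end) for the first match, or None."""
    found = re.search(pattern, text, flags)
    if found is None:
        return None
    return found.group(), found.start(), found.end()

# Made-up sample text standing in for a larger document.
log = "boot ok\nWARN disk 91% full\nshutdown ok"

hit = find_first(r"\d+%", log)   # pattern occurs in the middle of the text
miss = find_first(r"ERROR", log) # pattern does not occur at all

print(hit)   # ('91%', 18, 21)
print(miss)  # None
```

Checking the result against None before calling group() avoids the AttributeError you would get from a failed search.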

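Since we are already talking about re.match and re.search, let me also mention another very important function in the re module:
re.sub(pattern, repl, string, count=0, flags=0)
In short: it returns a copy of string in which the substrings matched by pattern are replaced with repl. count limits the number of replacements; the default 0 replaces all matches.
Example: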
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments (everything from # to the end of the string)
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

Output:
Phone Num :  2004-959-559
Phone Num :  2004959559
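
The example above only uses the first three arguments. As a short sketch of the rest of re.sub, the following shows the count argument and a backreference in repl; the variable names are made up for illustration.

```python
import re

number = "2004-959-559"

# count=0 (the default) replaces every match.
replace_all = re.sub(r"-", " ", number)

# count=1 replaces only the first match.
replace_first = re.sub(r"-", " ", number, count=1)

# repl may refer back to groups captured by the pattern (\1, \2, ...).
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3-\2-\1", number)

print(replace_all)    # 2004 959 559
print(replace_first)  # 2004 959-559
print(reordered)      # 559-959-2004
```

Passing count as a keyword argument keeps it from being confused with flags, which comes right after it in the signature.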