Python regular expressions: re.match vs re.search

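I once needed Python's re module and, when it came to calling it, did not know whether to use re.match or re.search. Clearly I had not learned it well enough, so I googled it.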

The match Function

Function prototype:

re.match(pattern, string, flags=0)

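pattern is the regular expression to match.

string is the string to match against.

flags holds the matching flags, and several flags can be combined with |. The table below, which I found online, lists the available values.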

Modifier	Description
re.I	Performs case-insensitive matching.
re.L	Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B).
re.M	Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
re.S	Makes a period (dot) match any character, including a newline.
re.U	Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B.
re.X	Permits more readable (verbose) regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
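
To make a few of these flags concrete, here is a minimal sketch showing re.I, re.M and re.S in action. The sample text and variable names are made up for illustration, and re.findall is used only to show every line start at once.

```python
import re

text = "first line\nSecond LINE"

# re.I: case-insensitive matching
ignore_case = re.search(r"second line", text, re.I)

# re.M: ^ also matches at the start of each line
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: the dot also matches a newline, so the match can span both lines
across_lines = re.search(r"first.*LINE", text, re.S)

# Without re.S the dot stops at the newline, so nothing matches
no_dotall = re.search(r"first.*LINE", text)

print(ignore_case.group())           # Second LINE
print(line_starts)                   # ['first', 'Second']
print(across_lines.group() == text)  # True
print(no_dotall)                     # None
```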

Example:

import re

line = "Cats are smarter than dogs"

# re.match tries the pattern only at the start of the string
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    # group() is the whole match; group(1) and group(2) are the captured groups
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

Output:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter

The search Function

Function prototype:

re.search(pattern, string, flags=0)

This function has the same prototype and the same parameter meanings as re.match, so I will not repeat them here.

Example:

import re

line = "Cats are smarter than dogs"

# re.search scans the string for the first position where the pattern matches
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

Output:

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

Matching vs Searching:

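Comparing the re.match and re.search examples above, the two look almost identical and produce the same output.

So what is the difference?

re.match matches only at the beginning of the string, whereas re.search scans the string and can match at any position.

Example: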

import re

line = "Cats are smarter than dogs"

# "dogs" is not at the start of the string, so re.match returns None
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# re.search finds "dogs" wherever it occurs
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

Output:

No match!!
search --> searchObj.group() :  dogs
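
One detail that is easy to miss: passing re.M does not make re.match try the start of every line. The sketch below, using a made-up two-line string, illustrates this and also shows that re.search with a ^ anchor and no re.M behaves much like re.match.

```python
import re

text = "Cats are smarter\ndogs are loyal"

# re.match only tries position 0, even when re.M is set
match_multiline = re.match(r"dogs", text, re.M)

# re.search with ^ and re.M can match at the start of the second line
search_multiline = re.search(r"^dogs", text, re.M)

# Without re.M, ^ pins re.search to the start of the string
search_anchored = re.search(r"^dogs", text)

print(match_multiline)           # None
print(search_multiline.group())  # dogs
print(search_anchored)           # None
```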

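From the example above we can conclude: if we need to find a given pattern somewhere in a large amount of text, we should use re.search. In practice, most of our work is searching, so I recommend preferring re.search when using Python's re module.

Since we are already talking about re.match and re.search, let me also mention another important function in the re module: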

re.sub(pattern, repl, string, count=0, flags=0)

In short: within string, it replaces every substring matched by pattern with repl.

Example:

import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

Output:

Phone Num :  2004-959-559
Phone Num :  2004959559
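
As a small extension of the example above, here is a sketch of two other re.sub features: the count argument, which limits how many replacements are made, and backreferences such as \1 in repl, which refer to captured groups. The variable names are chosen only for illustration.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first hyphen
first_only = re.sub(r"-", "", phone, count=1)

# \1, \2 and \3 in repl refer to the three captured groups
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3-\2-\1", phone)

print(first_only)  # 2004959-559
print(reordered)   # 559-959-2004
```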
