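I once needed Python's re module, and when it came time to call it I could not decide between re.match and re.search. Clearly my skills were rusty, so I googled it.
The match Function
Function prototype:
re.match(pattern, string, flags=0)
The pattern parameter is the regular expression to match.
The string parameter is the string to match against.
The flags parameter holds the matching flags; several flags can be combined with |. The table below, which I found online, lists the values flags can take.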
Modifier | Description |
---|---|
re.I | Performs case-insensitive matching. |
re.L | Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 this flag can be used only with bytes patterns. |
re.M | Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string). |
re.S | Makes a period (dot) match any character, including a newline. |
re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns. |
re.X | Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker. |
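To make the table more concrete, here is a small sketch of how a few of these flags change matching. The sample text and variable names are made up for illustration; only the re calls themselves come from the standard library.

```python
import re

# Hypothetical sample text with two lines.
text = "Hello World\nhello again"

# re.I: case-insensitive matching finds both "Hello" and "hello".
case_insensitive = re.findall(r"hello", text, re.I)

# re.M: ^ also matches right after each newline.
every_line = re.findall(r"^hello", text, re.I | re.M)

# Without re.M, ^ matches only at the very start of the string.
string_start_only = re.findall(r"^hello", text, re.I)

# re.S: the dot also matches a newline, so a match can span two lines.
dot_with_s = re.search(r"World.hello", text, re.S)
dot_without_s = re.search(r"World.hello", text)

# re.X: whitespace and # comments inside the pattern are ignored.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -(\d{2})  # month
    -(\d{2})  # day
""", re.X)
date_parts = date_pattern.match("2024-01-31").groups()

print(case_insensitive)
print(every_line)
print(string_start_only)
print(dot_with_s is not None, dot_without_s is not None)
print(date_parts)
```

Flags are combined with the bitwise OR operator, as in re.I | re.M above.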
Example:

    import re

    line = "Cats are smarter than dogs"

    # re.match anchors the pattern at the start of the string
    matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

    if matchObj:
        print("matchObj.group() :", matchObj.group())
        print("matchObj.group(1) :", matchObj.group(1))
        print("matchObj.group(2) :", matchObj.group(2))
    else:
        print("No match!!")

Output:

    matchObj.group() : Cats are smarter than dogs
    matchObj.group(1) : Cats
    matchObj.group(2) : smarter

The search Function
Function prototype:
re.search(pattern, string, flags=0)
The prototype and the meaning of each parameter are the same as for re.match, so I will not repeat them here.
Example:

    import re

    line = "Cats are smarter than dogs"

    # re.search scans the string for the first place the pattern matches
    searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

    if searchObj:
        print("searchObj.group() :", searchObj.group())
        print("searchObj.group(1) :", searchObj.group(1))
        print("searchObj.group(2) :", searchObj.group(2))
    else:
        print("Nothing found!!")

Output:

    searchObj.group() : Cats are smarter than dogs
    searchObj.group(1) : Cats
    searchObj.group(2) : smarter

Matching vs Searching:
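Comparing re.match and re.search above, the two look remarkably alike, and their output is the same.
So what is the difference?
re.match only matches at the beginning of the string (even when re.M is set), whereas re.search can find a match anywhere in the string.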
Example:

    import re

    line = "Cats are smarter than dogs"

    # "dogs" is not at the start of the string, so re.match fails
    matchObj = re.match(r'dogs', line, re.M | re.I)
    if matchObj:
        print("match --> matchObj.group() :", matchObj.group())
    else:
        print("No match!!")

    # re.search scans the whole string and finds "dogs" at the end
    searchObj = re.search(r'dogs', line, re.M | re.I)
    if searchObj:
        print("search --> searchObj.group() :", searchObj.group())
    else:
        print("Nothing found!!")

Output:

    No match!!
    search --> searchObj.group() : dogs

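From the example above we can conclude that if the goal is to find a fixed pattern somewhere in a large body of text, re.search is the right tool. In practice most of our work is searching of this kind, so my advice is to prefer re.search when using Python's re module, and to reach for re.match only when the pattern must start at the beginning of the string.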
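The difference is easiest to see on a multi-line string. The following is a minimal sketch with made-up sample text: it shows that re.M does not let re.match start at a later line, while re.search with ^ and re.M does match at a line start, and that anchoring re.search with \A behaves like re.match for this input.

```python
import re

# Hypothetical two-line input; "dogs" starts the second line (index 11).
text = "first line\ndogs on the second line"

# re.match only tries position 0, even when re.M is set.
at_start = re.match(r"dogs", text, re.M)

# re.search with ^ and re.M can match at the start of any line.
line_start = re.search(r"^dogs", text, re.M)

# re.search without anchors returns the first match found anywhere.
anywhere = re.search(r"dogs", text)

# \A matches only at the start of the string, so this also fails.
anchored = re.search(r"\Adogs", text)

print(at_start)            # None
print(line_start.span())   # (11, 15)
print(anywhere.start())    # 11
print(anchored)            # None
```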
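Since we have covered re.match and re.search, we might as well look at another very important function in the re module:
re.sub(pattern, repl, string, count=0, flags=0)
In short, it replaces the substrings of string that match pattern with repl. The count parameter caps the number of replacements; the default of 0 replaces every match.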
Example:

    import re

    phone = "2004-959-559 # This is Phone Number"

    # Delete Python-style comments
    num = re.sub(r'#.*$', "", phone)
    print("Phone Num :", num)

    # Remove anything other than digits
    num = re.sub(r'\D', "", phone)
    print("Phone Num :", num)

Output:

    Phone Num : 2004-959-559
    Phone Num : 2004959559

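re.sub has a few more options worth knowing. The sketch below reuses the phone string from the example above. The variable names are mine, and the specific replacements were picked only to illustrate the count argument, backreferences in repl, and a function as repl.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# count=1 replaces only the first hyphen; the default count=0 replaces all.
first_only = re.sub(r"-", "", phone, count=1)

# A backreference in repl reuses captured groups: swap the first two fields.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", phone, count=1)

# repl may also be a function that receives each match object.
masked = re.sub(r"\d", lambda m: "*", phone)

print(first_only)   # 2004959-559 # This is Phone Number
print(swapped)      # 959-2004-559 # This is Phone Number
print(masked)       # ****-***-*** # This is Phone Number
```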