From the documentation, it's very clear that:
- match() -> apply pattern match at the beginning of the string
- search() -> search through the string and return the first match
And search() with '^' and without the re.M flag would work the same as match().
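A quick way to check that claim, using a made-up two-line string: without re.M, search() with '^' and match() agree, while with re.M only search() can find a match at the start of a later line.

```python
import re

text = "first line\nsecond line"

# Without re.M, '^' only matches at the very start, so search('^...') acts like match()
print(re.search(r"^second", text))  # None
print(re.match(r"second", text))    # None

# With re.M, '^' also matches right after a newline, so search() finds the second line...
m = re.search(r"^second", text, re.M)
print(m.span())  # (11, 17)
# ...but match() still only tries position 0, even in MULTILINE mode
print(re.match(r"second", text, re.M))  # None
```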
Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?
2 answers
#1
"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:
```python
import re

def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)
```
(This works because \A matches only at the start of the string, regardless of whether the re.M flag is set.)
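A small sanity check of that search()-based sketch against the real re.match(), on a few hand-picked inputs (the cases and the same_result() helper are illustrative only, not an exhaustive proof, and the sketch assumes pattern is a plain string). The last line shows why the pattern is wrapped in (?:...): without the group, \A would bind only to the first alternative.

```python
import re

def match(pattern, string, flags=0):
    # Sketch from the answer: emulate re.match() with re.search() and \A
    return re.search(r"\A(?:" + pattern + ")", string, flags)

def same_result(a, b):
    # Two results agree if both are None, or both matched the same span
    if a is None or b is None:
        return a is b
    return a.span() == b.span()

cases = [
    ("ab", "abc", 0),
    ("b", "abc", 0),
    ("b", "a\nb", re.M),  # \A ignores line starts even under re.M
    ("x|b", "abc", 0),    # (?:...) keeps \A applied to every alternative
]
for pattern, string, flags in cases:
    # each line should end with True
    print(repr(pattern), same_result(match(pattern, string, flags),
                                     re.match(pattern, string, flags)))

# Without the group, r"\Ax|b" means (\Ax)|(b), so 'b' can match anywhere:
print(re.search(r"\Ax|b", "abc"))  # matches 'b' at index 1, unlike re.match("x|b", "abc")
```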
So re.match is a useful shortcut, but not strictly necessary. It's especially confusing for Java programmers, whose Pattern.matches() anchors the match to both the start and the end of the string (which is probably a more common use case than anchoring only to the start).
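For the Java comparison: Python 3.4 added re.fullmatch(), which anchors at both ends much like Java's Pattern.matches(). A minimal illustration with an arbitrary digit pattern:

```python
import re

# re.match() only anchors at the start; trailing text is allowed
print(re.match(r"\d+", "123abc"))      # matches '123'

# re.fullmatch() requires the whole string to match, similar to Java's Pattern.matches()
print(re.fullmatch(r"\d+", "123abc"))  # None
print(re.fullmatch(r"\d+", "123"))     # matches '123'
```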
It's different for the match() and search() methods of compiled regex objects, though, as Eric has pointed out.
#2
The pos argument behaves differently in important ways:
```python
>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<re.Match object; span=(2, 3), match='a'>
>>> print(re.compile('^a').search(s, pos=2))
None
```
match() makes it possible to write a tokenizer and ensure that characters are never skipped, because it only succeeds if the match begins exactly at pos. search() has no way of saying "the match must begin exactly at pos": it is free to skip ahead, and, as shown above, '^' does not anchor to pos.
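To make the "never skipped" point concrete, here is a small example with a hypothetical lowercase-word pattern: starting at the '!', search() quietly jumps past it, while match() fails and so exposes the unexpected character.

```python
import re

word = re.compile(r"[a-z]+")  # hypothetical token pattern
s = "ab!cd"

# search() from position 2 silently skips the '!' and finds 'cd'
print(word.search(s, pos=2).span())  # (3, 5)

# match() from position 2 refuses, because 'cd' does not start at index 2
print(word.match(s, pos=2))          # None
```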
Example use of match() to break up a string with no gaps:
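```python
def tokenize(s, patt):
    """Yield successive matches of the compiled pattern patt, covering s with no gaps."""
    at = 0
    while at < len(s):
        # match() only succeeds if the token starts exactly at `at`
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        if m.end() == at:
            # an empty match would never advance `at`; fail instead of looping forever
            raise ValueError("Pattern matched an empty string at location {}".format(at))
        at = m.end()
        yield m
```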
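A usage sketch for tokenize(): the TOKEN pattern and its group names are hypothetical, chosen only for illustration. The function is repeated (including a guard against empty matches, which would otherwise never advance at) so the snippet runs on its own. Named groups plus Match.lastgroup report which alternative matched.

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        m = patt.match(s, pos=at)  # must match exactly at `at`
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        if m.end() == at:  # guard: an empty match would loop forever
            raise ValueError("Pattern matched an empty string at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token grammar: integers, identifiers, operators and whitespace
TOKEN = re.compile(r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+*/=-])|(?P<WS>\s+)")

tokens = [(m.lastgroup, m.group())
          for m in tokenize("x = 42 + y", TOKEN)
          if m.lastgroup != "WS"]
print(tokens)
# [('NAME', 'x'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('NAME', 'y')]

try:
    list(tokenize("x = 4$2", TOKEN))
except ValueError as e:
    print(e)  # Did not expect character at location 5
```

Because tokenize() is a generator, the ValueError is raised only when iteration reaches the offending character, so tokens before it have already been yielded.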