
Time: 2021-10-15 22:34:56

From the documentation, it's very clear that:


  • match() -> applies the pattern only at the beginning of the string
  • search() -> scans through the string and returns the first match

And search() with a leading '^' and without the re.M flag would work the same as match().
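
A small sketch of that equivalence, and of where it stops holding once re.M is set. The sample text and the variable names are made up for illustration:

```python
import re

text = "first line\nsecond line"  # hypothetical sample input

# Without re.M, '^' only matches at the very start of the string,
# so search() with '^' behaves like match().
no_flag_search = re.search(r"^second", text)
no_flag_match = re.match(r"second", text)

# With re.M, '^' also matches right after each newline, so search() finds
# the second line, while match() is still tied to the start of the string.
multiline_search = re.search(r"^second", text, re.M)
multiline_match = re.match(r"second", text, re.M)

print(no_flag_search, no_flag_match)               # None None
print(multiline_search.span(), multiline_match)    # (11, 17) None
```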


Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?


2 answers



"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:


import re

def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)  # \A anchors at the string start

(because \A always matches at the start of the string, regardless of whether the re.M flag is set).
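
To make the role of \A and of the (?:...) wrapper concrete, here is a minimal sketch. The helper name match_via_search and the sample strings are assumptions chosen for illustration, not part of the re module:

```python
import re

def match_via_search(pattern, string, flags=0):
    # Hypothetical helper: emulates re.match() with re.search() and \A.
    return re.search(r"\A(?:" + pattern + ")", string, flags)

text = "x\nab"

# With re.M, '^' matches after the newline, but \A still does not.
caret = re.search(r"^ab", text, re.M)
anchored = re.search(r"\Aab", text, re.M)

# The non-capturing group keeps alternation inside the anchor. Without it,
# r"\Aa|b" would mean "a at the start, or b anywhere".
grouped = match_via_search(r"a|b", "xb")
ungrouped = re.search(r"\Aa|b", "xb")
builtin = re.match(r"a|b", "xb")

print(caret.span(), anchored)              # (2, 4) None
print(grouped, ungrouped.span(), builtin)  # None (1, 2) None
```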

So re.match() is a useful shortcut but not strictly necessary. It's especially confusing for Java programmers, whose Pattern.matches() anchors the match to both the start and the end of the string (which is probably a more common use case than anchoring only to the start).
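
For comparison, Python 3.4 and later also provide re.fullmatch(), which anchors at both ends and is therefore the closer counterpart to the Java behaviour described here. A short sketch with a made-up input string:

```python
import re

s = "abc123"  # hypothetical sample input

prefix = re.match(r"[a-z]+", s)            # anchored at the start only
whole = re.fullmatch(r"[a-z]+", s)         # must consume the whole string
whole_ok = re.fullmatch(r"[a-z]+\d+", s)   # this pattern covers everything

print(prefix.group())    # abc
print(whole)             # None
print(whole_ok.group())  # abc123
```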


It's different for the match and search methods of regex objects, though, as Eric has pointed out.




With compiled patterns, the pos argument makes match() and search() behave differently in important ways:


>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<re.Match object; span=(2, 3), match='a'>
>>> re.compile('^a').search(s, pos=2)  # returns None, so nothing is printed
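
The same behaviour as a script, as a rough sketch with illustrative variable names. It also shows that passing pos is not the same as slicing the string, because '^' refers to the real start of the string rather than to pos:

```python
import re

s = "a ab abc abcd"
pat = re.compile(r"ab")

# match(s, pos) succeeds only if the pattern matches exactly at pos.
at_2 = pat.match(s, 2)   # "ab" starts at index 2
at_1 = pat.match(s, 1)   # index 1 is a space, so this is None

# search(s, pos) scans forward from pos and may skip characters.
found = pat.search(s, 3)  # skips ahead to the "ab" at index 5

# '^' is tied to the real start of the string, not to pos ...
caret = re.compile(r"^a").search(s, 2)
# ... whereas slicing creates a new string whose start is index 2 of s.
sliced = re.compile(r"^a").search(s[2:])

print(at_2.span(), at_1, found.span())  # (2, 4) None (5, 7)
print(caret, sliced.span())             # None (0, 1)
```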

match() makes it possible to write a tokenizer and ensure that characters are never skipped. search() has no way of saying "start from the earliest allowable character".


Example use of match to break up a string with no gaps:


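def tokenize(s, patt):
    """Yield consecutive matches of the compiled pattern patt, covering s with no gaps."""
    at = 0
    while at < len(s):
        # match() is anchored at pos, so no character can be silently skipped.
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        # Continue exactly where the previous token ended.
        at = m.end()
        yield m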
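
A possible way to use this tokenizer, as a sketch only. The TOKEN pattern and its group names are assumptions invented for this example, and tokenize is repeated so the snippet runs on its own:

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        m = patt.match(s, pos=at)
        if not m:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token pattern: integers, identifiers, operators, whitespace.
TOKEN = re.compile(
    r"(?P<num>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[+\-*/=])|(?P<ws>\s+)"
)

# lastgroup is the name of the alternative that matched.
tokens = [(m.lastgroup, m.group()) for m in tokenize("x = 40 + 2", TOKEN)]
print(tokens)

# An unexpected character is reported instead of being skipped.
try:
    list(tokenize("4 $ 2", TOKEN))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # Did not expect character at location 2
```

Note that the pattern should not be able to match the empty string: in that case at would never advance and the loop would not terminate.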