Why use re.match() when re.search() can do the same thing?

Time: 2021-10-15 22:34:56

From the documentation, it's very clear that:

  • match() -> applies the pattern only at the beginning of the string
  • search() -> scans through the string and returns the first match

And search() with a leading '^' and without the re.M flag would work the same as match().

Then why does Python have match()? Isn't it redundant? Are there any performance benefits to keeping match() in Python?

2 answers

#1


4  

"Why" questions are hard to answer. As a matter of fact, you could define the function re.match() like this:

import re

def match(pattern, string, flags=0):
    return re.search(r"\A(?:" + pattern + ")", string, flags)

(because \A always matches at the start of the string, regardless of whether the re.M flag is set).
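A small check of that claim, as a sketch (the sample text is made up for illustration): with re.M set, '^' also matches right after each newline, while \A and match() stay anchored to the start of the string.

```python
import re

text = "first line\nsecond line"  # made-up sample input

# Without re.M, '^' only matches at the very start of the string.
no_flag = re.search(r"^second", text)

# With re.M, '^' also matches right after a newline, so search() finds line 2.
caret_multiline = re.search(r"^second", text, re.M)

# \A ignores re.M and stays anchored to the start of the string.
backslash_a_multiline = re.search(r"\Asecond", text, re.M)

# match() likewise only tries the start of the string, even with re.M.
match_multiline = re.match(r"second", text, re.M)

print(no_flag, caret_multiline, backslash_a_multiline, match_multiline)
```

So the equivalence in the question holds only as long as re.M is not in play; with the flag set, '^' plus search() is no longer a substitute for match().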

So re.match() is a useful shortcut but not strictly necessary. It's especially confusing for Java programmers, whose Pattern.matches() anchors the match to both the start and the end of the string (which is probably a more common use case than anchoring to the start alone).
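For the Java comparison, a brief sketch: since Python 3.4 the re module also provides re.fullmatch(), which anchors at both ends and is the closer counterpart to Pattern.matches(). The sample strings are made up for illustration.

```python
import re

# match() only anchors at the start, so trailing text is allowed.
prefix_only = re.match(r"\d+", "123abc")

# fullmatch() requires the whole string to match, like Java's Pattern.matches().
whole_fail = re.fullmatch(r"\d+", "123abc")
whole_ok = re.fullmatch(r"\d+", "123")

print(prefix_only.group(), whole_fail, whole_ok.group())
```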

It's different for the match and search methods of regex objects, though, as Eric has pointed out.

#2


11  

The pos argument behaves differently in important ways:

>>> import re
>>> s = "a ab abc abcd"
>>> re.compile('a').match(s, pos=2)
<_sre.SRE_Match object; span=(2, 3), match='a'>
>>> print(re.compile('^a').search(s, pos=2))
None
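Another way to see it, as a sketch reusing the same string: passing pos is not the same as slicing the string, because '^' still refers to the real start of the string.

```python
import re

s = "a ab abc abcd"
anchored = re.compile(r"^a")

# With pos=2, '^' still means index 0 of s, so nothing can match.
with_pos = anchored.search(s, 2)

# Slicing creates a new string whose own start is where '^' matches.
with_slice = anchored.search(s[2:])

# match() anchors at pos itself, with no '^' needed and no copy of the string.
at_pos = re.compile(r"a").match(s, 2)

print(with_pos, with_slice.span(), at_pos.span())
```

Slicing does work, but it copies the rest of the string and reports spans relative to the slice, whereas match() with pos keeps offsets relative to the original string.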

match() makes it possible to write a tokenizer and ensure that no characters are ever skipped. search() has no way of saying "start from the earliest allowable character".

Example use of match to break up a string with no gaps:

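def tokenize(s, patt):
    """Yield consecutive matches of the compiled pattern patt that cover s with no gaps."""
    at = 0
    while at < len(s):
        # match() only tries position `at`, so no character can be skipped
        m = patt.match(s, pos=at)
        if m is None or m.end() == at:
            # no token starts here (an empty match would otherwise loop forever)
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m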
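A usage sketch for the generator above. The token pattern and the input strings are made up for illustration, and a version of the function is included so the snippet runs on its own.

```python
import re

def tokenize(s, patt):
    at = 0
    while at < len(s):
        # match() only tries position `at`, so no character can be skipped
        m = patt.match(s, pos=at)
        if m is None or m.end() == at:
            raise ValueError("Did not expect character at location {}".format(at))
        at = m.end()
        yield m

# Hypothetical token pattern: a run of digits, a run of spaces, or one operator.
token = re.compile(r"\d+|\s+|[-+*/()]")

tokens = [m.group() for m in tokenize("12 + (3*45)", token)]
print(tokens)

# An unexpected character is reported instead of being silently skipped.
message = None
try:
    list(tokenize("12 + x", token))
except ValueError as err:
    message = str(err)
print(message)
```

With search() in place of match(), the second call would simply find no further token and the stray "x" would go unnoticed unless the gaps between matches were checked by hand.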
