What is the difference between re.search and re.match?

What is the difference between the search() and match() functions in the Python re module?

I've read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I'm hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I'll have a better place to return with my question and it will take less time to re-learn it.

7 Answers

#1

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.
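
The distinction the documentation draws between "no match" and a "zero-length match" is easy to miss. Here is a minimal sketch of it; the patterns and the sample string are made up for illustration and are not from the documentation:

```python
import re

# 'x*' may match zero characters, so it succeeds at position 0
# with an empty (zero-length) match.
empty = re.match(r"x*", "abc")

# 'x+' needs at least one 'x' at the start of the string,
# so there is no match at all and the result is None.
missing = re.match(r"x+", "abc")

print(empty is not None, repr(empty.group()), empty.span())
print(missing)
```

This is why testing the result with `is None` (or plain truthiness of the match object) is safer than testing whether the matched text is empty.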

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.
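
One caveat on "match the entire string": match only anchors the start, so the pattern itself has to consume or anchor the end. A small sketch, assuming Python 3.4 or later (where re.fullmatch is available); the sample text is made up:

```python
import re

text = "someotherthing"

# match only requires the pattern to match at the start of the string.
prefix_only = re.match(r"some", text)

# Adding '$' forces the match to reach the end, so this one fails.
with_dollar = re.match(r"some$", text)

# fullmatch requires the pattern to cover the whole string.
whole = re.fullmatch(r"some\w+", text)
not_whole = re.fullmatch(r"some", text)

print(prefix_only.group(), with_dollar, whole.group(), not_whole)
```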

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother', string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines, re.MULTILINE))  # also won't match
print(re.search('someother', string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines, re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The flags are already part of the compiled pattern; the second argument
# of a compiled pattern's search() is pos, so re.MULTILINE is not passed here.
print(m.search(string_with_newlines))  # also matches

#2

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.

#3

re.search searches for the pattern throughout the string, whereas re.match does not search at all: it can only match the pattern at the start of the string.

#4

You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t = re.match("[a-z]+", a)   # None: the string starts with digits
t = re.search("[a-z]+", a)  # match object for "abc"

re.match will return None, but re.search will return a match object for abc.

#5

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern') equals re.search('^pattern') (outside MULTILINE mode, where ^ can also match right after a newline). So it anchors a pattern's left side. But it doesn't anchor a pattern's right side: that still requires a terminating $.
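
To make the anchoring concrete, here is a rough sketch with a made-up two-line string. It shows where the "^ prepended" rule of thumb stops holding (MULTILINE mode) and that the right side stays unanchored; \A is the regex token that always means "start of the string":

```python
import re

text = "first line\nsecond line"

# match is tied to the start of the string, even in MULTILINE mode.
a = re.match(r"second", text, re.MULTILINE)

# With MULTILINE, '^' also matches right after a newline, so search succeeds.
b = re.search(r"^second", text, re.MULTILINE)

# '\A' matches only at the very start, so this behaves like match.
c = re.search(r"\Asecond", text, re.MULTILINE)

# The right side is not anchored: the match ends after "first",
# although the string continues.
d = re.match(r"first", text)

print(a, b.start(), c, d.group(), d.end())
```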

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.

#6

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
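
A short sketch of "until it finds a match", with an illustrative string of my own: search stops at the leftmost match, match gives up if the start of the string does not fit, and findall is what you would reach for if you want every occurrence:

```python
import re

text = "abc 123 def 456"

# search scans forward and returns only the leftmost match.
first = re.search(r"\d+", text)

# match does not scan: the string starts with letters, so the result is None.
at_start = re.match(r"\d+", text)

# findall collects every non-overlapping match.
all_numbers = re.findall(r"\d+", text)

print(first.group(), first.start(), at_start, all_numbers)
```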

#7

match is much faster than search, so instead of doing regex.search("word") you can do regex.match("(.*?)word(.*?)") and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.
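
If you want to repeat this kind of comparison on your own machine, a smaller variant using timeit might look like the sketch below. The list size, the fixed seed, the precompiled patterns, and the extra "xxpythonxx" word are my assumptions for a quick, repeatable run, not part of the measurements above, and the timings it prints will vary by machine:

```python
import random
import re
import string
import timeit

random.seed(0)  # assumption: fixed seed so the word list is repeatable
words = ["".join(random.choices(string.ascii_lowercase, k=10)) for _ in range(10_000)]
words.append("xxpythonxx")  # assumption: guarantee at least one hit

search_re = re.compile(r"python")
match_re = re.compile(r"(.*?)python(.*?)")

# Sanity check first: both approaches should select the same words.
search_hits = [w for w in words if search_re.search(w)]
match_hits = [w for w in words if match_re.match(w)]

# Time each approach over the same list, a few repetitions each.
search_time = timeit.timeit(lambda: [search_re.search(w) for w in words], number=5)
match_time = timeit.timeit(lambda: [match_re.match(w) for w in words], number=5)
print(f"search: {search_time:.3f}s  match: {match_time:.3f}s")
```

Checking that both variants select the same words before timing them guards against comparing two patterns that do not actually do the same job.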
