When does it make sense to use re.search instead of re.findall in Python regex?

Time: 2021-04-28 22:37:40

I understand the technical difference between using re.search and re.findall in Python, but would someone with more experience explain situations in which you might use re.search over just using re.findall for regex parsing?

1 answer

#1


2  

From the documentation:

re.search(pattern, string, flags=0) :- Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

i) If you only want to know whether a pattern occurs anywhere in a string, use re.search. For example, searching for

a+ in the string abcdaa

tells you whether the string abcdaa contains one or more consecutive a characters. If a match is found, re.search returns a match object; otherwise it returns None. It does not look for any further occurrences of the pattern, so re.search('a+', 'abcdaa').group(0) returns only 'a', the first match.

On the other hand, re.findall returns all non-overlapping matches found in the string, for example ['a', 'aa'] for the string abcdaa. In that sense, re.findall is Python's counterpart to the g (global) flag found in some other regex flavors, which finds all matches.
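As a rough illustration of the difference described above, here is a minimal sketch using the same pattern and string (the variable names are only for this example):

```python
import re

text = "abcdaa"

# re.search stops at the first match and returns a match object (or None).
first = re.search(r"a+", text)
if first is not None:
    print(first.group(0))  # prints: a
    print(first.span())    # prints: (0, 1)

# re.findall scans the whole string and returns a list of every match.
all_matches = re.findall(r"a+", text)
print(all_matches)  # prints: ['a', 'aa']

# With no match, re.search returns None while re.findall returns an empty list.
no_match = re.search(r"z+", text)
no_matches = re.findall(r"z+", text)
print(no_match, no_matches)  # prints: None []
```

Note that the match object also carries position information (here via span()), which the plain list of strings returned by re.findall does not.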

ii) One might ask: why not use re.findall to find all the matches and treat a non-empty list as proof that the pattern exists?

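In that case, re.findall can be much slower than re.search, because re.search stops at the first match while re.findall scans the whole string and builds a list of every match.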
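The two ways of writing the existence check can be sketched as follows. The helper names exists_with_search and exists_with_findall are hypothetical, chosen only for this example; they are not part of the re module.

```python
import re

def exists_with_search(pattern, string):
    """Return True if pattern matches anywhere; can stop at the first match."""
    return re.search(pattern, string) is not None

def exists_with_findall(pattern, string):
    """Same answer, but collects every match before checking for emptiness."""
    return len(re.findall(pattern, string)) > 0

sample = "abcdaa"
print(exists_with_search(r"a+", sample), exists_with_findall(r"a+", sample))
print(exists_with_search(r"z+", sample), exists_with_findall(r"z+", sample))
```

Both helpers give the same answer. When the pattern never matches, both have to scan the whole string, so the difference mostly shows up when a match occurs early in a long string.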

Comparison (Processor - Intel® Core™ i5-5200U CPU @ 2.20GHz × 4, Memory - 7.7 GiB)

On a string built by concatenating the integers 0 through 9999999 (about 68.9 million characters), using the following code:

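import re
import time

# Build a large test string by concatenating the integers 0..9999999.
st = "".join(str(n) for n in range(10000000))

# Time re.search, which returns as soon as it finds the first match.
start_time = time.time()
re.search(r"1+", st)
first_time = time.time()
print("Time taken by re.search = ", first_time - start_time, "seconds")

# Time re.findall, which scans the entire string and collects every match.
re.findall(r"1+", st)
second_time = time.time()
print("Time taken by re.findall = ", second_time - first_time, "seconds")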

The output was:

Time taken by re.search =  0.00011801719665527344 seconds
Time taken by re.findall =  1.7739462852478027 seconds

So, if we just want to know whether a pattern exists in a string, it is preferable to use re.search.
