Different behavior between re.finditer and re.findall

Time: 2021-04-18 22:33:29

I am using the following code:

import re
CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
matches = pattern.finditer(mailbody)
findall = pattern.findall(mailbody)

But finditer and findall are finding different things. findall does find all the matches in the given string, but finditer only finds the first one, returning an iterator with just one element.

How can I make finditer and findall behave the same way?

Thanks

3 solutions

#1


21  

I can't reproduce this here. I have tried it with both Python 2.7 and 3.1.

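One difference between finditer and findall is that the former returns an iterator of regex match objects, whereas the latter returns a list with one entry per match: a tuple of the captured groups when the pattern has several capturing groups, a single string when it has exactly one, or the entire match when it has none.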

So

import re
CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
with open("test.txt") as f:  # the file is closed automatically
    mailbody = f.read()
for match in pattern.finditer(mailbody):
    print(match)
print()
for match in pattern.findall(mailbody):
    print(match)

prints

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')
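Because test.txt isn't included in the question, here is a self-contained sketch of the same comparison. The two-row mailbody string is a hypothetical stand-in built from the output above, not the asker's actual data; with it, finditer and findall should report the same number of matches.

```python
import re

# Same pattern as in the question.
CARRIS_REGEX = r'<th>(\d+)</th><th>([\s\w\.\-]+)</th><th>(\d+:\d+)</th><th>(\d+m)</th>'
pattern = re.compile(CARRIS_REGEX, re.UNICODE)

# Hypothetical stand-in for test.txt: two table rows taken from the output above.
mailbody = (
    "<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>\n"
    "<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>\n"
)

iter_matches = list(pattern.finditer(mailbody))  # match objects
all_matches = pattern.findall(mailbody)          # tuples of groups

# Both functions report the same number of matches.
print(len(iter_matches), len(all_matches))  # 2 2
```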

If you want the same output from finditer as you're getting from findall, you need

for match in pattern.finditer(mailbody):
    print(match.groups())  # groups() already returns a tuple
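As a rough check of that equivalence, the sketch below compares both approaches on a hypothetical pattern with an optional group. The results agree only while every group takes part in the match: for a group that didn't match, findall substitutes an empty string, whereas groups() returns None unless you pass default=''.

```python
import re

# Hypothetical pattern: the second group is optional.
pattern = re.compile(r'(\d+)(m)?')
text = "04m 13"

via_findall = pattern.findall(text)
via_finditer = [m.groups() for m in pattern.finditer(text)]
via_finditer_default = [m.groups(default='') for m in pattern.finditer(text)]

print(via_findall)           # [('04', 'm'), ('13', '')]
print(via_finditer)          # [('04', 'm'), ('13', None)]
print(via_finditer_default)  # [('04', 'm'), ('13', '')]
```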

#2


4  

You can't make them behave the same way, because they're different. If you really want to create a list of results from finditer, you can pass the iterator to list():

>>> list(pattern.finditer(mailbody))
[...]
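Calling list() drains the iterator. A minimal sketch (with a made-up pattern and text) showing that a finditer iterator is single-pass: once it has been consumed, a second pass yields nothing. If matches seem to go missing, it may be worth checking whether the same iterator was already iterated somewhere else.

```python
import re

pattern = re.compile(r'\d+')
text = "790 758 790"

matches = pattern.finditer(text)  # an iterator, evaluated lazily

first_pass = list(matches)   # consumes the iterator
second_pass = list(matches)  # already exhausted

print(len(first_pass), len(second_pass))  # 3 0
print([m.group() for m in first_pass])    # ['790', '758', '790']
```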

In general, use a for loop to access the matches returned by re.finditer:

>>> for match in pattern.finditer(mailbody):
...     ...
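Inside such a loop each item is a match object, which carries more information than findall's tuples. A sketch with a hypothetical, shortened two-column pattern and text:

```python
import re

# Hypothetical, shortened version of the question's pattern.
pattern = re.compile(r'<th>(\d+)</th><th>(\d+m)</th>')
mailbody = "<th>790</th><th>04m</th> <th>758</th><th>09m</th>"

rows = []
for match in pattern.finditer(mailbody):
    route, wait = match.groups()  # the same values findall would return
    # span() gives the (start, end) offsets of the whole match.
    rows.append((route, wait, match.span()))
    print(route, wait, match.span())
```

The offsets from span() (and start()/end()) are the main thing finditer gives you that findall cannot.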

#3


4  

re.findall(pattern, string)

findall() returns all non-overlapping matches of pattern in string as a list of strings (or a list of tuples of strings, when the pattern has more than one capturing group).
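The element type of that list depends on how many capturing groups the pattern has. A small illustration on made-up text:

```python
import re

text = "21:06 21:10"

no_groups = re.findall(r'\d+:\d+', text)     # whole matches
one_group = re.findall(r'(\d+):\d+', text)   # the single group, as plain strings
two_groups = re.findall(r'(\d+):(\d+)', text)  # tuples of groups

print(no_groups)   # ['21:06', '21:10']
print(one_group)   # ['21', '21']
print(two_groups)  # [('21', '06'), ('21', '10')]
```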

re.finditer(pattern, string)

finditer() returns an iterator that yields a match object for each non-overlapping match.
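A quick check of what finditer actually returns. In CPython the object's type is named callable_iterator, which may be where the "callable" wording comes from, but the object itself is an ordinary iterator and cannot be called:

```python
import re
from collections.abc import Iterator

it = re.finditer(r'\d+', "790 758")

print(type(it).__name__)         # callable_iterator (in CPython)
print(isinstance(it, Iterator))  # True
print(callable(it))              # False

first = next(it)                     # pull match objects one at a time
print(first.group(), first.start())  # 790 0
```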

In both functions, the string is scanned from left to right and matches are returned in order found.

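A small sketch of both points on made-up text: scanning resumes where the previous match ended, so overlapping candidates are skipped, and results come back in left-to-right order.

```python
import re

text = "aaaa"

found = re.findall(r'aa', text)
spans = [m.span() for m in re.finditer(r'aa', text)]

# After "aa" at 0-2, scanning resumes at index 2, so the overlapping
# candidate at 1-3 is never reported.
print(found)  # ['aa', 'aa']
print(spans)  # [(0, 2), (2, 4)]
```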
