For school I'm supposed to write a Python RE script that extracts IP addresses. The regular expression I'm using seems to work with re.search() but not with re.findall().
import re

exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
match = re.search(exp, ip)
print(match.group())
The match for that is always 192.168.0.185, but it's different when I do re.findall():
exp = r"(\d{1,3}\.){3}\d{1,3}"
ip = "blah blah 192.168.0.185 blah blah"
matches = re.findall(exp, ip)
print(matches[0])
0.
I'm wondering why re.findall() yields 0. when re.search() yields 192.168.0.185, since I'm using the same expression for both functions.
And what can I do so that re.findall() actually follows the expression correctly? Or am I making some kind of mistake?
2 Solutions
#1
12
findall returns a list of matches, and from the documentation:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
So your original expression had one capturing group that matched three times in the string (192., 168. and 0.), and a repeated group keeps only its last match, which is why findall returned 0. rather than the whole address.
To fix your problem, use exp = r"(?:\d{1,3}\.){3}\d{1,3}". With the non-capturing version there are no groups to return, so the whole match is returned in both cases.
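As a rough illustration of the documentation rule quoted above, the sketch below runs the original pattern, the non-capturing version, and a pattern with two groups against the same string. The variable names and the two-group pattern are made up for this example and are not part of the question.

```python
import re

text = "blah blah 192.168.0.185 blah blah"

capturing = r"(\d{1,3}\.){3}\d{1,3}"        # one capturing group, repeated 3 times
non_capturing = r"(?:\d{1,3}\.){3}\d{1,3}"  # same pattern, no capturing group
# Hypothetical pattern for illustration: captures only the first and last octet.
two_groups = r"(\d{1,3})\.\d{1,3}\.\d{1,3}\.(\d{1,3})"

# search() exposes the whole match via group(0), whatever groups exist.
whole = re.search(capturing, text).group(0)
# A repeated group remembers only its final repetition.
last_repetition = re.search(capturing, text).group(1)

with_group = re.findall(capturing, text)         # list of group-1 strings
without_group = re.findall(non_capturing, text)  # list of whole matches
with_two = re.findall(two_groups, text)          # list of tuples

print(whole)            # 192.168.0.185
print(last_repetition)  # 0.
print(with_group)       # ['0.']
print(without_group)    # ['192.168.0.185']
print(with_two)         # [('192', '185')]
```

So search() and findall() are matching the same text; they only differ in which part of the match they hand back once a capturing group is present.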
#2
3
You're only capturing the 0. in that regex, since it's the last repetition the group catches.
Change the expression to capture the entire IP, and the repeated part to be a non-capturing group:
In [1]: import re
In [2]: ip = "blah blah 192.168.0.185 blah blah"
In [3]: exp = r"((?:\d{1,3}\.){3}\d{1,3})"
In [4]: m = re.findall(exp, ip)
In [5]: m
Out[5]: ['192.168.0.185']
And if it helps to explain the regex:
In [6]: re.compile(exp, re.DEBUG)
subpattern 1
  max_repeat 3 3
    subpattern None
      max_repeat 1 3
        in
          category category_digit
      literal 46
  max_repeat 1 3
    in
      category category_digit
This explains the subpatterns. Subpattern 1 is what gets captured by findall.
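If you would rather leave the pattern untouched, one alternative (not mentioned in the answers above, so treat it as a suggestion) is re.finditer(), which yields match objects instead of group strings. A minimal sketch, using a made-up input string with two addresses:

```python
import re

# Hypothetical input containing two addresses.
text = "src 192.168.0.185 dst 10.0.0.1"
exp = r"(\d{1,3}\.){3}\d{1,3}"  # the original pattern, capturing group left in place

# Each item from finditer() is a match object; group(0) is the full match,
# regardless of how many capturing groups the pattern has.
addresses = [m.group(0) for m in re.finditer(exp, text)]
print(addresses)  # ['192.168.0.185', '10.0.0.1']
```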