I want to fetch an HTML page with Python and then print all the IP addresses it contains. I define an IP as follows:
x.x.x.x:y
Where: x = a number from 0 to 255 (one IPv4 octet). y = a number with fewer than 7 digits.
Thanks.
5 Solutions
#1
Right. The only part I can't do is the regular expression one. – das 9 mins ago
If someone shows me that, I will be fine. – das 8 mins ago
import re
ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
junk = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
print ip.findall(junk)
# outputs ['1.1.1.1:123', '2.2.2.2:321']
Here is a complete example:
import re, urllib2
f = urllib2.urlopen("http://www.samair.ru/proxy/ip-address-01.htm")
junk = f.read()
ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
print ip.findall(junk)
# ['114.30.47.10:80', '118.228.148.83:80', '119.70.40.101:8080', '12.47.164.114:8888', '121.
# 17.161.114:3128', '122.152.183.103:80', '122.224.171.91:3128', '123.234.32.27:8080', '124.
# 107.85.115:80', '124.247.222.66:6588', '125.76.228.201:808', '128.112.139.75:3128', '128.2
# 08.004.197:3128', '128.233.252.11:3124', '128.233.252.12:3124']
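The code above is Python 2 (`urllib2`, the `print` statement). A minimal Python 3 sketch of the same idea follows, assuming the page decodes as UTF-8. The helper names `fetch_page` and `find_endpoints` are hypothetical, not part of the original answer.

```python
import re
from urllib.request import urlopen

# Same pattern as in the answer: four octets in 0-255, a colon, then 1-6 digits.
IP_PORT = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b"
)


def fetch_page(url, encoding="utf-8"):
    """Download a page and decode it to text (hypothetical helper)."""
    with urlopen(url) as response:
        # errors="replace" is an assumption: tolerate bytes that are not valid UTF-8.
        return response.read().decode(encoding, errors="replace")


def find_endpoints(text):
    """Return every ip:port substring that the pattern accepts."""
    return IP_PORT.findall(text)


junk = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
print(find_endpoints(junk))  # ['1.1.1.1:123', '2.2.2.2:321']
```

For a live page, something like `find_endpoints(fetch_page(url))` should work, with `url` set to the page you want to scan.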
#2
The basic approach would be:
- Use urllib2 to download the contents of the page
- Use a regular expression to extract IPv4-like addresses
- Validate each match according to the numeric constraints on each octet
- Print out the list of matches
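The extract-then-validate steps listed above could be sketched roughly like this in Python 3. The pattern is deliberately loose and the range check happens in plain code. The names `CANDIDATE` and `extract_endpoints` are made up for illustration, and the download step is left out so the sketch runs without network access.

```python
import re

# Loose pattern: anything shaped like d.d.d.d:port, with the numeric ranges unchecked.
CANDIDATE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,6})\b")


def extract_endpoints(text):
    """Return ip:port strings whose four octets all lie in 0-255."""
    results = []
    for match in CANDIDATE.finditer(text):
        octets = [int(group) for group in match.groups()[:4]]
        if all(0 <= octet <= 255 for octet in octets):
            results.append(match.group(0))
    return results


page = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
for endpoint in extract_endpoints(page):
    print(endpoint)
```

Keeping the numeric check out of the regex makes the pattern easier to read, at the cost of a second pass over each match.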
Please provide a clearer indication of what specific part you are having trouble with, along with evidence to show what it is you've tried thus far.
#3
Not to turn this into a who's-a-better-regex-author-war but...
(\d{1,3}\.){3}\d{1,3}\:\d{1,6}
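Two things are worth knowing before using this pattern with Python's `re.findall`. The capturing group makes `findall` return only the group's last repetition, and `\d{1,3}` accepts octets above 255. A small sketch showing both (the variable names are illustrative only):

```python
import re

text = "1.1.1.1:123 312.123.1.12:123"

# With a capturing group, findall returns the group, not the whole match.
grouped = re.compile(r"(\d{1,3}\.){3}\d{1,3}:\d{1,6}")
grouped_result = grouped.findall(text)
print(grouped_result)  # ['1.', '1.']

# A non-capturing group (?:...) makes findall return the full match instead.
# Note that 312.123.1.12 is still accepted, because \d{1,3} does no range check.
ungrouped = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d{1,6}")
ungrouped_result = ungrouped.findall(text)
print(ungrouped_result)  # ['1.1.1.1:123', '312.123.1.12:123']
```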
#4
Try:
re.compile(r"\d?\d?\d\.\d?\d?\d\.\d?\d?\d\.\d?\d?\d:\d+").findall(urllib2.urlopen(url).read())
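In a pattern like this, the dots between octets need to be escaped, since a bare `.` matches any character. Using a raw string (`r"..."`) also keeps `\d` from being treated as a string escape. A short sketch of the difference, with made-up variable names:

```python
import re

# Bare dots: each "." matches any single character, not just a literal dot.
unescaped = re.compile(r"\d?\d?\d.\d?\d?\d.\d?\d?\d.\d?\d?\d:\d+")
# Escaped dots: only a literal "." is accepted between octets.
escaped = re.compile(r"\d?\d?\d\.\d?\d?\d\.\d?\d?\d\.\d?\d?\d:\d+")

sample = "1a2b3c4:80"
print(unescaped.findall(sample))  # ['1a2b3c4:80']
print(escaped.findall(sample))    # []
```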
#5
\b(?: # A.B.C in A.B.C.D:port
(?:
25[0-5]
| 2[0-4][0-9]
| 1[0-9][0-9]
| [1-9]?[0-9]
)\.
){3}
(?: # D in A.B.C.D:port
25[0-5]
| 2[0-4][0-9]
| 1[0-9][0-9]
| [1-9]?[0-9]
)
:[1-9]\d{0,5} # port number any number in (0,999999]
\b
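A pattern laid out with whitespace and `#` comments like this only works in Python when compiled with the `re.VERBOSE` flag. A minimal sketch, assuming Python 3 and using the hypothetical name `IP_PORT`:

```python
import re

IP_PORT = re.compile(r"""
    \b(?:                  # A.B.C. in A.B.C.D:port
        (?: 25[0-5] | 2[0-4][0-9] | 1[0-9][0-9] | [1-9]?[0-9] )\.
    ){3}
    (?: 25[0-5] | 2[0-4][0-9] | 1[0-9][0-9] | [1-9]?[0-9] )   # D
    :[1-9]\d{0,5}          # port: 1 to 999999, no leading zero
    \b
""", re.VERBOSE)

sample = "1.1.1.1:123 312.123.1.12:123 10.0.0.1:0"
print(IP_PORT.findall(sample))  # ['1.1.1.1:123']
```

Unlike the `[01]?\d\d?` variant in the first answer, this one rejects octets with leading zeros (such as `004`) and a port of `0`.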