How can I check the size of a file before downloading it, and stop the download if it is too large? I don't want to download files that are larger than 12MB.
import random
import urllib2

request = urllib2.Request(ep_url)
request.add_header('User-Agent', random.choice(agents))
thefile = urllib2.urlopen(request).read()
4 Solutions
#1
19
There's no need to drop down to httplib as bobince did. You can do all of that with urllib2 directly:
>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>>
If you use httplib, you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.
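For anyone on Python 3, where urllib2 became urllib.request, the same idea might look roughly like the sketch below. It checks the declared Content-Length on the open response before reading, then caps the read anyway in case the header is missing or wrong. The names MAX_BYTES, FileTooLarge, read_limited and fetch_limited are made up for illustration and are not part of any library.

```python
import urllib.request

MAX_BYTES = 12 * 1024 * 1024  # the 12MB limit from the question


class FileTooLarge(Exception):
    """Hypothetical exception raised when the size limit is exceeded."""


def read_limited(response, max_bytes=MAX_BYTES):
    """Read the body of an open response, refusing anything over max_bytes."""
    declared = response.headers.get("Content-Length")
    try:
        declared_size = int(declared)
    except (TypeError, ValueError):
        declared_size = None  # header missing or malformed: size unknown
    if declared_size is not None and declared_size > max_bytes:
        raise FileTooLarge("server declares %d bytes" % declared_size)
    # The header is optional and may be wrong, so cap the read as well.
    data = response.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise FileTooLarge("body exceeds %d bytes" % max_bytes)
    return data


def fetch_limited(url, user_agent, max_bytes=MAX_BYTES):
    """Open url with urllib.request and return its body if it is small enough."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return read_limited(response, max_bytes)
```

Because urlopen returns as soon as the headers have arrived, the size check runs before the body is read.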
#2
7
You could say:
maxlength = 12*1024*1024
# Read one byte past the limit so an oversized file can be detected
thefile = urllib2.urlopen(request).read(maxlength+1)
if len(thefile) == maxlength+1:
    raise ThrowToysOutOfPramException()
But then of course you've still read 12MB of unwanted data. If you want to minimise the risk of this happening, you can check the HTTP Content-Length header, if present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib2.
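import httplib
import urlparse

u = urlparse.urlparse(ep_url)
cn = httplib.HTTPConnection(u.netloc)
cn.request('GET', u.path, headers={'User-Agent': ua})
r = cn.getresponse()

# Content-Length may be missing or malformed; treat both as unknown (0)
try:
    l = int(r.getheader('Content-Length', '0'))
except ValueError:
    l = 0
if l > maxlength:
    raise IAmCrossException()

# The header may be absent or wrong, so cap the read as well
thefile = r.read(maxlength+1)
if len(thefile) == maxlength+1:
    raise IAmStillCrossException()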
You can check the length before asking to get the file too, if you prefer. This is basically the same as above, except using the method 'HEAD' instead of 'GET'.
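As a rough illustration of the HEAD variant, here is a sketch for Python 3, where urllib.request.Request accepts a method argument. The helper names build_head_request and declared_size are hypothetical, and the server may still omit Content-Length, in which case the function returns None.

```python
import urllib.request


def build_head_request(url, user_agent):
    """Build a request that asks for headers only (hypothetical helper)."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )


def declared_size(url, user_agent):
    """Return the declared Content-Length in bytes, or None if absent or malformed."""
    with urllib.request.urlopen(build_head_request(url, user_agent)) as response:
        value = response.headers.get("Content-Length")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```

A caller could compare the result against 12*1024*1024 and only issue the GET when the value is None or within the limit. A None result still needs a capped read on the GET.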
#3
1
You can check the Content-Length in a HEAD request first, but be warned: this header doesn't have to be set. See "How do you send a HEAD HTTP request in Python 2?"
#4
1
This will work if the Content-Length header is set:
import urllib2
req = urllib2.urlopen("http://example.com/file.zip")
total_size = int(req.info().getheader('Content-Length'))
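When the header is not set, one possible fallback is to read the body in chunks and give up as soon as the running total passes the limit. The sketch below assumes Python 3 and works on any file-like source, such as the object returned by urllib.request.urlopen. The names CHUNK_SIZE, FileTooLarge and copy_limited are made up for illustration, and in-memory streams stand in for the response and the output file.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; tune as needed


class FileTooLarge(Exception):
    """Hypothetical exception for a body that exceeds the limit."""


def copy_limited(source, destination, max_bytes, chunk_size=CHUNK_SIZE):
    """Copy source to destination in chunks; raise once more than max_bytes arrive."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return total  # end of stream: report how many bytes were copied
        total += len(chunk)
        if total > max_bytes:
            raise FileTooLarge("more than %d bytes received" % max_bytes)
        destination.write(chunk)


# Demonstration with in-memory streams standing in for a response and a file.
body = io.BytesIO(b"x" * 1000)
saved = io.BytesIO()
copied = copy_limited(body, saved, max_bytes=12 * 1024 * 1024)
```

At most one chunk past the limit is read before the copy stops, and nothing beyond the limit is written to the destination.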