What's the best way to download a file using urllib3

Date: 2021-11-17 18:07:39

I would like to download a file over HTTP using urllib3. I have managed to do this with the following code:

 import urllib3

 url = 'http://url_to_a_file'
 connection_pool = urllib3.PoolManager()
 resp = connection_pool.request('GET', url)
 f = open(filename, 'wb')
 f.write(resp.data)
 f.close()
 resp.release_conn()

But I was wondering what the proper way of doing this is. For example, will it work well for big files, and if not, what should I do to make this code more bug-tolerant and scalable?

Note: It is important to me to use the urllib3 library rather than, for example, urllib2, because I want my code to be thread-safe.

2 Answers

#1


16  

Your code snippet is close. Two things worth noting:

  1. If you're using resp.data, it will consume the entire response and return the connection to the pool (you don't need to call resp.release_conn() manually). This is fine if you're cool with holding the whole body in memory.

  2. You could use resp.read(amt), which streams the response when the request is made with preload_content=False, but the connection will then need to be returned via resp.release_conn().

This would look something like...

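import urllib3

url = 'http://url_to_a_file'   # placeholder URL from the question
path = 'downloaded_file'       # assumed output path
chunk_size = 64 * 1024         # assumed chunk size in bytes; tune as needed

http = urllib3.PoolManager()
# preload_content=False defers reading the body so it can be streamed
r = http.request('GET', url, preload_content=False)

try:
    with open(path, 'wb') as out:
        while True:
            data = r.read(chunk_size)
            if not data:
                break
            out.write(data)
finally:
    # return the connection to the pool even if the copy fails
    r.release_conn()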
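
For the "more bug tolerant" part of the question, the same streaming loop could be wrapped in a function that also checks the status code, with retries and timeouts configured on the pool. This is only a sketch under assumptions: the names `download` and `make_pool`, the default chunk size, and the retry and timeout numbers are made up for illustration and are not recommendations from urllib3.

```python
import urllib3
from urllib3.util import Retry, Timeout


def download(http, url, path, chunk_size=64 * 1024):
    """Stream url into path in chunks; http is a PoolManager (or similar)."""
    r = http.request('GET', url, preload_content=False)
    try:
        if r.status != 200:
            # discard the (usually small) error body so the connection is reusable
            r.drain_conn()
            raise RuntimeError('unexpected HTTP status %s for %s' % (r.status, url))
        with open(path, 'wb') as out:
            # stream() yields chunks of up to chunk_size bytes until the body ends
            for chunk in r.stream(chunk_size):
                out.write(chunk)
    finally:
        # hand the connection back whether or not the copy succeeded
        r.release_conn()
    return path


def make_pool():
    """Build a PoolManager with assumed retry and timeout settings."""
    return urllib3.PoolManager(
        retries=Retry(total=3, backoff_factor=0.5),
        timeout=Timeout(connect=5.0, read=30.0),
    )
```

A call would then look like `download(make_pool(), url, path)`. Raising on any non-200 status is a design choice made for this sketch; a caller that expects other success codes would need a different check.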

The documentation might be a bit lacking for this scenario. If anyone is interested in making a pull request to improve the urllib3 documentation, that would be greatly appreciated. :)

#2


2  

The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj as below:

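import shutil
import urllib3

url = 'http://url_to_a_file'
filename = 'downloaded_file'   # assumed output path
c = urllib3.PoolManager()

# copy the response body to the file in chunks instead of loading it all into memory
with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()     # not 100% sure this is required though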
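
Since the question stresses thread safety and scalability, here is a rough sketch of fetching several files concurrently through one shared PoolManager, using the same copyfileobj approach. The helper names `fetch` and `fetch_all`, the worker count, and the `(url, filename)` job format are all hypothetical. The release_conn() call is placed in a finally block so the connection is handed back even if the copy raises.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

import urllib3


def fetch(pool, url, filename):
    """Copy the body of url into filename and return filename."""
    resp = pool.request('GET', url, preload_content=False)
    try:
        with open(filename, 'wb') as out_file:
            shutil.copyfileobj(resp, out_file)
    finally:
        resp.release_conn()
    return filename


def fetch_all(jobs, pool=None, max_workers=4):
    """Run fetch() for each (url, filename) pair on a thread pool."""
    if pool is None:
        # assumed sizing: allow one pooled connection per worker for each host
        pool = urllib3.PoolManager(maxsize=max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch, pool, url, name) for url, name in jobs]
        # result() re-raises any exception from the worker thread
        return [f.result() for f in futures]
```

Usage would be something like `fetch_all([(url, filename)])`. Results come back in the same order as the jobs, and the first failed download raises from `fetch_all`.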
