If I request a file and specify gzip as the encoding, how do I handle the response?
Normally when I have a large file I do the following:
while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    writer.write(chunk)
    writer.flush()
where CHUNK is a size in bytes, writer is a file object returned by open(), and resp is the response generated from a urllib request.
So that is pretty simple. Most of the time, when the response header contains 'gzip' as the returned encoding, I do the following:
decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
data = decomp.decompress(resp.read())
writer.write(data)
writer.flush()
or this:
f = gzip.GzipFile(fileobj=buf)
writer.write(f.read())
where buf is a BytesIO().
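For reference, here is a minimal sketch of how the two whole-body variants fit together. It assumes Python 3 and uses an in-memory gzip payload in place of resp.read(); the names payload and body are placeholders for the demonstration. Both variants hold the entire compressed and decompressed data in memory at once, which is where large files become a problem.

```python
import gzip
import io
import zlib

payload = b"hello gzip\n" * 1000
body = gzip.compress(payload)  # stands in for resp.read()

# Variant 1: zlib, with wbits set so it expects a gzip header and trailer.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
data_zlib = decomp.decompress(body)

# Variant 2: wrap the bytes in a BytesIO and let GzipFile read them.
buf = io.BytesIO(body)
data_gzip = gzip.GzipFile(fileobj=buf).read()
```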
If I try to decompress the gzip response chunk by chunk, though, I run into problems:
while True:
    chunk = resp.read(CHUNK)
    if not chunk:
        break
    decomp = zlib.decompressobj(16+zlib.MAX_WBITS)
    data = decomp.decompress(chunk)
    writer.write(data)
    writer.flush()
Is there a way to decompress the gzip data as it arrives in small chunks, or do I need to write the whole file to disk, decompress it, and then move it to the final file name? Part of my problem is that I am using 32-bit Python, so I can get out-of-memory errors.
Thank you
1 Answer
I think I found a solution that I wish to share.
import zlib

def _chunk(response, size=4096):
    """ downloads a web response in pieces """
    method = response.headers.get("content-encoding")
    if method == "gzip":
        # Create the decompressor once so its state carries across chunks.
        d = zlib.decompressobj(16+zlib.MAX_WBITS)
        b = response.read(size)
        while b:
            data = d.decompress(b)
            yield data
            b = response.read(size)
            del data
        # Emit anything still buffered in the decompressor.
        yield d.flush()
    else:
        while True:
            chunk = response.read(size)
            if not chunk:
                break
            yield chunk
If anyone has a better solution, please add to it. Basically, my error was where I created the zlib.decompressobj(). I was creating it inside the loop, so every chunk got a fresh decompressor, instead of creating it once before the loop.
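To illustrate the point about where the decompressor is created, here is a small sketch that does not touch the network. It assumes Python 3 (for gzip.compress); the payload bytes and the split helper are made up for the demonstration. The compressed data is cut into pieces, then fed first to one shared decompressor and then to a new decompressor per piece.

```python
import gzip
import zlib

def split(data, size):
    """Hypothetical helper: cut data into pieces of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

payload = "\n".join(str(i) for i in range(20000)).encode()
pieces = split(gzip.compress(payload), 256)

# One decompressor for the whole stream: its state carries across pieces.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
restored = b"".join(d.decompress(p) for p in pieces) + d.flush()

# A new decompressor per piece: only the first piece starts with a gzip
# header, so a later piece is rejected with zlib.error.
per_piece_failed = False
try:
    for p in pieces:
        zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(p)
except zlib.error:
    per_piece_failed = True
```

The shared decompressor reproduces the original payload, while the per-piece version fails as soon as it reaches a piece that does not begin with a gzip header.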
The _chunk generator above seems to work in both Python 2 and 3 as well, which is a plus.
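One possible alternative, sketched here for Python 3 only, is to wrap the response in gzip.GzipFile and copy it to the output in fixed-size blocks with shutil.copyfileobj. This assumes the response object behaves like a readable binary file; the function name save_gzip_stream and the in-memory stand-in for the response are illustrative, not part of the original code.

```python
import gzip
import io
import shutil

def save_gzip_stream(response, writer, size=4096):
    """Decompress a gzip-encoded file-like response into writer, size bytes at a time."""
    with gzip.GzipFile(fileobj=response) as unzipped:
        shutil.copyfileobj(unzipped, writer, size)

# Stand-in for a real HTTP response: an in-memory gzip payload.
payload = b"example line\n" * 10000
fake_response = io.BytesIO(gzip.compress(payload))
out = io.BytesIO()
save_gzip_stream(fake_response, out)
```

With a real download, writer would be the file opened for binary writing, and only about one block of decompressed data is held in memory at a time.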