I'm doing this to fetch some data:
import pycurl
from StringIO import StringIO  # Python 2; on Python 3 use io.BytesIO instead

c = pycurl.Curl()
c.setopt(pycurl.ENCODING, 'gzip')
c.setopt(pycurl.URL, url)
c.setopt(pycurl.TIMEOUT, 10)
c.setopt(pycurl.FOLLOWLOCATION, True)
xml = StringIO()
c.setopt(pycurl.WRITEFUNCTION, xml.write)
c.perform()
c.close()
My URLs are typically of this sort:
http://host/path/to/resource-foo.xml
Usually I get back a 302 redirect pointing to:
http://archive-host/path/to/resource-foo.xml.gz
Given that I have set FOLLOWLOCATION, and ENCODING gzip, everything works great.
The problem is, sometimes I have a URL which does not result in a redirect to a gzipped resource. When this happens, c.perform() throws this error:
pycurl.error: (61, 'Error while processing content unencoding: invalid block type')
This suggests to me that pycurl is trying to gunzip a resource that is not gzipped.
Is there some way I can instruct pycurl to figure out the response encoding, and gunzip or not as appropriate? I have played around with using different values for ENCODING, but so far no luck.
The pycurl docs seem to be a little lacking. :/
thx!
1 solution
#1
If worst comes to worst, you could omit the ENCODING 'gzip', set HTTPHEADER to ['Accept-Encoding: gzip'] (pycurl expects a list of header strings, not a dict), check the response headers for "Content-Encoding: gzip", and if it's present, gunzip the response yourself.
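A minimal sketch of that approach, assuming Python 3 and that the response headers are collected with pycurl's HEADERFUNCTION option. The helper names (is_gzip_encoded, decode_body, fetch) are made up for illustration. Because FOLLOWLOCATION passes the headers of every hop to the callback, only the headers of the last response are considered. The fallback to the raw bytes when the body is not valid gzip is an extra safeguard that goes beyond what the answer describes.

```python
import gzip
import zlib
from io import BytesIO


def is_gzip_encoded(header_lines):
    """Return True if the final response declares a gzip Content-Encoding."""
    encoding = ""
    for raw in header_lines:
        line = raw.decode("iso-8859-1").strip()
        if line.upper().startswith("HTTP/"):
            # A new status line means a new response (e.g. after a redirect),
            # so forget anything seen for the previous hop.
            encoding = ""
        elif ":" in line:
            name, value = line.split(":", 1)
            if name.strip().lower() == "content-encoding":
                encoding = value.strip().lower()
    return encoding in ("gzip", "x-gzip")


def decode_body(header_lines, body):
    """Gunzip body only when the headers say it is gzipped."""
    if is_gzip_encoded(header_lines):
        try:
            return gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Assumption: a mislabeled body is better returned as-is
            # than turned into an exception.
            return body
    return body


def fetch(url):
    """Fetch url with pycurl and decode the body ourselves."""
    import pycurl  # imported lazily so the helpers above work without pycurl

    header_lines = []
    body = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.TIMEOUT, 10)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # No pycurl.ENCODING here: libcurl must not try to decode the body.
    c.setopt(pycurl.HTTPHEADER, ["Accept-Encoding: gzip"])
    c.setopt(pycurl.HEADERFUNCTION, header_lines.append)
    c.setopt(pycurl.WRITEFUNCTION, body.write)
    try:
        c.perform()
    finally:
        c.close()
    return decode_body(header_lines, body.getvalue())
```

Calling fetch('http://host/path/to/resource-foo.xml') would then return the XML bytes whether or not the request ends up at the gzipped archive copy.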