Using urllib2's HttpResponse prevents memory from being reclaimed (memory leak)

Date: 2021-07-04 22:17:50
  • Environment where the problem occurs: Python 2.7.1(X) and earlier, Windows (or CentOS)

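The problem originates at line 1174 of lib/urllib2.py (Python 2.7.1), where an assignment creates a reference cycle. As observed here, even calling gc.collect() did not free the HttpResponse and its associated objects (they can be inspected in gc.garbage).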

    r.recv = r.read
    fp = socket._fileobject(r, close=True)

    resp = addinfourl(fp, r.msg, req.get_full_url())
    resp.code = r.status
    resp.msg = r.reason
    return resp
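The effect of the `r.recv = r.read` line can be reproduced without urllib2. The sketch below is only a simulation: `FakeResponse` and `build` are hypothetical names standing in for the real response object, and the result assumes CPython's reference counting with no gc debug flags set.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for the HTTP response object `r`."""

    def read(self, amt=None):
        return b""


def build(with_cycle):
    r = FakeResponse()
    if with_cycle:
        # Same assignment as in urllib2: the bound method r.read keeps a
        # strong reference to r (its __self__) and is stored in r.__dict__.
        r.recv = r.read
        assert r.recv.__self__ is r
    return weakref.ref(r)  # r itself goes out of scope on return


was_enabled = gc.isenabled()
gc.disable()  # keep the cyclic collector out of the picture for now
try:
    plain = build(with_cycle=False)
    cyclic = build(with_cycle=True)
    plain_gone = plain() is None    # freed by reference counting alone
    cyclic_gone = cyclic() is None  # still alive: the cycle pins it
    gc.collect()
    cyclic_gone_after_collect = cyclic() is None
finally:
    if was_enabled:
        gc.enable()

print("%s %s %s" % (plain_gone, cyclic_gone, cyclic_gone_after_collect))
```

On CPython this is expected to print `True False True`: the object with the self-referencing attribute survives until a cyclic collection runs, which is why memory can grow between collections.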

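This BUG was reported on the official Python bug tracker long ago (see the two issues below), but it has never been formally fixed. The two threads do, however, contain workarounds.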

http://bugs.python.org/issue1208304

http://bugs.python.org/issue7464
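The general idea behind such workarounds is to break the cycle by hand once the response is no longer needed, by clearing the attribute that holds the bound method. The following is a minimal sketch of that idea on a simulated object, not the exact code from the threads: `FakeResponse` and `freed_by_refcount` are hypothetical, and the attribute path to clear on a real urllib2 response should be taken from the issues above.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for the HTTP response object `r`."""

    def read(self, amt=None):
        return b""


def freed_by_refcount(break_cycle):
    """Return True if the object is reclaimed without a gc pass."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        r = FakeResponse()
        r.recv = r.read  # the self-referencing assignment
        ref = weakref.ref(r)
        if break_cycle:
            # Dropping the stored bound method removes the only
            # reference that leads from r back to r.
            r.recv = None
        del r
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(freed_by_refcount(break_cycle=False))
print(freed_by_refcount(break_cycle=True))
```

With the cycle left in place the object lingers; once the attribute is cleared, CPython frees it as soon as the last name is deleted.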


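  • As an extension: if Python code is written like this (a mistake I made in my own code), it produces the same kind of cycle and therefore the same memory leak.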
class T(object):
    def __init__(self):
        self.test = self.test0

    def test0(self, d={}):
        d['a'] = 1

Running it in the Python shell gives:

 Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> gc.set_debug(gc.DEBUG_LEAK)
>>> class T(object):
...     def __init__(self):
...         self.test = self.test0
...
...     def test0(self, d={}):
...         d['a'] = 1
...
>>> t=T()
>>> del t
>>> gc.collect()
gc: collectable <T 0260D870>
gc: collectable <instancemethod 01DCFDF0>
gc: collectable <dict 0260EA50>
3
>>> for _item in gc.garbage:
...     print _item
...
<__main__.T object at 0x0260D870>
<bound method T.test0 of <__main__.T object at 0x0260D870>>
{'test': <bound method T.test0 of <__main__.T object at 0x0260D870>>}
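When reading this session, note that `gc.DEBUG_LEAK` includes `gc.DEBUG_SAVEALL`, and with that flag the collector appends every unreachable object it finds to `gc.garbage` instead of freeing it. The sketch below compares a collection with and without the flag. The helper name `saved_instances` is made up for illustration, and the script resets any debug flags that were already set.

```python
import gc


class T(object):
    def __init__(self):
        self.test = self.test0  # bound method stored on the instance: a cycle

    def test0(self, d={}):
        d['a'] = 1


def saved_instances(flags):
    """Create and drop one T under the given gc debug flags, then report
    how many T instances gc.collect() left behind in gc.garbage."""
    gc.collect()  # clear out unrelated garbage first
    del gc.garbage[:]
    gc.set_debug(flags)
    try:
        t = T()
        del t
        gc.collect()
        return sum(1 for obj in gc.garbage if isinstance(obj, T))
    finally:
        gc.set_debug(0)
        del gc.garbage[:]


without_flag = saved_instances(0)
with_saveall = saved_instances(gc.DEBUG_SAVEALL)
print(without_flag)
print(with_saveall)
```

So an entry in `gc.garbage` under `DEBUG_LEAK` shows that the object was part of an unreachable cycle; it does not by itself show that the collector was unable to free it.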

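The objects that fail to be freed are the three listed in the shell output above (the T instance, its bound method, and the instance dict). Two functions provided by the gc module show why the cycle forms.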

>>> t2=T()
>>> gc.get_referrers(t2)
[<bound method T.test0 of <__main__.T object at 0x0260D890>>, {'__builtins__': <module '__builtin__' (built-in)>, 't2': <__main__.T object at 0x0260D890>, '__package__': None, 'gc': <module 'gc' (built-in)>, 'T': <class '__main__.T'>, '__name__': '__main__', '__doc__': None, '_item': {'test': <bound method T.test0 of <__main__.T object at 0x0260D870>>}}]
>>> for _item in gc.get_referrers(t2):
...     print _item
...
<bound method T.test0 of <__main__.T object at 0x0260D890>>
{'__builtins__': <module '__builtin__' (built-in)>, 't2': <__main__.T object at 0x0260D890>, '__package__': None, 'gc': <module 'gc' (built-in)>, 'T': <class '__main__.T'>, '__name__': '__main__', '__doc__': None, '_item': {...}}
>>> for _item in gc.get_referents(t2):
...     print _item
...
{'test': <bound method T.test0 of <__main__.T object at 0x0260D890>>}
<class '__main__.T'>
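gc.get_referrers: Return the list of objects that directly refer to any of objs.
Here it returns the objects that refer to t2, including the <bound method T.test0 of <__main__.T object at 0x0260D890>> object.
gc.get_referents: Return a list of objects directly referred to by any of the arguments.
Here it returns the objects that t2 refers to, including the instance dict {'test': <bound method T.test0 of <__main__.T object at 0x0260D890>>}, which in turn holds that same bound method. t2 reaches the bound method through its dict and the bound method refers back to t2, which closes the cycle.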
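The same inspection can be automated. The sketch below follows `gc.get_referents()` breadth-first from an instance until it arrives back at that instance and reports the types along the way. `path_back_to` and `demo` are hypothetical helpers, and the exact chain depends on the interpreter version (for example, newer CPython releases may not report a separate instance dict).

```python
import gc
from collections import deque


class T(object):
    def __init__(self):
        self.test = self.test0

    def test0(self, d={}):
        d['a'] = 1


def path_back_to(start, max_nodes=5):
    """Breadth-first search over gc.get_referents() for the shortest chain
    of references leading from `start` back to `start`.

    Returns the chain as a list of type names, or None if no chain of at
    most `max_nodes` objects is found."""
    start_name = type(start).__name__
    queue = deque([(start, [start_name])])
    seen = set([id(start)])
    while queue:
        obj, path = queue.popleft()
        for ref in gc.get_referents(obj):
            if ref is start:
                return path + [start_name]
            if id(ref) not in seen and len(path) < max_nodes:
                seen.add(id(ref))
                queue.append((ref, path + [type(ref).__name__]))
    return None


def demo():
    t = T()  # local variable, so module globals do not refer to it
    return path_back_to(t)


print(" -> ".join(demo()))
```

The printed chain starts and ends at `T` and passes through the bound method, which is the cycle described above.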
  • The following cases do not produce a cycle:
class T2(object):
    def __init__(self):
        pass

    def test(self):
        return self.test0()

    def test0(self, d={}):
        d['a'] = 1

class T3(object):
    def __init__(self):
        self.test = self.test0

    @classmethod
    def test0(cls, d={}):
        d['a'] = 1

    kkk = test0
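One way to check these claims is to test whether an instance disappears as soon as its last name is deleted, with the cyclic collector switched off. This is a sketch under the assumption of CPython's reference counting; `freed_without_gc` is a hypothetical helper. T2 never stores a bound method on the instance, and in T3 the stored method is bound to the class rather than to the instance, so neither instance refers back to itself.

```python
import gc
import weakref


class T(object):
    def __init__(self):
        self.test = self.test0  # bound to the instance: cycle

    def test0(self, d={}):
        d['a'] = 1


class T2(object):
    def test(self):
        return self.test0()  # bound method is created and dropped per call

    def test0(self, d={}):
        d['a'] = 1


class T3(object):
    def __init__(self):
        self.test = self.test0  # bound to the class, not to the instance

    @classmethod
    def test0(cls, d={}):
        d['a'] = 1


def freed_without_gc(cls):
    """True if an instance of cls is reclaimed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = cls()
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


for cls in (T, T2, T3):
    print("%s %s" % (cls.__name__, freed_without_gc(cls)))
```

Only `T` is reported as not freed; `T2` and `T3` instances go away immediately.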