I need to create pairs of hashtags so people can judge whether the two tags in question refer to the same thing. The problem is that there are A LOT of hashtags, and I'm running the code on a Dreamhost VPS, so my memory is somewhat limited.
我需要创建一对主题标签,以便人们可以判断这两个标签是否指向同一个标签。问题是有很多主题标签,我在Dreamhost VPS上运行代码,所以我的内存有点受限。
Here's my relevant models:
这是我的相关模型:
class Hashtag(models.Model):
text = models.CharField(max_length=140)
competitors = models.ManyToManyField('Hashtag', through='Competitors')
tweet = models.ManyToManyField('Tweet')
def __unicode__(self):
return unicode_escape(self.text)
class Competitors(models.Model):
tag1 = models.ForeignKey('Hashtag', related_name='+')
tag2 = models.ForeignKey('Hashtag', related_name='+')
yes = models.PositiveIntegerField(default=0, null=False)
no = models.PositiveIntegerField(default=0, null=False)
objects = models.Manager()
def __unicode__(self):
return u'{0} vs {1}'.format(unicode_escape(self.tag1.text), unicode_escape(self.tag2.text))
Here's the code I've developed to create the Competitors objects and save them to my DB:
这是我开发的用于创建竞争对手对象并将其保存到我的数据库的代码:
class Twitterator(object):
def __init__(self, infile=None, outfile=None, verbosity=True):
...
self.competitors_i = 1
...
def __save_comps__(self,tag1, tag2):
try:
comps = Competitors(id=self.competitors_i,
tag1=tag1,
tag2=tag2,
yes=0,
no=0)
comps.save()
except IntegrityError:
self.competitors_i += 1
self.save_comps(tag1, tag2)
else:
self.competitors_i += 1
def competitors_to_db(self, start=1):
tags = Hashtag.objects.all()
i = start
while True:
try:
tag1 = tags.get(pk=i)
j = i + 1
while True:
try:
tag2 = tags.get(pk=j)
self.__save_comps__(tag1, tag2)
j += 1
except Hashtag.DoesNotExist:
break
i += 1
except Hashtag.DoesNotExist:
break
It all "works", but never manages to get that far before I run out of memory and the whole thing gets killed. I thought using .get would be less memory-intensive, but it doesn't seem to be less memory-intensive enough. I'm under the impression that Django Querysets are iterators already, so my usual 'make an iterator' trick is out. Any suggestions for further reducing my memory footprint?
这一切都“有效”,但在我耗尽内存之前从未设法达到那么远,整个事情都被杀死了。我认为使用.get会减少内存密集,但它似乎没有足够的内存密集度。我的印象是Django Querysets已经是迭代器,所以我通常的'make a iterator'技巧就出来了。有关进一步减少内存占用的建议吗?
1 个解决方案
#1
1
I think the problem is in this function, i
is not getting incremented properly and you will keep looping for same value of i
.
我认为问题出在这个函数中,我没有正确递增,你将继续为i的相同值循环。
def competitors_to_db(self, start=1):
tags = Hashtag.objects.all()
i = start
while True:
try:
tag1 = tags.get(pk=i)
j = i + 1
while True:
try:
tag2 = tags.get(pk=j)
self.__save_comps__(tag1, tag2)
j += 1
except Hashtag.DoesNotExist:
break #<------move this after i +=1 otherwise i will not increment
i += 1
#1
1
I think the problem is in this function, i
is not getting incremented properly and you will keep looping for same value of i
.
我认为问题出在这个函数中,我没有正确递增,你将继续为i的相同值循环。
def competitors_to_db(self, start=1):
tags = Hashtag.objects.all()
i = start
while True:
try:
tag1 = tags.get(pk=i)
j = i + 1
while True:
try:
tag2 = tags.get(pk=j)
self.__save_comps__(tag1, tag2)
j += 1
except Hashtag.DoesNotExist:
break #<------move this after i +=1 otherwise i will not increment
i += 1