I'm writing a script in which I want to get every occurrence of a value from visited sites.
First I get the sites visited:
sd = SessionData.objects.filter(session_id__mlsession__platform__exact=int('2'))
result = sd.values('last_page')
I then get the values that I'm expecting:
[{'last_page': 10L}, {'last_page': 4L}, {'last_page': 10L}]
With that, I want the page with id 10L to have double the weight of the page with id 4L, since it appears twice.
I then try to fetch the KeywordData rows for the pages in that list:
wordData = KeywordData.objects.filter(page_id__in=result)
but then I only get unique values:
[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>]
where my wanted outcome would be:
[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>, <KeywordData: 23>, <KeywordData: 24>]
The only way I've managed to not get a unique list is by iterating through a for-loop but that isn't really an option since the data I'm dealing with has millions of entries.
Is the "__in" filter in Django made to only return unique entries? Is there a way to get the output I want the "Django" way?
Thank you in advance for your help!
EDIT: The relevant models:
class KeywordData(models.Model):
    page = models.ForeignKey(Page, db_column='page_id', related_name='page_pageid', default=None)
    site = models.ForeignKey(Page, db_column='site_id', related_name='page_siteid', default=None)
    keywords = models.CharField(max_length=255, blank=True, null=True, default=None)

class MLSession(models.Model):
    session = models.ForeignKey(Session, null=True, db_column='session_id')
    platform = models.IntegerField(choices=PLATFORM_CHOICE)
    visitor_type = models.IntegerField(default=1)

class SessionData(models.Model):
    session = models.ForeignKey(Session, db_column='session_id', on_delete=models.CASCADE)
    site = models.ForeignKey(Site, db_column='site_id', db_index=True, default=None, null=True)
    last_page = models.ForeignKey(Page, db_column='last_page_id', default=None, null=True, related_name='session_last_page')
    first_page = models.ForeignKey(Page, db_column='first_page_id', default=None, null=True, related_name='session_first_page')
The tables Session and Page are only referred to in terms of their ids, which are auto-incremented.
I want to look at the last page of the session, thus only taking in the last_page_id, and get the keywords from the respective page. If the same page is often the last page, I want to add more weight, as previously stated.
Let me know if some more information is needed, and thanks again!
2 Answers
#1
1
Is the "__in" filter in django made to only return unique entries?
The __in filter in Django maps directly to the IN condition in SQL, and its behavior is as you've observed.
If you want duplicate rows, you should probably reframe your query as an SQL JOIN. You didn't post your models so I'm forced to guess here, but the following Django query should give you what you want:
KeywordData.objects.filter(page__session_last_page__session_id__mlsession__platform=2)
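The difference between IN and JOIN can be checked outside Django. The following is only a minimal sketch using Python's built-in sqlite3 module; the table names, column names, and the assumption that keyword rows 23 and 24 belong to page 10 while row 8 belongs to page 4 are made up for illustration and are not the asker's real schema. It suggests that IN yields each matching row once, while a JOIN yields one row per matching pair.

```python
import sqlite3

# In-memory database with simplified, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_data (id INTEGER PRIMARY KEY, last_page_id INTEGER);
    CREATE TABLE keyword_data (id INTEGER PRIMARY KEY, page_id INTEGER);
    INSERT INTO session_data (last_page_id) VALUES (10), (4), (10);
    INSERT INTO keyword_data (id, page_id) VALUES (23, 10), (24, 10), (8, 4);
""")

# IN is a membership test: each keyword row either matches or it does not.
in_rows = [r[0] for r in conn.execute(
    "SELECT k.id FROM keyword_data k "
    "WHERE k.page_id IN (SELECT last_page_id FROM session_data) "
    "ORDER BY k.id"
)]

# JOIN pairs every keyword row with every session that ended on its page.
join_rows = [r[0] for r in conn.execute(
    "SELECT k.id FROM keyword_data k "
    "JOIN session_data s ON s.last_page_id = k.page_id "
    "ORDER BY k.id"
)]

print(in_rows)    # [8, 23, 24]
print(join_rows)  # [8, 23, 23, 24, 24]
```

Rows 23 and 24 appear twice in the JOIN result because two sessions ended on page 10, which is the weighting the question asks for.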
#2
0
Create a dictionary of keywords keyed by the page id:
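from collections import defaultdict

# Flat list of last_page ids, duplicates included
result = sd.values_list('last_page', flat=True)
# Group the unique KeywordData rows by their page id
keywords_by_page_id = defaultdict(list)
for k in KeywordData.objects.filter(page_id__in=result):
    keywords_by_page_id[k.page_id].append(k)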
Then loop through the result to build your required output.
out = []
for x in result:
    out += keywords_by_page_id[x]
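The grouping idea can be tried without a database. This is just a sketch under stated assumptions: the Keyword namedtuple is a hypothetical stand-in for KeywordData rows, and the page assignments (23 and 24 on page 10, 8 on page 4) are inferred from the question's expected output. It also shows collections.Counter as a possible alternative when only a weight per page is needed rather than repeated objects.

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical stand-in for KeywordData rows (not the real model).
Keyword = namedtuple("Keyword", ["id", "page_id"])

# Like sd.values_list('last_page', flat=True): duplicates are kept.
last_pages = [10, 4, 10]
# Like the unique rows returned by the __in query.
keywords = [Keyword(23, 10), Keyword(24, 10), Keyword(8, 4)]

# Group the unique rows by page id.
by_page = defaultdict(list)
for k in keywords:
    by_page[k.page_id].append(k)

# Replay the page list so each visit contributes its keywords again.
out = []
for page_id in last_pages:
    out.extend(by_page[page_id])

# Alternative: keep rows unique and store a weight per page instead.
weights = Counter(last_pages)

print([k.id for k in out])      # [23, 24, 8, 23, 24]
print(weights[10], weights[4])  # 2 1
```

The loop runs once per session row, but it only does dictionary lookups in memory; the database is queried once for the session rows and once for the keywords.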