Getting non-unique values from a Django query

I'm writing a script that needs every occurrence of a value from visited sites, duplicates included.

First I get sites visited:

sd = SessionData.objects.filter(session_id__mlsession__platform__exact=2)
result = sd.values('last_page')

I then get the values that I'm expecting:

[{'last_page': 10L}, {'last_page': 4L}, {'last_page': 10L}]

Given that, I want the page with id 10L to carry twice the weight of the page with id 4L, since it appears twice.

I try to get the values from the list:

wordData = KeywordData.objects.filter(page_id__in=result)

but then I only get unique values:

[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>]

where my wanted outcome would be:

[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>, <KeywordData: 23>, <KeywordData: 24>]

The only way I've managed to not get a unique list is by iterating through a for-loop but that isn't really an option since the data I'm dealing with has millions of entries.

Is the "__in" filter in Django made to only return unique entries? Is there a way to get the right output the "Django" way?

Thank you in advance for your help!

EDIT: The relevant models:

class KeywordData(models.Model):
    page = models.ForeignKey(Page, db_column='page_id', related_name='page_pageid', default=None)
    site = models.ForeignKey(Page, db_column='site_id', related_name='page_siteid', default=None)
    keywords = models.CharField(max_length=255, blank=True, null=True, default=None)

class MLSession(models.Model):
    session = models.ForeignKey(Session, null=True, db_column='session_id')
    platform = models.IntegerField(choices=PLATFORM_CHOICE)
    visitor_type = models.IntegerField(default=1)

class SessionData(models.Model):
    session = models.ForeignKey(Session, db_column='session_id', on_delete=models.CASCADE)
    site = models.ForeignKey(Site, db_column='site_id', db_index=True, default=None, null=True)
    last_page = models.ForeignKey(Page, db_column='last_page_id', default=None, null=True, related_name='session_last_page')
    first_page = models.ForeignKey(Page, db_column='first_page_id', default=None, null=True, related_name='session_first_page')

The tables Session and Page are only referred to in terms of their ids, which are auto-incremented.

I want to look at the last page of the session, thus only taking in the last_page_id, and get the keywords from the respective page. If the same page is often the last page, I want to add more weight, as previously stated.

Let me know if some more information is needed, and thanks again!

2 Answers

#1

Is the "__in" filter in Django made to only return unique entries?

The __in filter in Django maps directly to the IN condition in SQL, and its behavior is as you've observed.

If you want duplicate rows you should probably reframe your query as an SQL JOIN. You didn't post your models so I'm forced to guess here, but the following Django query should give you what you want:

KeywordData.objects.filter(page__session_last_page__session_id__mlsession__platform=2)
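
To see why the IN form and the JOIN form differ, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table names (session_data, keyword_data), their columns, and the sample rows are assumptions made up for illustration; they only mimic the shape of the data in the question.

```python
import sqlite3

# In-memory database with hypothetical tables shaped like the question's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_data (id INTEGER PRIMARY KEY, last_page_id INTEGER);
    CREATE TABLE keyword_data (id INTEGER PRIMARY KEY, page_id INTEGER);
    INSERT INTO session_data (last_page_id) VALUES (10), (4), (10);
    INSERT INTO keyword_data (id, page_id) VALUES (23, 10), (24, 10), (8, 4);
""")

# IN only tests membership, so each keyword row is returned at most once.
in_ids = [row[0] for row in conn.execute("""
    SELECT id FROM keyword_data
    WHERE page_id IN (SELECT last_page_id FROM session_data)
    ORDER BY id
""")]

# JOIN returns one row per matching (keyword, session) pair, so duplicates appear.
join_ids = [row[0] for row in conn.execute("""
    SELECT k.id FROM keyword_data AS k
    JOIN session_data AS s ON s.last_page_id = k.page_id
    ORDER BY k.id
""")]

print(in_ids)    # [8, 23, 24]
print(join_ids)  # [8, 23, 23, 24, 24]
```

Filtering across the relation, as in the Django query above, is what makes the ORM emit a JOIN rather than an IN condition.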

#2

Create a dictionary of keywords keyed by the page id:

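from collections import defaultdict

# Flat list of page ids, duplicates included
result = sd.values_list('last_page', flat=True)
# Group the (unique) KeywordData rows by the page they belong to
keywords_by_page_id = defaultdict(list)
for k in KeywordData.objects.filter(page_id__in=result):
    keywords_by_page_id[k.page_id].append(k)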

Then loop through the result to build your required output.

out = []
# Walk the page ids in order, repeating the keywords once per occurrence
for x in result:
    out += keywords_by_page_id[x]
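
If only the weight matters and the repeated list itself is not needed, counting the page ids may be enough. The sketch below is a variation on the approach above, not part of the original answer. It uses plain Python data in place of the querysets: last_page_ids stands in for sd.values_list('last_page', flat=True) and keyword_rows for (KeywordData id, page id) pairs, both hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for the queryset results in the question.
last_page_ids = [10, 4, 10]                  # page ids, duplicates included
keyword_rows = [(23, 10), (24, 10), (8, 4)]  # (keyword id, page id)

# Count how often each page was a session's last page.
page_weight = Counter(last_page_ids)

# Attach that count to every keyword row of the page.
keyword_weight = {kw_id: page_weight[page_id] for kw_id, page_id in keyword_rows}

print(keyword_weight)  # {23: 2, 24: 2, 8: 1}
```

This keeps one entry per keyword row and stores the multiplicity as a number, which avoids building a list with millions of repeated objects.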
