Django ORM: limiting a queryset to return only a subset of the data

Date: 2022-07-19 22:47:56

I have the following query in a Django app. The user field is a foreign key. The results may contain 1000 MyModel objects, but only for a handful of users. I'd like to limit the results to 5 MyModel objects per user listed in the user__in= portion of the query, so I should end up with at most 5 × (number of users) MyModel objects.

lfs = MyModel.objects.filter(
    user__in=[some,users,here,],
    active=True,
    follow=True,
)

A solution through either the ORM or raw SQL (using Postgres) would be acceptable.

Thanks

EDIT 2

Found a simpler way to get this done, which I've added as an answer below.

EDIT

Some of the links mentioned in the comments had good information, although none really worked with Postgres or the Django ORM. For anyone else looking for this information in the future, my adaptation of the code in those other questions/answers is here.

To implement this in Postgres 9.1, I had to create a couple of functions using plperl (which also required me to install plperl).

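CREATE OR REPLACE FUNCTION set_int_var(name text, val bigint) RETURNS bigint AS $$
    # Store the value in the session-wide %_SHARED hash and return it.
    # (Both branches of the original if/else returned the same value.)
    $_SHARED{$_[0]} = $_[1];
    return $_[1];
$$ LANGUAGE plperl;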

CREATE OR REPLACE FUNCTION get_int_var(name text) RETURNS bigint AS $$
    return $_SHARED{$_[0]};
$$ LANGUAGE plperl;

And my final query looks something like the following

SELECT x.id, x.ranking, x.active, x.follow, x.user_id
FROM (
    SELECT tbl.id, tbl.active, tbl.follow, tbl.user_id,
           CASE WHEN get_int_var('user_id') != tbl.user_id
                THEN set_int_var('rownum', 1)
                ELSE set_int_var('rownum', get_int_var('rownum') + 1)
           END AS ranking,
           set_int_var('user_id', tbl.user_id)
    FROM my_table AS tbl
    WHERE tbl.active = TRUE AND tbl.follow = TRUE
    ORDER BY tbl.user_id
) AS x
WHERE x.ranking <= 5
ORDER BY x.user_id
LIMIT 50

The only downside is that if I try to restrict the users with user_id IN (), the ranking breaks and the query returns every row rather than at most 5 per user.
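
To make the intent of the ranking query easier to follow, here is a minimal plain-Python sketch of the same idea: walk the rows in user_id order, restart the counter whenever the user changes, and keep only rows ranked 5 or lower. The function name first_n_per_user and the sample rows are hypothetical and only for illustration; this is not the SQL the author ran.

```python
from itertools import groupby
from operator import itemgetter


def first_n_per_user(rows, n=5):
    """Keep at most n rows per user_id, mimicking the rownum counter."""
    kept = []
    # Rows must be sorted by user_id so each user's rows are contiguous,
    # just as the inner query uses ORDER BY tbl.user_id.
    ordered = sorted(rows, key=itemgetter("user_id"))
    for _user_id, group in groupby(ordered, key=itemgetter("user_id")):
        # enumerate restarts at 1 for every new user (the "reset" branch).
        for ranking, row in enumerate(group, start=1):
            if ranking > n:
                break
            kept.append({**row, "ranking": ranking})
    return kept


# Hypothetical sample data: user 1 has 7 rows, user 2 has 3 rows.
rows = [{"id": i, "user_id": 1} for i in range(1, 8)]
rows += [{"id": i, "user_id": 2} for i in range(8, 11)]

result = first_n_per_user(rows)
print(len(result))  # 8 (5 rows for user 1, 3 rows for user 2)
```

Because the counter restarts per group rather than depending on evaluation order, filtering the input rows to a subset of users beforehand does not affect the ranking in this sketch.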

2 answers

#1


2  

This is what ended up working, and allowed me to only select a handful of users, or all users (by removing the AND mt.user_id IN () line).

SELECT * FROM mytable
WHERE (id, user_id, follow, active) IN (
    -- The subquery must return the same columns as the row on the left of IN.
    SELECT id, user_id, follow, active FROM mytable mt
    WHERE mt.user_id = mytable.user_id
    AND mt.user_id IN (1, 2)
    ORDER BY user_id LIMIT 5)
ORDER BY likeable
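
As a rough way to try the correlated-subquery idea without a Postgres server, the sketch below runs a simplified variant against an in-memory SQLite database using Python's standard sqlite3 module. The table layout, the sample data, and the choice to compare only on id and order the subquery by id are assumptions made for illustration, not the author's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, follow INTEGER, active INTEGER)"
)
# Hypothetical data: 8 rows for user 1, 3 for user 2, 6 for user 3.
for user_id, count in [(1, 8), (2, 3), (3, 6)]:
    conn.executemany(
        "INSERT INTO mytable (user_id, follow, active) VALUES (?, 1, 1)",
        [(user_id,)] * count,
    )

# For each outer row, the subquery selects the first 5 ids of the same user;
# the outer row is kept only if its id is among them.
query = """
    SELECT id, user_id FROM mytable
    WHERE id IN (
        SELECT mt.id FROM mytable AS mt
        WHERE mt.user_id = mytable.user_id
          AND mt.follow = 1 AND mt.active = 1
        ORDER BY mt.id
        LIMIT 5
    )
    AND user_id IN (1, 2)
    ORDER BY user_id, id
"""
rows = conn.execute(query).fetchall()

per_user = {}
for _id, user_id in rows:
    per_user[user_id] = per_user.get(user_id, 0) + 1
print(per_user)  # {1: 5, 2: 3}
```

Dropping the AND user_id IN (1, 2) line returns at most 5 rows for every user, which matches the behaviour described in the answer.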

#2


-1  

I think this is what you were looking for (I didn't see it in other posts):

https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets

The other examples convert the queryset to a list before slicing. If you slice the queryset itself instead, for example:

    lfs = MyModel.objects.filter(
        user__in=[some,users,here,],
        active=True,
        follow=True,
    )[:10]

the resulting SQL is a single query with a LIMIT 10 clause.

So, the query you are looking for would be something like this:

mymodel_ids = []
for user in users:
    # One query per user: fetch the ids of at most 5 matching objects.
    mymodel_5ids_for_user = MyModel.objects.filter(
        user=user,
        active=True,
        follow=True,
    ).values_list('id', flat=True)[:5]

    mymodel_ids.extend(mymodel_5ids_for_user)

# Final query: load the selected objects in one go.
lfs = MyModel.objects.filter(id__in=mymodel_ids)

lfs then contains the MyModel objects you were looking for (at most 5 entries per user).

I think this costs at least one query per user, plus one final query to retrieve all the MyModel objects with the id filter.

Pay attention to the order in which the objects are returned. If you change the ordering of the "mymodel_5ids_for_user" query (for example with order_by()), the first 5 elements can change.
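
To illustrate why the ordering matters, here is a small plain-Python sketch that picks the first 5 rows per user under two different sort keys. The helper top_n_per_user and the "score" field are made up for this example and do not come from the original model.

```python
def top_n_per_user(rows, order_key, n=5):
    """Return up to n rows per user_id, chosen by an explicit sort order."""
    by_user = {}
    for row in sorted(rows, key=order_key):
        by_user.setdefault(row["user_id"], []).append(row)
    # Slicing after sorting is what decides which rows survive.
    return {user: items[:n] for user, items in by_user.items()}


# Hypothetical rows: ids ascend while the made-up "score" field descends.
rows = [{"id": i, "user_id": 1, "score": 10 - i} for i in range(1, 8)]

by_id = top_n_per_user(rows, order_key=lambda r: r["id"])
by_score = top_n_per_user(rows, order_key=lambda r: r["score"])

print([r["id"] for r in by_id[1]])     # [1, 2, 3, 4, 5]
print([r["id"] for r in by_score[1]])  # [7, 6, 5, 4, 3]
```

The same seven rows yield two different sets of five, so it is worth stating the ordering explicitly before slicing.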
