使用有序相关模型的第一个值来注释QuerySet

时间:2021-07-14 18:04:16

I have a QuerySet of some objects. For each one, I wish to annotate with the minimum value of a related model (joined on a few conditions, ordered by date). I can express my desired results neatly in SQL, but am curious how to translate to Django's ORM.

我有一些对象的QuerySet。对于每一个,我希望用相关模型的最小值进行注释(在几个条件下加入,按日期排序)。我可以在SQL中整齐地表达我想要的结果,但很好奇如何翻译成Django的ORM。

Background

Let's say that I have two related models: Book, and BlogPost, each with a foreign key to an Author:

假设我有两个相关的模型:Book和BlogPost,每个模型都有一个作者的外键:

class Book(models.Model):
    title = models.CharField(max_length=255)
    genre = models.CharField(max_length=63)
    author = models.ForeignKey(Author)
    date_published = models.DateField()

class BlogPost(models.Model):
    author = models.ForeignKey(Author)
    date_published = models.DateField()

I'm trying to find the first mystery book that a given author published after each blog post that they write. In SQL, this can be achieved nicely with windowing.

我试图找到一个给定作者在他们写的每篇博文之后发表的第一本神秘书。在SQL中,这可以通过窗口很好地实现。

Working solution in PostgreSQL 9.6

WITH ordered AS (
  SELECT blog_post.id,
         book.title,
         ROW_NUMBER() OVER (
            PARTITION BY blog_post.id ORDER BY book.date_published
         ) AS rn
    FROM blog_post
         LEFT JOIN book ON book.author_id = blog_post.author_id
                       AND book.genre = 'mystery'
                       AND book.date_published >= blog_post.date_published
)
SELECT id,
       title
  FROM ordered
 WHERE rn = 1;

Translating to Django's ORM

While the above SQL suits my needs well (and I could use raw SQL if needed), I'm curious as to how one would do this in QuerySet. I have an existing QuerySet where I'd like to annotate it even further

虽然上面的SQL很适合我的需求(如果需要我可以使用原始SQL),但我很好奇在QuerySet中如何做到这一点。我有一个现有的QuerySet,我想进一步注释它

books = models.Book.objects.filter(...).select_related(...).prefetch_related(...)
annotated_books = books.annotate(
    most_recent_title=...
)

I'm aware that Django 2.0 supports window functions, but I'm on Django 1.10 for now.

我知道Django 2.0支持窗口函数,但我现在在Django 1.10上。

Attempted solution

I'd first built a Q object to filter down to mystery books published after the blog post.

我首先构建了一个Q对象,以过滤博客文章后发布的神秘书籍。

published_after = Q(
    author__book__date_published__gte=F('date_published'),
    author__book__genre='mystery'
)

From here, I attempted to piece together django.db.models.Min and additional F objects to acheive my desired results, but with no success.

从这里开始,我试图拼凑django.db.models.Min和其他F对象来实现我想要的结果,但没有成功。

Note: Django 2.0 introduces window expressions, but I'm currently on Django 1.10, and curious how one would do this with the QuerySet features available there.

注意:Django 2.0引入了窗口表达式,但我目前正在使用Django 1.10,并且很好奇如何使用那里提供的QuerySet功能。

2 个解决方案

#1


4  

Perhaps using .raw isn't such a bad idea. Checking the code for Window class we can see that essentially composes an SQL query to achieve the "Windowing".

也许使用.raw并不是一个坏主意。检查Window类的代码,我们可以看到实际上组成了一个SQL查询来实现“Windowing”。

An easy way out may be the usage of the architect module which can add partition functionality for PostgreSQL according to the documentation.

一个简单的方法可能是使用架构师模块,它可以根据文档为PostgreSQL添加分区功能。

Another module that claims to inject Window functionality to Django < 2.0 is the django-query-builder which adds a partition_by() queryset method and can be used with order_by:

另一个声称向Django <2.0注入Window功能的模块是django-query-builder,它添加了partition_by()查询集方法,可以与order_by一起使用:

query = Query().from_table(
    Order,
    ['*', RowNumberField(
              'revenue', 
              over=QueryWindow().order_by('margin')
                                .partition_by('account_id')
          )
    ]
)
query.get_sql()
# SELECT tests_order.*, ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY margin ASC) AS revenue_row_number
# FROM tests_order

Finally, you can always copy the Window class source code in your project or use this alternate Window class code.

最后,您始终可以在项目中复制Window类源代码或使用此备用Window类代码。

#2


3  

Your apparent problem is that Django 1.10 is too old to handle window functions properly (which have been around for a very long time already).

你明显的问题是Django 1.10太旧了,无法正常处理窗口函数(已经存在了很长时间)。

That problem goes away if you rewrite your query without window function.

如果您在没有窗口函数的情况下重写查询,那么问题就会消失。

3 equivalent queries

Which of them is fastest depends on available indexes and data distribution. But each of them should be faster than your original.

其中哪一个最快取决于可用的索引和数据分布。但是每一个都应该比你原来的更快。

1. With DISTINCT ON:

1.使用DISTINCT ON:

SELECT DISTINCT ON (p.id)
       p.id, b.title
FROM   blog_post p
LEFT   JOIN book b ON b.author_id = p.author_id
                  AND b.genre = 'mystery'
                  AND b.date_published >= p.date_published
ORDER  BY p.id, b.date_published;

Related, with detailed explanation:

相关,详细说明:

2. With a LATERAL subquery (requires Postgres 9.3 or later):

2.使用LATERAL子查询(需要Postgres 9.3或更高版本):

SELECT p.id, b.title
FROM   blog_post p
LEFT   JOIN LATERAL (
   SELECT title
   FROM   book 
   WHERE  author_id = p.author_id
   AND    genre = 'mystery'
   AND    date_published >= p.date_published
   ORDER  BY date_published
   LIMIT  1
   ) b ON true;
-- ORDER BY p.id  -- optional

Related, with detailed explanation:

相关,详细说明:

3. Or simpler, yet, with a correlated subquery:

3.或者更简单,但是,使用相关的子查询:

SELECT p.id
     ,(SELECT title
       FROM   book 
       WHERE  author_id = p.author_id
       AND    genre = 'mystery'
       AND    date_published >= p.date_published
       ORDER  BY date_published
       LIMIT  1)
FROM   blog_post p;
-- ORDER BY p.id  -- optional

Each should be translated easily to Django syntax. You might also just use the raw SQL, that's what is sent to the Postgres server anyway.

每个都应该很容易翻译成Django语法。您也可以只使用原始SQL,这就是发送到Postgres服务器的内容。

#1


4  

Perhaps using .raw isn't such a bad idea. Checking the code for Window class we can see that essentially composes an SQL query to achieve the "Windowing".

也许使用.raw并不是一个坏主意。检查Window类的代码,我们可以看到实际上组成了一个SQL查询来实现“Windowing”。

An easy way out may be the usage of the architect module which can add partition functionality for PostgreSQL according to the documentation.

一个简单的方法可能是使用架构师模块,它可以根据文档为PostgreSQL添加分区功能。

Another module that claims to inject Window functionality to Django < 2.0 is the django-query-builder which adds a partition_by() queryset method and can be used with order_by:

另一个声称向Django <2.0注入Window功能的模块是django-query-builder,它添加了partition_by()查询集方法,可以与order_by一起使用:

query = Query().from_table(
    Order,
    ['*', RowNumberField(
              'revenue', 
              over=QueryWindow().order_by('margin')
                                .partition_by('account_id')
          )
    ]
)
query.get_sql()
# SELECT tests_order.*, ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY margin ASC) AS revenue_row_number
# FROM tests_order

Finally, you can always copy the Window class source code in your project or use this alternate Window class code.

最后,您始终可以在项目中复制Window类源代码或使用此备用Window类代码。

#2


3  

Your apparent problem is that Django 1.10 is too old to handle window functions properly (which have been around for a very long time already).

你明显的问题是Django 1.10太旧了,无法正常处理窗口函数(已经存在了很长时间)。

That problem goes away if you rewrite your query without window function.

如果您在没有窗口函数的情况下重写查询,那么问题就会消失。

3 equivalent queries

Which of them is fastest depends on available indexes and data distribution. But each of them should be faster than your original.

其中哪一个最快取决于可用的索引和数据分布。但是每一个都应该比你原来的更快。

1. With DISTINCT ON:

1.使用DISTINCT ON:

SELECT DISTINCT ON (p.id)
       p.id, b.title
FROM   blog_post p
LEFT   JOIN book b ON b.author_id = p.author_id
                  AND b.genre = 'mystery'
                  AND b.date_published >= p.date_published
ORDER  BY p.id, b.date_published;

Related, with detailed explanation:

相关,详细说明:

2. With a LATERAL subquery (requires Postgres 9.3 or later):

2.使用LATERAL子查询(需要Postgres 9.3或更高版本):

SELECT p.id, b.title
FROM   blog_post p
LEFT   JOIN LATERAL (
   SELECT title
   FROM   book 
   WHERE  author_id = p.author_id
   AND    genre = 'mystery'
   AND    date_published >= p.date_published
   ORDER  BY date_published
   LIMIT  1
   ) b ON true;
-- ORDER BY p.id  -- optional

Related, with detailed explanation:

相关,详细说明:

3. Or simpler, yet, with a correlated subquery:

3.或者更简单,但是,使用相关的子查询:

SELECT p.id
     ,(SELECT title
       FROM   book 
       WHERE  author_id = p.author_id
       AND    genre = 'mystery'
       AND    date_published >= p.date_published
       ORDER  BY date_published
       LIMIT  1)
FROM   blog_post p;
-- ORDER BY p.id  -- optional

Each should be translated easily to Django syntax. You might also just use the raw SQL, that's what is sent to the Postgres server anyway.

每个都应该很容易翻译成Django语法。您也可以只使用原始SQL,这就是发送到Postgres服务器的内容。