Django - Finding the extreme member of each group

Date: 2021-01-29 16:10:38

I've been playing around with the new aggregation functionality in the Django ORM, and there's a class of problem I think it should be able to solve, but I can't get it to work. The type of query I'm trying to generate is described here.

So, let's say I have the following models:

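import datetime

from django.db import models

class ContactGroup(models.Model):
    # ... whatever ...
    pass

class Contact(models.Model):
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    name = models.CharField(max_length=20)
    email = models.EmailField()
    # ...

class Record(models.Model):
    # A snapshot of a Contact, saved each time the contact is created or modified.
    contact = models.ForeignKey(Contact, on_delete=models.CASCADE)
    group = models.ForeignKey(ContactGroup, on_delete=models.CASCADE)
    record_date = models.DateTimeField(default=datetime.datetime.now)

    # ... name, email, and other fields that are in Contact ...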

So, each time a Contact is created or modified, a new Record is created that saves the contact's information as it is at that moment, along with a timestamp. Now I want a query that, for example, returns the most recent Record instance for every Contact associated with a ContactGroup. In pseudo-code:

group = ContactGroup.objects.get(...)
records_i_want = group.record_set.most_recent_record_for_every_contact()

Once I have that working, I want to be able to add a filter(record_date__lt=some_date) to the queryset and get the information as it existed at some_date.

Anybody have any ideas?

Edit: It seems I'm not really making myself clear. Using models like these, I want a way to do the following with the pure Django ORM (no extra()):

group.record_set.extra(where=["record_date = (select max(record_date) from app_record r where r.id=app_record.id and r.record_date <= '2009-07-18')"])

Putting the subquery in the WHERE clause is only one strategy for solving this problem; the others are covered pretty well by the first link I gave above. I know WHERE-clause subselects are not possible without extra(), but I thought one of the other approaches might be possible with the new aggregation features.
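
A minimal sketch of two of these strategies outside the ORM, using Python's built-in sqlite3 module. The record table, its columns and the sample rows are hypothetical simplifications of app_record. latest_via_subquery puts a subquery in the WHERE clause, correlated on contact_id; latest_via_join first aggregates (contact_id, MAX(record_date)) pairs and then joins them back to the table to recover full rows. Both take a cutoff date, so they answer "what did each contact look like on that date".

```python
import sqlite3

# Hypothetical, cut-down version of the app_record table: one row per snapshot.
SCHEMA = """
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    record_date TEXT NOT NULL,  -- ISO-8601 strings sort chronologically
    name TEXT NOT NULL
)
"""

# Hypothetical sample data: (contact_id, group_id, record_date, name)
ROWS = [
    (1, 1, "2009-07-01", "Ann"),
    (1, 1, "2009-07-10", "Ann B"),
    (1, 1, "2009-07-20", "Ann C"),
    (2, 1, "2009-07-05", "Bob"),
    (2, 1, "2009-07-15", "Bobby"),
    (3, 2, "2009-07-02", "Cy"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO record (contact_id, group_id, record_date, name) "
        "VALUES (?, ?, ?, ?)",
        ROWS,
    )
    return conn


def latest_via_subquery(conn, group_id, as_of):
    """Strategy 1: WHERE-clause subquery correlated on contact_id."""
    return conn.execute(
        """
        SELECT r.contact_id, r.record_date, r.name
        FROM record r
        WHERE r.group_id = ?
          AND r.record_date = (
              SELECT MAX(r2.record_date)
              FROM record r2
              WHERE r2.contact_id = r.contact_id AND r2.record_date <= ?
          )
        ORDER BY r.contact_id
        """,
        (group_id, as_of),
    ).fetchall()


def latest_via_join(conn, group_id, as_of):
    """Strategy 2: aggregate (contact_id, MAX(date)), then join back for full rows."""
    return conn.execute(
        """
        SELECT r.contact_id, r.record_date, r.name
        FROM record r
        JOIN (
            SELECT contact_id, MAX(record_date) AS latest
            FROM record
            WHERE group_id = ? AND record_date <= ?
            GROUP BY contact_id
        ) m ON m.contact_id = r.contact_id AND m.latest = r.record_date
        ORDER BY r.contact_id
        """,
        (group_id, as_of),
    ).fetchall()
```

With this sample data, both functions return [(1, '2009-07-10', 'Ann B'), (2, '2009-07-15', 'Bobby')] for group 1 and the cutoff '2009-07-18'. If two snapshots of one contact share the exact same timestamp, both strategies return both rows, so a tie-breaker may be needed in practice.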

2 answers

#1


It sounds like you want to keep records of changes to objects in Django.

Pro Django has a section in chapter 11 (Enhancing Applications) in which the author shows how to create a model that tracks inserts/deletes/updates on another model, its "client". The tracking model is generated dynamically from the client's definition and relies on signals. The code includes a most_recent() function, but you could adapt it to obtain an object's state on a particular date.
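
The underlying idea can be sketched without Django. This is not the book's code; HistoryLog, record(), most_recent() and as_of() are hypothetical names. Every save stores a timestamped copy of the object's fields, and the state on a given date is the newest snapshot taken at or before it, found with a binary search (bisect).

```python
import bisect
from collections import defaultdict
from datetime import datetime


class HistoryLog:
    """Hypothetical in-memory change log: timestamped snapshots per tracked object."""

    def __init__(self):
        # object id -> timestamps (kept sorted) and the matching snapshots
        self._times = defaultdict(list)
        self._snapshots = defaultdict(list)

    def record(self, obj_id, snapshot, when):
        """Store a copy of the object's fields (in Django, e.g. from a post_save handler)."""
        i = bisect.bisect_right(self._times[obj_id], when)
        self._times[obj_id].insert(i, when)
        self._snapshots[obj_id].insert(i, dict(snapshot))

    def most_recent(self, obj_id):
        """Newest snapshot, or None if the object was never recorded."""
        snapshots = self._snapshots.get(obj_id)
        return snapshots[-1] if snapshots else None

    def as_of(self, obj_id, when):
        """Newest snapshot taken at or before `when`, or None if there is none."""
        i = bisect.bisect_right(self._times.get(obj_id, []), when)
        return self._snapshots[obj_id][i - 1] if i else None


if __name__ == "__main__":
    log = HistoryLog()
    log.record(1, {"name": "Ann"}, datetime(2009, 7, 1))
    log.record(1, {"name": "Ann B"}, datetime(2009, 7, 10))
    print(log.as_of(1, datetime(2009, 7, 5)))  # {'name': 'Ann'}
    print(log.most_recent(1))  # {'name': 'Ann B'}
```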

I assume the problem is the tracking in Django, not the SQL to retrieve it, right?

#2


First of all, I'll point out that:

group.record_set.extra(where=["record_date = (select max(record_date) from app_record r where r.id=app_record.id and r.record_date <= '2009-07-18')"])

will not get you the same effect as:

records_i_want = group.record_set.most_recent_record_for_every_contact()

The first query returns every record associated with a particular group (or with any of the contacts in that group) whose record_date is on or before the date/time specified in the extra() call. Run it in the shell, then inspect the SQL Django generated:

from django.db import connection
# connection.queries is only populated when settings.DEBUG is True
connection.queries[-1]['sql']

which reveals:

SELECT "contacts_record"."id", "contacts_record"."contact_id", "contacts_record"."group_id",
       "contacts_record"."record_date", "contacts_record"."name", "contacts_record"."email"
FROM "contacts_record"
WHERE "contacts_record"."group_id" = 1
  AND record_date = (select max(record_date) from contacts_record r
                     where r.id=contacts_record.id and r.record_date <= '2009-07-18')

Not exactly what you want, right?
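
The behaviour is easy to reproduce. A minimal sketch with Python's sqlite3 module and a hypothetical cut-down record table: when the subquery is correlated on r.id, each row is compared only with itself, so the condition reduces to record_date <= '2009-07-18'. Correlating on contact_id instead keeps only each contact's newest record up to that date.

```python
import sqlite3

# Hypothetical, cut-down contacts_record table with five snapshots of two contacts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, contact_id INTEGER, "
    "group_id INTEGER, record_date TEXT)"
)
conn.executemany(
    "INSERT INTO record (contact_id, group_id, record_date) VALUES (?, ?, ?)",
    [
        (1, 1, "2009-07-01"),
        (1, 1, "2009-07-10"),
        (1, 1, "2009-07-20"),
        (2, 1, "2009-07-05"),
        (2, 1, "2009-07-15"),
    ],
)

QUERY = """
SELECT rec.id FROM record rec
WHERE rec.group_id = 1 AND rec.record_date = (
    SELECT MAX(r.record_date) FROM record r
    WHERE r.{key} = rec.{key} AND r.record_date <= '2009-07-18'
)
ORDER BY rec.id
"""

# As in the question: each row's subquery sees only that row, so this is just a date filter.
by_id = conn.execute(QUERY.format(key="id")).fetchall()
# Correlated on contact_id: a row must match its contact's newest date up to the cutoff.
by_contact = conn.execute(QUERY.format(key="contact_id")).fetchall()

print(by_id)       # [(1,), (2,), (4,), (5,)]
print(by_contact)  # [(2,), (5,)]
```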

Aggregation is meant for retrieving aggregated values, not the objects those values came from. So if you're hoping to use aggregation to reduce the number of queries needed to get group.record_set.most_recent_record_for_every_contact(), you won't succeed.

Without using aggregation, you can get the most recent record for all contacts associated with a group using:

[x.record_set.all().order_by('-record_date')[0] for x in group.contact_set.all()]

Using aggregation, the closest I could get to that was:

from django.db.models import Max
group.record_set.values('contact').annotate(latest_date=Max('record_date'))

The latter returns a queryset of dictionaries like:

[{'contact': 1, 'latest_date': somedate }, {'contact': 2, 'latest_date': somedate }]

That is, one entry for each contact in the given group, along with the latest record date associated with it.
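
The dictionaries correspond to a plain GROUP BY. In this sqlite3 sketch (hypothetical cut-down table and data), each result row carries only the contact id and its maximum date; the record's own id, name and email are not part of it, so getting the full record still takes another lookup or a join back to the table.

```python
import sqlite3

# Hypothetical, cut-down record table with sample snapshots.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, contact_id INTEGER, "
    "group_id INTEGER, record_date TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO record (contact_id, group_id, record_date, name) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2009-07-01", "Ann"),
        (1, 1, "2009-07-10", "Ann B"),
        (2, 1, "2009-07-05", "Bob"),
        (2, 1, "2009-07-15", "Bobby"),
        (3, 2, "2009-07-02", "Cy"),
    ],
)

# Roughly the SQL behind values('contact').annotate(latest_date=Max('record_date')):
# group the group's records by contact and keep only the maximum date.
cursor = conn.execute(
    "SELECT contact_id, MAX(record_date) FROM record "
    "WHERE group_id = ? GROUP BY contact_id ORDER BY contact_id",
    (1,),
)
rows = [{"contact": contact, "latest_date": latest} for contact, latest in cursor]
print(rows)
# [{'contact': 1, 'latest_date': '2009-07-10'}, {'contact': 2, 'latest_date': '2009-07-15'}]
```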

Anyway, the minimum number of queries is probably 1 + the number of contacts in the group. If you're interested in obtaining the result with a single query, that is also possible, but you'll have to structure your models differently. That's a separate aspect of your problem, though.

I hope this helps you understand how to approach the problem using aggregation and the regular ORM functions.
