I searched all over the place for an answer to this but couldn't find anything. Perhaps this is just a stupid question, or a really tricky one. Here it is:
Let's say my model is this (pseudo django code):
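from django.db import models

class EventType(models.Model):
    name = models.CharField(max_length=100)  # max_length is assumed

class Event(models.Model):
    # on_delete is assumed; the question does not specify it
    type = models.ForeignKey(EventType, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    date_start = models.DateField()
    date_end = models.DateField()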
What I want to know is the average duration time for each event type. What I do now is calculate the average duration whenever a new event is created (save method) and have that stored in an average_duration column in EventType. The problem with this approach is that I cannot answer questions like "what was the average duration time for events of type X, during the year Y". So instead of adding more columns to answer questions like these I would prefer to have it done in "real-time".
Can this be done by annotating the queryset? First I would have to get the date differences for each event type, then come up with their average, and then annotate the Event queryset with that average, I assume.
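To pin down the result being asked for, here is a minimal plain-Python sketch of the computation: average duration per event type, optionally restricted to events that start in a given year. It is not a queryset and does not touch the database; the function name and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def average_duration_by_type(events, year=None):
    """Return {type_name: average timedelta}.

    `events` is an iterable of (type_name, date_start, date_end) tuples.
    If `year` is given, only events starting in that year are counted.
    """
    totals = defaultdict(lambda: [timedelta(0), 0])  # type -> [sum, count]
    for type_name, start, end in events:
        if year is not None and start.year != year:
            continue
        bucket = totals[type_name]
        bucket[0] += end - start  # date - date gives a timedelta
        bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}
```

Any database-side solution should return the same numbers as this function, only computed by the database instead of in Python.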
4 Answers
#1
23
Just an update. In Django >= 1.8 it is possible to do:
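from datetime import timedelta
from django.db.models import F, ExpressionWrapper, fields
# ExpressionWrapper declares the date difference as a DurationField (a timedelta in Python).
duration = ExpressionWrapper(F('date_end') - F('date_start'), output_field=fields.DurationField())
events_with_duration = Event.objects.annotate(duration=duration)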
After that, you can run queries such as:
events_with_duration.filter(duration__gt=timedelta(days=10))
#2
2
I would suggest that you store the event duration as a column:
class Event(models.Model):
    event_duration = models.IntegerField()
    ...

    def __init__(self, *args, **kwargs):
        super(Event, self).__init__(*args, **kwargs)
        self.update_event_duration()

    def save(self, *args, **kwargs):
        # Accept *args as well, since they are forwarded to Model.save().
        self.update_event_duration()
        super(Event, self).save(*args, **kwargs)
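The update_event_duration method is called above but never shown. A minimal sketch of what it might look like, assuming the IntegerField is meant to hold the duration in whole days (the answer does not say which unit it uses). It is written as a plain mixin, a hypothetical name, so that it runs without Django:

```python
from datetime import date


class DurationMixin:
    """Hypothetical helper: keeps event_duration in sync with the dates."""

    def update_event_duration(self):
        # Leave the field empty while either date is still unset,
        # e.g. on a freshly constructed, unsaved instance.
        if self.date_start is None or self.date_end is None:
            self.event_duration = None
            return
        # date - date gives a timedelta; .days is the whole-day count.
        self.event_duration = (self.date_end - self.date_start).days
```

The None check matters because the answer calls the method from __init__, where an instance created without arguments has no dates yet.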
Then you can use this library, http://code.google.com/p/django-cube/, to calculate the average along several different dimensions (year, type, or other):
>>> from django.db.models import Avg
>>> def my_avg(queryset):
...     return queryset.aggregate(Avg("event_duration"))["event_duration__avg"]
...
>>> c = Cube(["date_start__year", "type"], Event.objects.all(), my_avg)
Use the cube like this:
>>> c.measure(date_start__year=1999, type=event_type2)
123.456
Or you can get the averages for all years:
>>> c.measure_dict("date_start__year")
{1984: {'measure': 111.789}, 1985: {'measure': 234.666}, ...}
Or by year and event type:
>>> c.measure_dict("date_start__year", "type")
{
    1984: {eventtype1: {'measure': 111.789}, eventtype2: {'measure': 234.666}, ...},
    1985: {eventtype1: {'measure': 122.79}, eventtype2: {'measure': 233.444}, ...},
    ...
}
#3
1
I think your best bet is to create an SQL view with a date_end - date_start column and create a Django model on top of that view. You will then be able to query the view and annotate it as you want. I've done this with models similar to yours, and I could extract some of that code for you if you need it.
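As a rough sketch of the view idea, the following uses the standard-library sqlite3 module rather than Django, so it only shows the SQL side. The table and view names, the SQLite backend, and the use of julianday() for the date difference are all assumptions; a real Django project would have its own table names, and the unmanaged model mapped onto the view is not shown.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two tables and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE eventtype (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE event (
            id INTEGER PRIMARY KEY,
            type_id INTEGER REFERENCES eventtype(id),
            name TEXT,
            date_start TEXT,  -- ISO dates such as '2010-01-01'
            date_end TEXT
        );
        -- The view exposes the date difference as an ordinary column.
        CREATE VIEW event_with_duration AS
            SELECT id, type_id, date_start,
                   julianday(date_end) - julianday(date_start) AS duration_days
            FROM event;
    """)
    return conn


def average_days_by_type(conn, year):
    """Average duration in days per event type, for events starting in `year`."""
    rows = conn.execute(
        """
        SELECT t.name, AVG(v.duration_days)
        FROM event_with_duration AS v
        JOIN eventtype AS t ON t.id = v.type_id
        WHERE strftime('%Y', v.date_start) = ?
        GROUP BY t.name
        """,
        (str(year),),
    ).fetchall()
    return dict(rows)
```

Because the duration is a column of the view, the "type X during year Y" question becomes an ordinary filter plus GROUP BY, with no extra stored columns.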
#4
1
You'll need to create a queryset with the extra() method to add the date difference to each row, then use the aggregate() method to compute the average of the column you just added.
Be careful, though: this method is slow and won't scale. Storing the computed value on event_type is, IMHO, your best option.