I have some big data sets that I loop through to display a table of data. The trouble is that the looping takes a ton of time. That is okay at the moment, since this is an internal tool, but I would like to improve it.
The model:
class Metric_Data(models.Model):
    metric = models.ForeignKey(Metric)
    count = models.IntegerField()
    start_date = models.DateField()
I am displaying a table where the first column holds the dates and each following column is a metric, listing that metric's count for the date. Like so:
Dates   Metric   Metric   Metric   ...
10/11   10       11       12
11/11   22       100      1000
...     ...      ...      ...
I tried looping over the data in the view, building the table out of lists, and passing that to the template for rendering, but with several metrics and thousands of data points per metric this was rather slow. I have since switched to a template tag:
from django import template

register = template.Library()

def getIndex(parser, token):
    # Parse "{% get_index a_list index %}" into its two arguments.
    try:
        tag_name, a_list, index = token.split_contents()
    except ValueError:
        raise template.TemplateSyntaxError("%r tag requires exactly two arguments" % token.contents.split()[0])
    return GetIndexNode(a_list, index)

class GetIndexNode(template.Node):
    def __init__(self, a_list, index):
        self.the_list = template.Variable(a_list)
        self.index = template.Variable(index)

    def render(self, context):
        try:
            the_list = self.the_list.resolve(context)
            i = self.index.resolve(context)
            return the_list[i]
        except template.VariableDoesNotExist:
            return ''

# Registered under the name the template uses.
register.tag('get_index', getIndex)
This is still rather slow, which could just be because it is my first time writing a template tag and I've done something wrong.
EDIT: I am fetching the data in the view like so:
def show_all(request):
    metrics = Metric.objects.all()
    dates = Metric_Data.objects.all().values_list('start_date', flat=True).distinct().order_by('start_date')
    data = []
    for metric in metrics:
        data.append(Metric_Data.objects.filter(metric=metric).order_by('start_date').values_list('count', flat=True))
    return render_to_response('metric/show_all.html', {'dates': dates,
                                                       'metrics': metrics,
                                                       'data': data})
EDIT: And the template:
<table id="theTable" class="paginate-5">
  <thead>
    <tr>
      <th>Dates</th>
      {% for metric in metrics %}
      <th>{{ metric.name }}</th>
      {% endfor %}
    </tr>
  </thead>
  <tbody>
    {% for date in dates %}
    <tr>
      <td>{{date}}</td>
      {% for metric in data %}
      <td>{% get_index metric forloop.parentloop.counter0 %}</td>
      {% endfor %}
    </tr>
    {% endfor %}
  </tbody>
</table>
I am thinking the best place to fix this problem might be in the model, but I'm not sure how to go about it. Perhaps create a table for the dates and run the query against that table?
Ideas much appreciated, thanks!
3 solutions
#1
1
I think you're just badly grouping your data, so that you end up looping multiple times over the same items, yielding very poor complexity. Try to structure your data very closely to the way it will be used in the template.
For instance:
def metric_count_on(metric, date):
    return Metric_Data.objects.filter(metric=metric, start_date=date).values_list('count', flat=True)

def show_all(request):
    metrics = Metric.objects.all()
    dates = Metric_Data.objects.all().values_list('start_date', flat=True).distinct().order_by('start_date')
    # Generate the full report: one row per date, one cell per metric,
    # so the template only has to loop.
    # Assumes each metric has exactly one count per date; a missing
    # count would shift the remaining cells of that row to the left.
    data = [{'date': date,
             'metrics': [count for metric in metrics
                         for count in metric_count_on(metric, date)]}
            for date in dates]
    # ...
Then, in the template, you can basically just loop as:
{% for row in data %}
<tr>
  <td>{{ row.date }}</td>
  {% for count in row.metrics %}
  <td>{{ count }}</td>
  {% endfor %}
</tr>
{% endfor %}
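If the view already holds one list of counts per metric, as the `data` list in the question does, the same row shape can probably be produced without any extra queries by transposing those lists. This is only a sketch: `build_rows` is a hypothetical helper name, and it assumes every metric has exactly one count for every date, in date order, which is the same assumption the index-based template tag makes.

```python
def build_rows(dates, data):
    """Pair each date with the counts of every metric on that date.

    `dates` is an ordered sequence of dates; `data` holds one sequence of
    counts per metric, each ordered by date (the shape built in the
    question's view). Assumes one count per metric per date.
    """
    # zip(*data) transposes "one list per metric" into "one tuple per date".
    return [{'date': date, 'metrics': list(counts)}
            for date, counts in zip(dates, zip(*data))]


# Small stand-in for the real dates and per-metric count lists.
rows = build_rows(['10/11', '11/11'], [[10, 22], [11, 100], [12, 1000]])
```

Passing `rows` to the template in place of `data` lets the two nested `for` loops above render the table directly, with no custom tag.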
#2
0
If you find your view to be slow, the problem is often in the database. Are you sure you know what queries are going to the database? It's possible you can make a small change that would greatly reduce the db traffic.
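As one illustration of such a change, the per-metric queries could likely be replaced by a single query whose rows are pivoted in Python. The sketch below is not taken from the question: `pivot_counts` is a hypothetical helper, and it assumes the rows arrive as `(start_date, metric_id, count)` triples, for example from `Metric_Data.objects.values_list('start_date', 'metric_id', 'count')`.

```python
from collections import defaultdict


def pivot_counts(triples, metric_ids):
    """Turn (start_date, metric_id, count) triples into one row per date.

    Missing (date, metric) combinations are filled with None so that the
    columns stay aligned with `metric_ids`.
    """
    by_date = defaultdict(dict)
    for start_date, metric_id, count in triples:
        by_date[start_date][metric_id] = count
    return [{'date': d, 'metrics': [by_date[d].get(m) for m in metric_ids]}
            for d in sorted(by_date)]


# Hand-written sample triples standing in for the result of one query.
sample = [('2010-11-10', 1, 10), ('2010-11-10', 2, 11), ('2010-11-11', 1, 22)]
table = pivot_counts(sample, [1, 2])
```

With this shape the database is hit once for the counts, however many metrics there are, and the template only iterates over prepared rows.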
#3
0
I found this blog post which seems to hint at similar problems.
http://www.xorad.com/blog/?p=1497
Using "|safe" on all my variables cut my load time in half, which is at least something...
Posting in case anyone else stumbles on this problem.