When to use (or not use) iterator() in the Django ORM

Time: 2022-06-14 17:08:27

This is from the Django docs on the QuerySet iterator() method:

A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.

After reading this, I'm still confused: the line about better performance and lower memory use suggests we should always just use the iterator() method. Can someone give some examples of good and bad cases of iterator() usage?

Even if the query results are not cached, if someone really needs to access the model instances more than once, couldn't they just do the following?

saved_queries = list(Model.objects.all().iterator())

2 answers

#1


Note the first part of the sentence you call out: "For a QuerySet which returns a large number of objects that you only need to access once".

So the converse is: if you need to reuse a set of results, and they are not so numerous as to cause a memory problem, then you should not use iterator(), because the extra database round trip will always cost more than reusing the cached result.

You could force your QuerySet to be evaluated into a list, but:

  • it requires more typing than just saved_queries = Model.objects.all()
  • say you are paginating results on a web page: you will have forced all results into memory (back to possible memory problems) rather than allowing the subsequent paginator to select the slice of 20 results it needs
  • QuerySets are lazy, so you can, for instance, have a context processor that puts a QuerySet into the context of every request; it is only evaluated on the requests that actually access it. If you have forced evaluation, that database hit happens on every request

The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required) so generally the standard QuerySet behaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variable to get the benefit of the caching.

A good use of iterator(): processing results that would take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this often comes up in management commands that do heavy data processing.

#2


I agree with Steven, and I would like to add an observation:

  • "it requires more typing than just saved_queries = Model.objects.all()". Yes, it does, but there is a more important reason to avoid list(Model.objects.all()). If you assign that to a variable, the query executes immediately and all the results are stored in it. Imagine you have over 1M records: you would then hold over 1M records in a list that you may or may not use right away. So, as Steven said, I would recommend just using Model.objects.all(): a QuerySet assigned to a variable is not executed until you actually use it, which saves you database calls.

  • You should use prefetch_related() to avoid making too many calls to the database: it fetches the related objects (including reverse relations) in one extra query per relationship instead of one query per object, which can save you a lot of time.

