Django ORM and SQL inner joins

Time: 2022-07-13 15:21:30

I am trying to get all Horse objects whose related Listing objects fall within a specific from_date and to_date range, e.g.:

Horse.objects.filter(listings__to_date__lt=to_date.datetime,
                     listings__from_date__gt=from_date.datetime)

Now as I understand this database query creates an inner join which then enables me to find all my horse objects based on the related listing dates.

My question is how exactly this works; it probably comes down to a major lack of understanding of how inner joins actually work. Would this query first need to 'check' each and every horse object to ascertain whether or not it has a related listing object? I'd imagine this could prove quite inefficient, because you might have 5 million horse objects with no related listing object, yet you would still have to check each and every one first?

Alternatively I could start with my Listings and do something like this first:

listing_objs = Listing.objects.filter(to_date__lt=to_date.datetime,
                                      from_date__gt=from_date.datetime)

And then:

horses = []
for listing in listing_objs:
    if listing.horse:
        # Append the related horse (the original appended an undefined name).
        horses.append(listing.horse)

But this seems like a rather odd way of achieving my results too.

If anyone could help me understand how queries work in Django and which is the most efficient way to go about doing such a query it would be a great help!

This is my current model setup:

from django.db import models


class Listing(models.Model):

    to_date = models.DateTimeField(null=True, blank=True)
    from_date = models.DateTimeField(null=True, blank=True)
    promoted_to_date = models.DateTimeField(null=True, blank=True)
    promoted_from_date = models.DateTimeField(null=True, blank=True)

    # Relationships
    # on_delete is required since Django 2.0; CASCADE was the implicit default before.
    horse = models.ForeignKey('Horse', related_name='listings', null=True, blank=True,
                              on_delete=models.CASCADE)

class Horse(models.Model):
    created_date = models.DateTimeField(null=True, blank=True, auto_now=True)
    type = models.CharField(max_length=200, null=True, blank=True)
    name = models.CharField(max_length=200, null=True, blank=True)
    age = models.IntegerField(null=True, blank=True)
    colour = models.CharField(max_length=200, null=True, blank=True)
    height = models.IntegerField(null=True, blank=True)

1 Solution

#1


The way you write your query really depends on what information you want back most of the time. If you are interested in the horses, then query from Horse. If you're interested in listings then you should query from Listing. That's generally the correct thing to do, especially when you're working with simple foreign keys.

Your first query is probably the better one as far as Django is concerned. I've used slightly simpler models to illustrate the differences: I've created an active field rather than using datetimes.

In [18]: qs = Horse.objects.filter(listings__active=True)

In [19]: print(qs.query)
SELECT 
"scratch_horse"."id", 
"scratch_horse"."name" 
FROM "scratch_horse" 
INNER JOIN "scratch_listing" 
ON ( "scratch_horse"."id" = "scratch_listing"."horse_id" ) 
WHERE "scratch_listing"."active" = True

The inner join in the query above will ensure that you only get horses that have a listing. (Most) databases are very good at using joins and indexes to filter out unwanted rows.
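
One consequence of the inner join is worth seeing with data: the join yields one row per matching (horse, listing) pair, so a horse with several matching listings comes back several times. The following is a minimal sketch using Python's built-in sqlite3 module rather than Django itself; the table names follow the SQL above, while the horse names and rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-ins for the scratch_horse / scratch_listing tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scratch_horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scratch_listing (
        id INTEGER PRIMARY KEY,
        active INTEGER NOT NULL,
        horse_id INTEGER REFERENCES scratch_horse (id)
    );
    INSERT INTO scratch_horse (id, name) VALUES
        (1, 'Arkle'), (2, 'Bella'), (3, 'Comet');
    -- Arkle has two active listings, Bella one inactive listing, Comet none.
    INSERT INTO scratch_listing (id, active, horse_id) VALUES
        (1, 1, 1), (2, 1, 1), (3, 0, 2);
""")

JOIN_SQL = """
    SELECT {select} scratch_horse.id, scratch_horse.name
    FROM scratch_horse
    INNER JOIN scratch_listing
        ON scratch_horse.id = scratch_listing.horse_id
    WHERE scratch_listing.active = 1
    ORDER BY scratch_horse.id
"""

# One row per matching (horse, listing) pair, so Arkle appears twice.
joined = conn.execute(JOIN_SQL.format(select="")).fetchall()

# DISTINCT collapses the repeats.
distinct = conn.execute(JOIN_SQL.format(select="DISTINCT")).fetchall()

print(joined)    # [(1, 'Arkle'), (1, 'Arkle')]
print(distinct)  # [(1, 'Arkle')]
```

Bella and Comet never appear because neither has an active listing. In Django, appending .distinct() to the queryset is the usual way to get each horse only once.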

If Listing were very small and Horse rather large, I would hope the database would scan only the Listing table and then use an index to fetch the matching rows of Horse, without doing a full table scan (inspecting every horse). You will need to run the query and check what your database is actually doing, though. EXPLAIN (or the equivalent for whatever database you use) is extremely useful. If you're guessing what the database is doing, you're probably wrong.
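
As a rough illustration of checking rather than guessing, here is a small sketch that asks SQLite for its plan through Python's sqlite3 module. It assumes SQLite's EXPLAIN QUERY PLAN syntax; other databases spell this differently, and the index name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scratch_horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scratch_listing (
        id INTEGER PRIMARY KEY,
        active INTEGER NOT NULL,
        horse_id INTEGER REFERENCES scratch_horse (id)
    );
    -- Django indexes foreign key columns by default;
    -- this hypothetical index plays that role here.
    CREATE INDEX scratch_listing_horse_id ON scratch_listing (horse_id);
""")

query = """
    SELECT scratch_horse.id, scratch_horse.name
    FROM scratch_horse
    INNER JOIN scratch_listing
        ON scratch_horse.id = scratch_listing.horse_id
    WHERE scratch_listing.active = 1
"""

# Each plan row has four columns; the last one is the readable description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

The wording of the plan depends on the database and its version, so read it for the broad shape: which table is scanned in full and which is reached through an index. On Django 2.1 and later, calling .explain() on a queryset should give the equivalent report from whichever database you are connected to.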

Note that if you need to access the listings of each horse, you'll execute another query each time you access horse.listings. If you need them, prefetch_related can help: it fetches the listings for all the horses in a single extra query and caches them on each horse.
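
To make the idea behind prefetch_related concrete, here is a sketch of the same pattern done by hand with sqlite3: one query for the horses, one query for all of their listings, and the grouping done in Python. This imitates the approach only and is not Django's actual implementation; the run helper, the rows and the names are invented. In Django the call itself would be Horse.objects.prefetch_related('listings').

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scratch_horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scratch_listing (
        id INTEGER PRIMARY KEY,
        active INTEGER NOT NULL,
        horse_id INTEGER REFERENCES scratch_horse (id)
    );
    INSERT INTO scratch_horse (id, name) VALUES
        (1, 'Arkle'), (2, 'Bella'), (3, 'Comet');
    INSERT INTO scratch_listing (id, active, horse_id) VALUES
        (1, 1, 1), (2, 0, 1), (3, 1, 2);
""")

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it, so the round trips are visible."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Query 1: the horses themselves.
horses = run("SELECT id, name FROM scratch_horse ORDER BY id")

# Query 2: all listings for those horses at once, via an IN (...) clause.
horse_ids = [horse_id for horse_id, _ in horses]
placeholders = ", ".join("?" for _ in horse_ids)
listings = run(
    "SELECT id, active, horse_id FROM scratch_listing "
    f"WHERE horse_id IN ({placeholders}) ORDER BY id",
    horse_ids,
)

# The "join" happens in Python: group the listings by their horse_id.
listings_by_horse = defaultdict(list)
for listing_id, active, horse_id in listings:
    listings_by_horse[horse_id].append((listing_id, bool(active)))

for horse_id, name in horses:
    print(name, listings_by_horse[horse_id])  # no further queries here
print("queries executed:", query_count)
```

However many horses there are, the count stays at two queries, whereas looking up the listings inside the loop would cost one query per horse.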

Now, your second query:

In [20]: qs = Listing.objects.filter(active=True).select_related('horse')

In [21]: print(qs.query)
SELECT 
"scratch_listing"."id", 
"scratch_listing"."active", 
"scratch_listing"."horse_id", 
"scratch_horse"."id", 
"scratch_horse"."name" 
FROM "scratch_listing" 
LEFT OUTER JOIN "scratch_horse" 
ON ( "scratch_listing"."horse_id" = "scratch_horse"."id" ) 
WHERE "scratch_listing"."active" = True

This does a LEFT join, which means that the right hand side can contain NULL. The right hand side is Horse in this instance. This would perform very poorly if you had a lot of listings without a Horse, because it would bring back every single active listing, whether or not a horse was associated with it. You could fix that with .filter(active=True, horse__isnull=False) though.
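
The difference is easy to see on a few rows. The sketch below again uses sqlite3 with invented data; the IS NOT NULL condition is roughly what horse__isnull=False asks for, although the exact SQL Django emits may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scratch_horse (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scratch_listing (
        id INTEGER PRIMARY KEY,
        active INTEGER NOT NULL,
        horse_id INTEGER REFERENCES scratch_horse (id)
    );
    INSERT INTO scratch_horse (id, name) VALUES (1, 'Arkle');
    -- Listing 2 is active but has no horse; listing 3 is inactive.
    INSERT INTO scratch_listing (id, active, horse_id) VALUES
        (1, 1, 1), (2, 1, NULL), (3, 0, 1);
""")

LEFT_JOIN_SQL = """
    SELECT scratch_listing.id, scratch_horse.name
    FROM scratch_listing
    LEFT OUTER JOIN scratch_horse
        ON scratch_listing.horse_id = scratch_horse.id
    WHERE scratch_listing.active = 1 {extra}
    ORDER BY scratch_listing.id
"""

# Every active listing comes back; the horse columns are NULL when none is set.
all_active = conn.execute(LEFT_JOIN_SQL.format(extra="")).fetchall()

# Excluding listings without a horse drops the NULL row.
with_horse = conn.execute(
    LEFT_JOIN_SQL.format(extra="AND scratch_listing.horse_id IS NOT NULL")
).fetchall()

print(all_active)  # [(1, 'Arkle'), (2, None)]
print(with_horse)  # [(1, 'Arkle')]
```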

See that I've used select_related, which joins the tables so that you're able to access listing.horse without incurring another query.

Now I should probably ask why all your fields are nullable. That's usually a terrible design choice, especially for ForeignKeys. Will you ever have a listing that's not associated with a horse? If not, get rid of the null. Will you ever have a horse that won't have a name? If not, get rid of the null.

So the answer is: do what seems natural most of the time. If you know a particular table is going to be large, then you should inspect the query plan (EXPLAIN), look into adding or using indexes on the filter/join conditions, or try querying from the other side of the relation.
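
A sketch of that workflow for the date filter in the question, using sqlite3: look at the plan, add an index on the filtered columns, and look again. The index name and the date values are invented, and whether the planner uses the index is its own decision, which is exactly why the plan is worth re-checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scratch_listing ("
    "id INTEGER PRIMARY KEY, from_date TEXT, to_date TEXT, horse_id INTEGER)"
)

query = (
    "SELECT horse_id FROM scratch_listing "
    "WHERE from_date > ? AND to_date < ?"
)
params = ("2022-01-01", "2022-12-31")


def plan_for(sql, args):
    """Return the readable step descriptions from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args)
    return [row[3] for row in rows]


before = plan_for(query, params)

# Hypothetical index covering the two date columns used in the filter.
conn.execute(
    "CREATE INDEX listing_dates ON scratch_listing (from_date, to_date)"
)
after = plan_for(query, params)

print("before:", before)
print("after: ", after)
```

In a Django project the index would normally be declared on the model (for example through Meta.indexes) and created by a migration, rather than by hand-written SQL.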
