Is it more efficient to use a subquery before an inner join?

Time: 2022-01-17 04:17:25

I'm in the process of learning MySQL, and there's something I've been wondering about.

Let's take this simple scenario: a hypothetical website for taking online courses, composed of four tables: Students, Teachers, Courses and Registrations (one entry per course that a student has registered for).

You can find the DB generation code on GitHub.

The provided DB is tiny for clarity, but to keep this relevant to what I need help with, let's assume a database large enough that efficiency is a real concern: say, hundreds of thousands of students, teachers, etc.



As far as I understand MySQL, if we want a list of the students taught by 'Charles Darwin', one possible query would be this:

Method 1

SELECT Students.name FROM Teachers
INNER JOIN Courses ON Teachers.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id
WHERE Teachers.name = 'Charles Darwin';

which does indeed return what we want:

+----------------+
| name           |
+----------------+
| John Doe       |
| Jamie Heineman |
| Claire Doe     |
+----------------+


So here's my question:

With my (very) limited MySQL knowledge, it seems to me that here we are JOIN-ing rows onto the Teachers table, which could be quite large, while we are ultimately only after a single teacher, whom we filter for at the very end of the query.

My 'intuition' says that it would be much more efficient to first get a single row for the teacher we need, and then join the remaining stuff onto that instead:

Method 2

SELECT Students.name
FROM (SELECT Teachers.id FROM Teachers WHERE Teachers.name = 'Charles Darwin') AS Teacher
INNER JOIN Courses ON Teacher.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id;

But is that really the case? Assuming thousands of teachers and students, is this more efficient than the first query? It could be that MySQL is smart enough to parse the Method 1 query in such a way that it runs more efficiently.


Also, if anyone could suggest an even more efficient query, I would be quite interested to hear it too.

Note: I've read that EXPLAIN can be used to figure out how efficient a query is, but I don't understand MySQL well enough to decipher the result. Any insight here would be much appreciated as well.

1 solution

#1



"My 'intuition' says that it would be much more efficient to first get a single row for the teacher we need, and then join the remaining stuff onto that instead."

In Method 1 you are already restricting Teachers to a single row with the predicate Teachers.name = "Charles Darwin". The query optimiser should determine that it is more efficient to restrict the Teachers set with this predicate before joining the other tables; the WHERE clause being written last does not mean it is applied last.
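One way to check this on your own data, sketched here against the Method 1 query from the question, is to prefix the statement with EXPLAIN. MySQL lists the tables in the order it would read them, so if Teachers appears on the first row with a small estimate in the rows column, the optimiser is restricting on the teacher before joining. Whether that lookup can use an index depends on the schema, which is not shown here.

```sql
-- Ask MySQL for its execution plan for Method 1 instead of running the query.
EXPLAIN
SELECT Students.name
FROM Teachers
INNER JOIN Courses ON Teachers.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id
WHERE Teachers.name = 'Charles Darwin';
-- Read the output top to bottom: each row describes one table, in read order.
-- The "rows" column is an estimate of how many rows MySQL expects to examine.
```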

If you don't trust the optimiser, or want to lessen the work it does, you can even force the table read order with SELECT STRAIGHT_JOIN ..., or by writing STRAIGHT_JOIN in place of INNER JOIN, so that MySQL reads the tables in the order you specified in the query.
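As a rough sketch of both spellings applied to the Method 1 query (same tables and columns as in the question, nothing else assumed):

```sql
-- Variant A: the STRAIGHT_JOIN modifier after SELECT forces the read order
-- for every table in the FROM clause: Teachers, Courses, Registrations, Students.
SELECT STRAIGHT_JOIN Students.name
FROM Teachers
INNER JOIN Courses ON Teachers.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id
WHERE Teachers.name = 'Charles Darwin';

-- Variant B: STRAIGHT_JOIN as a join operator forces the order only for that
-- pair of tables (Teachers is read before Courses); the remaining joins are
-- still left to the optimiser.
SELECT Students.name
FROM Teachers
STRAIGHT_JOIN Courses ON Teachers.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id
WHERE Teachers.name = 'Charles Darwin';
```

Forcing the order removes the optimiser's freedom to pick a better plan as the data changes, so it is probably worth comparing against the unforced query before keeping it.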

Your second query returns the same result but may be less efficient, because a temporary table may be created for your Teacher subquery.
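To see whether the subquery really costs a temporary table on your server, you could run EXPLAIN on Method 2. This is only a sketch of what to look for: if the derived table is materialised, the plan typically contains a row whose select_type is DERIVED and a table named something like <derived2>. Newer MySQL versions may instead merge the subquery into the outer query, in which case the plan should look much like the one for Method 1.

```sql
-- Show the plan for Method 2 (teacher selected in a derived table first).
EXPLAIN
SELECT Students.name
FROM (SELECT Teachers.id FROM Teachers WHERE Teachers.name = 'Charles Darwin') AS Teacher
INNER JOIN Courses ON Teacher.id = Courses.teacher_id
INNER JOIN Registrations ON Courses.id = Registrations.course_id
INNER JOIN Students ON Registrations.student_id = Students.id;
-- A select_type of DERIVED (with a <derivedN> entry in the table column)
-- indicates the subquery was materialised as a temporary table.
```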

The EXPLAIN documentation is a good source on how to interpret EXPLAIN output.
