Optimizing a MySQL query with inner joins

Date: 2021-07-25 15:40:26

I've spent a lot of time optimizing this query, but it is starting to slow down as the tables grow. I imagine this is probably the worst kind of question, but I'm looking for some guidance. I'm not at liberty to disclose the database schema, so hopefully this is enough information. Thanks,

SELECT tblA.id, tblB.id, tblC.id, tblD.id
FROM tblA, tblC, tblD, tblB
-- tblB must sit directly before INNER JOIN: JOIN binds more tightly than the
-- comma, so the ON clause below can only reference tblB and rddx.
INNER JOIN (SELECT max(tblB.id) AS xid
                FROM tblB
                WHERE tblB.rdd = 11305
                GROUP BY tblB.index_id
                ORDER BY NULL) AS rddx
           ON tblB.id = rddx.xid
WHERE
    tblA.id = tblB.index_id
    AND tblC.name = tblD.s_type
    AND tblD.name = tblA.s_name
GROUP BY tblA.s_name
ORDER BY NULL;

There are one-to-many relationships between:

  • tblA.id and tblB.index_id
  • tblC.name and tblD.s_type
  • tblD.name and tblA.s_name

EXPLAIN output for the query:
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+
| id | select_type | table      | type   | possible_keys | key       | key_len | ref                          | rows  | Extra                        |
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+
|  1 | PRIMARY     | derived2   | ALL    | NULL          | NULL      | NULL    | NULL                         | 32568 | Using temporary              |
|  1 | PRIMARY     | tblB       | eq_ref | PRIMARY       | PRIMARY   | 8       | rddx.xid                     |     1 |                              |
|  1 | PRIMARY     | tblA       | eq_ref | PRIMARY       | PRIMARY   | 8       | tblB.index_id                |     1 | Using where                  |
|  1 | PRIMARY     | tblD       | eq_ref | PRIMARY       | PRIMARY   | 22      | tblA.s_name                  |     1 | Using where                  |
|  1 | PRIMARY     | tblC       | eq_ref | PRIMARY       | PRIMARY   | 22      | tblD.s_type                  |     1 |                              |
|  2 | DERIVED     | tblB       | ref    | rdd_idx       | rdd_idx   | 7       |                              | 65722 | Using where; Using temporary |
+----+-------------+------------+--------+---------------+-----------+---------+------------------------------+-------+------------------------------+

2 answers

#1

I have rewritten the query using explicit JOINs instead of join conditions in the WHERE clause. This also lets you, as a developer, see the relationships between the tables directly: A->B, A->D and D->C. For table B, where you want the highest ID per common "ID = Index_ID" with RDD = 11305, a complete sub-query is not required; however, this moves the MAX() up into the SELECT list. I would make sure tblB has an index on (index_id, rdd). Finally, STRAIGHT_JOIN forces MySQL to join the tables in the order in which they are listed in the query.
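
For reference, STRAIGHT_JOIN comes in two forms in MySQL. As a modifier right after SELECT (the form used in this answer), it makes the optimizer join the tables in the order they appear in the FROM clause; as a join operator between two tables, it only forces the left table to be read before the right one. A minimal sketch using two of the question's tables and the join column from the question; the order of the rows in the EXPLAIN output is the join order that was actually chosen:

```sql
-- Modifier form: all tables are joined in FROM-clause order (tblA, then tblD).
EXPLAIN
SELECT STRAIGHT_JOIN tblA.id, tblD.id
FROM tblA
JOIN tblD ON tblD.name = tblA.s_name;

-- Operator form: only this pair is constrained (tblA is read before tblD);
-- the optimizer remains free to order any other tables in the query.
EXPLAIN
SELECT tblA.id, tblD.id
FROM tblA
STRAIGHT_JOIN tblD ON tblD.name = tblA.s_name;
```

Forcing the order only pays off when you know better than the optimizer, for example when a small pre-aggregated result should drive the join, so it is worth re-checking the plan as the data grows.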

-- EDIT FROM COMMENT --

It appears you are getting NULLs from tblB. This typically means there is a valid tblA record but no tblB record with the same ID and RDD = 11305. That said, it appears you are only concerned with the entries associated with 11305, so I'm adjusting the query accordingly. Please make sure tblB has an index on the RDD column (if it is part of a multi-column index, RDD should be the first column).
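
As a sketch of that index advice, assuming InnoDB and that tblB.id is the primary key (the schema was not disclosed), a composite index that starts with rdd and continues with index_id lets MySQL find the rdd = 11305 rows and read them already ordered by index_id. The index name rdd_index_id_idx is hypothetical:

```sql
-- Hypothetical index name. rdd comes first because the pre-query filters on it
-- with an equality; index_id comes second because the pre-query groups by it.
-- InnoDB stores the primary key (assumed here to be tblB.id) in every secondary
-- index, so MAX(id) can be answered from the index without reading table rows.
ALTER TABLE tblB ADD INDEX rdd_index_id_idx (rdd, index_id);

-- Check the plan of the pre-query on its own and compare its Extra column
-- with the "Using where; Using temporary" shown in the question.
EXPLAIN
SELECT index_id, MAX(id) AS HighestPerIndexID
FROM tblB
WHERE rdd = 11305
GROUP BY index_id;
```

If the existing rdd_idx from the question covers only rdd, it becomes redundant once this index exists, because rdd is the leftmost column of the new one.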

As you can see, this version first pre-queries table B for only the RDD = 11305 entries and pre-groups them by index_ID (the link to tblA). This gives one record per index_ID that has such entries, carrying the highest matching tblB ID. From THIS result, I join back to A, then directly back to B again, but on that highest matching ID, and then to D and C as before. So NOW you can pull any column from any of the tables and get the proper record... There should be no NULL values left in this query.

Hopefully, I've clarified HOW I'm getting the pieces together for you.

SELECT STRAIGHT_JOIN
      PreQuery.HighestPerIndexID,
      tblA.id,
      tblA.AnotherAField,    -- placeholder column names: substitute your own
      tblA.Etc,
      tblB.SomeOtherField,
      tblB.AnotherField,
      tblC.id,
      tblD.id
   FROM
      -- one row per Index_ID: the highest tblB.ID among its RDD = 11305 rows
      ( select PQ1.Index_ID,
               max( PQ1.ID ) as HighestPerIndexID
           from tblB PQ1
           where PQ1.RDD = 11305
           group by PQ1.Index_ID ) PreQuery

         JOIN tblA
            on PreQuery.Index_ID = tblA.ID

         -- join back to tblB on that single winning ID to fetch its other columns
         join tblB
            on PreQuery.HighestPerIndexID = tblB.ID

         join tblD
            on tblA.s_Name = tblD.name

            join tblC
               on tblD.s_type = tblC.Name
    ORDER BY
       tblA.s_Name;
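
To see the two-step "pre-query, then join back" pattern in isolation, here is a self-contained sketch on a tiny made-up dataset. The tables demo_a and demo_b, their columns and their rows are all hypothetical stand-ins for tblA and tblB; run it in a scratch schema where you may create tables:

```sql
-- Hypothetical scratch tables; the real schema was not disclosed.
DROP TABLE IF EXISTS demo_a, demo_b;

CREATE TABLE demo_a (
  id     INT PRIMARY KEY,
  s_name VARCHAR(20) NOT NULL
);

CREATE TABLE demo_b (
  id       INT PRIMARY KEY,
  index_id INT NOT NULL,          -- points at demo_a.id (one-to-many)
  rdd      INT NOT NULL,
  payload  VARCHAR(20),
  KEY rdd_index_id (rdd, index_id)
);

INSERT INTO demo_a (id, s_name) VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO demo_b (id, index_id, rdd, payload) VALUES
  (10, 1, 11305, 'a-old'),
  (11, 1, 11305, 'a-new'),
  (12, 1, 99999, 'a-other-rdd'),
  (13, 2, 11305, 'b-only'),
  (14, 3, 99999, 'c-other-rdd');

-- Step 1 (pre): one row per index_id with the highest id among its rdd = 11305 rows.
-- Step 2: join back to demo_b on that id to fetch the rest of the winning row.
SELECT a.id AS a_id, a.s_name, b.id AS b_id, b.payload
FROM (SELECT index_id, MAX(id) AS max_id
      FROM demo_b
      WHERE rdd = 11305
      GROUP BY index_id) AS pre
JOIN demo_a a ON a.id = pre.index_id
JOIN demo_b b ON b.id = pre.max_id
ORDER BY a.id;
-- a_id | s_name | b_id | payload
--    1 | alpha  |   11 | a-new
--    2 | beta   |   13 | b-only
```

Row 12 is skipped even though it is the highest id for index_id 1, because its rdd is not 11305, and index_id 3 disappears entirely because it has no 11305 rows. That is why the joined-back result contains no NULLs.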

#2

Unless I've misunderstood the information you've provided, I believe you could rewrite the above query as follows:

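EXPLAIN SELECT tblA.id, MAX(tblB.id), tblC.id, tblD.id
FROM tblA
LEFT JOIN tblD ON tblD.name = tblA.s_name
LEFT JOIN tblC ON tblC.name = tblD.s_type
LEFT JOIN tblB ON tblA.id = tblB.index_id
WHERE tblB.rdd = 11305
-- without a GROUP BY, MAX() would collapse the whole result into a single row
GROUP BY tblA.id, tblC.id, tblD.id
ORDER BY NULL;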
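
One caveat about this rewrite: because WHERE tblB.rdd = 11305 tests a column of the LEFT JOINed table, tblA rows without a matching tblB row (where tblB.rdd is NULL) are filtered out, so that join effectively becomes an inner join. If you do want to keep tblA rows that have no rdd = 11305 entry, the condition has to move into the ON clause. A sketch, using the question's table names, with a GROUP BY so that MAX() is computed per tblA row:

```sql
-- Keeps every tblA row; max_b_id is NULL when that tblA row has no
-- tblB row with rdd = 11305.
SELECT tblA.id, MAX(tblB.id) AS max_b_id, tblC.id, tblD.id
FROM tblA
LEFT JOIN tblD ON tblD.name = tblA.s_name
LEFT JOIN tblC ON tblC.name = tblD.s_type
LEFT JOIN tblB ON tblB.index_id = tblA.id
              AND tblB.rdd = 11305      -- filter inside ON, not in WHERE
GROUP BY tblA.id, tblC.id, tblD.id;
```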

I can't provide the EXPLAIN output for this myself, because the plan depends on the data in your database, but it would be interesting to see the EXPLAIN output for this query.

Keep in mind that EXPLAIN only gives you an estimate of what will happen. To see in detail what actually happened when a query ran, use SHOW SESSION STATUS. Run FLUSH STATUS immediately before the query you are investigating, so that the session counters start from zero and you have clean data to read from. In this case you would run:

FLUSH STATUS;

-- Run the query itself (not EXPLAIN) so the counters reflect real execution.
SELECT tblA.id, MAX(tblB.id), tblC.id, tblD.id
FROM tblA
LEFT JOIN tblD ON tblD.name = tblA.s_name
LEFT JOIN tblC ON tblC.name = tblD.s_type
LEFT JOIN tblB ON tblA.id = tblB.index_id
WHERE tblB.rdd = 11305
GROUP BY tblA.id, tblC.id, tblD.id
ORDER BY NULL;

SHOW SESSION STATUS LIKE 'Handler%';

This shows a number of counters describing what actually happened when the query executed, including:

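  • Handler_read_rnd_next: number of requests to read the next row in the data file. A high value usually means many full table scans.
  • Handler_read_key: number of requests to read a row based on a key. A high value is a good sign that the query is using indexes.
  • Handler_read_next: number of requests to read the next row in key order, e.g. during an index range scan or a full index scan.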

Comparing these values shows what the storage engine actually did under the hood.
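
If you only want the three counters listed above, SHOW STATUS also accepts a WHERE clause on its Variable_name column. A minimal sketch, to be run in the same connection right after FLUSH STATUS and the query under investigation:

```sql
-- Session counters are per connection, so run this in the same session
-- as FLUSH STATUS and the query being measured.
SHOW SESSION STATUS
WHERE Variable_name IN ('Handler_read_rnd_next',
                        'Handler_read_key',
                        'Handler_read_next');
```

As a rough rule, a Handler_read_rnd_next value far above the number of rows the query returns points to table scans, while most of the work showing up in Handler_read_key and Handler_read_next indicates that indexes are being used.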

Unfortunately, without knowing the data in the tables, the storage engine, and the data types used in the queries, it is quite hard to advise further on how you could optimize this.
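
If you are on MySQL 8.0.18 or later, EXPLAIN ANALYZE sits between the two approaches above: it actually executes the statement and prints the plan as a tree, showing the optimizer's estimated rows next to the actual rows, loops and timings for each step. A sketch against the pre-query from the first answer, using the question's column names:

```sql
-- Requires MySQL 8.0.18+. Unlike plain EXPLAIN, this really runs the query,
-- so expect it to take as long as the query itself.
EXPLAIN ANALYZE
SELECT index_id, MAX(id) AS HighestPerIndexID
FROM tblB
WHERE rdd = 11305
GROUP BY index_id;
```

Large gaps between the estimated and actual row counts on a step are usually the place to start looking.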