
Time: 2022-07-13 15:21:24

I have two tables. UnRevisedTable holds the raw data, and RevisedTable holds only revised values, which a person inserts when they find a value in UnRevisedTable that they don't like.


In UnRevisedTable there is one entry every minute and in RevisedTable there is one entry for every unwanted entry in UnRevisedTable.


The purpose of the two queries below is the same: show the union of revised and unrevised data, replacing the unrevised data wherever revised data exists.


I wrote query2 first, but it was too slow. Then I wrote query1, which is much, much faster.


My question is: why is query1 faster than query2? Thanks.


    -- query1
    SELECT o.start_date_time,
        CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
        FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
        WHERE o.start_date_time >= '".$start."' AND o.start_date_time < '".$finish."' ORDER BY start_date_time ASC;

    -- query2
    select * from(
      select RevisedTable.* from RevisedTable where start_date_time between '".$start."' and '".$finish."'
      union
      select UnRevisedTable.* from UnRevisedTable where start_date_time between '".$start."' and '".$finish."'
    ) as t1 group by start_date_time;

As Abhik Chakraborty suggested, here are the EXPLAIN outputs for query1 and query2:



| id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows | Extra       |
|  1 | SIMPLE      | o     | range  | PRIMARY       | PRIMARY | 8       | NULL                   |    9 | Using where |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY | 8       | Mydb.o.start_date_time |    1 |             |


| id   | select_type  | table          | type  | possible_keys | key     | key_len | ref  | rows | Extra                           |
|    1 | PRIMARY      | <derived2>     | ALL   | NULL          | NULL    | NULL    | NULL |   14 | Using temporary; Using filesort |
|    2 | DERIVED      | RevisedTable   | range | PRIMARY       | PRIMARY | 8       | NULL |    2 | Using where                     |
|    3 | UNION        | UnRevisedTable | range | PRIMARY       | PRIMARY | 8       | NULL |   10 | Using where                     |
| NULL | UNION RESULT | <union2,3>     | ALL   | NULL          | NULL    | NULL    | NULL | NULL |                                 |

2 answers



First, the two queries do not do the same thing.


  • The first query returns only times from UnRevisedTable. If there are times in RevisedTable that are not in UnRevisedTable, you won't get them.

  • The first query uses a join. If there is an index on the join key, start_date_time, the join will use it. Aggregation is less likely to use indexes.

  • The first query will not remove duplicates that might exist within each table. The second does.

  • The two queries will not necessarily return the same value when there is a match in the two tables.

The second query is doing two aggregations, but no joins. The first aggregation is for removing duplicate values (for the union) and the second for the outer group by.
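The first difference listed above can be checked on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (LEFT JOIN and UNION behave the same way for this purpose), and the sample rows and the REAL column type are assumptions, not the asker's data.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2022-01-01 00:00:00', 10.0),
        ('2022-01-01 00:01:00', 99.0);
    -- One correction for 00:01, plus a revised row with no raw counterpart.
    INSERT INTO RevisedTable VALUES
        ('2022-01-01 00:01:00', 11.0),
        ('2022-01-01 00:02:00', 12.0);
""")

start, finish = "2022-01-01 00:00:00", "2022-01-01 00:03:00"

# Shape of query1: driven by UnRevisedTable, so 00:02 never appears.
join_rows = conn.execute("""
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= ? AND o.start_date_time < ?
    ORDER BY o.start_date_time
""", (start, finish)).fetchall()

# Timestamps seen by the union in query2: rows from either table qualify.
union_times = conn.execute("""
    SELECT start_date_time FROM RevisedTable
    WHERE start_date_time BETWEEN ? AND ?
    UNION
    SELECT start_date_time FROM UnRevisedTable
    WHERE start_date_time BETWEEN ? AND ?
    ORDER BY start_date_time
""", (start, finish, start, finish)).fetchall()

print(len(join_rows), len(union_times))  # 2 timestamps vs 3 timestamps
```

Note also that query1 uses a half-open range (`>=` and `<`) while query2 uses BETWEEN, which includes both endpoints, so a row stamped exactly at the finish time is returned only by query2.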


I would start by revising the second query to be:


    select start_date_time, max(value) as value
    from ((select start_date_time, value
           from RevisedTable
           where start_date_time between '".$start."' and '".$finish."'
          ) union all
          (select start_date_time, value
           from UnRevisedTable
           where start_date_time between '".$start."' and '".$finish."'
          )
         ) t1
    group by start_date_time;

You will probably also find that this is slower than the join. The MySQL engine does a better job of implementing joins than aggregations.
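Before comparing speed, it may be worth checking that the aggregated form returns the value you expect. MAX(value) picks the larger of the two values for a timestamp, which is the revised value only when the correction is higher than the raw reading. A minimal sketch, again using Python's sqlite3 as a stand-in for MySQL with made-up rows (SQLite does not accept the parenthesised SELECTs, so they are written without parentheses here):

```python
import sqlite3

# Hypothetical data: the human correction (11.0) is LOWER than the raw value (99.0).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES ('2022-01-01 00:01:00', 99.0);
    INSERT INTO RevisedTable   VALUES ('2022-01-01 00:01:00', 11.0);
""")

params = ("2022-01-01 00:00:00", "2022-01-01 00:05:00")

# UNION ALL + GROUP BY with MAX(value): returns the larger value.
agg_value = conn.execute("""
    SELECT start_date_time, MAX(value) AS value
    FROM (SELECT start_date_time, value FROM RevisedTable
          WHERE start_date_time BETWEEN ? AND ?
          UNION ALL
          SELECT start_date_time, value FROM UnRevisedTable
          WHERE start_date_time BETWEEN ? AND ?) t1
    GROUP BY start_date_time
""", params + params).fetchone()[1]

# LEFT JOIN form (query1): returns the revised value whenever one exists.
join_value = conn.execute("""
    SELECT CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= ? AND o.start_date_time < ?
""", params).fetchone()[0]

print(agg_value, join_value)  # 99.0 11.0
```

If revised values can be lower than the raw ones, the join is the form that matches the stated goal.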




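To answer your question: the slower query uses a UNION, which removes rows that are duplicated between the first and second tables. That deduplication needs a temporary table and extra work, which is often expensive; it shows up in query 2's plan as the UNION RESULT row. The `Using temporary; Using filesort` on the derived table comes from the outer GROUP BY. You can eliminate the deduplication by using UNION ALL instead, although the GROUP BY sort remains.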
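To make the UNION versus UNION ALL difference concrete, here is a small sketch using Python's sqlite3 as a stand-in for MySQL; the rows are invented for illustration. UNION drops only rows that are identical in every column, so a timestamp with two different values still appears twice and the outer GROUP BY is still needed either way.

```python
import sqlite3

# Hypothetical rows: 00:00 is identical in both tables, 00:01 differs in value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2022-01-01 00:00:00', 10.0),
        ('2022-01-01 00:01:00', 99.0);
    INSERT INTO RevisedTable VALUES
        ('2022-01-01 00:00:00', 10.0),
        ('2022-01-01 00:01:00', 11.0);
""")

# UNION compares whole rows and removes exact duplicates.
union_rows = conn.execute("""
    SELECT start_date_time, value FROM RevisedTable
    UNION
    SELECT start_date_time, value FROM UnRevisedTable
""").fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute("""
    SELECT start_date_time, value FROM RevisedTable
    UNION ALL
    SELECT start_date_time, value FROM UnRevisedTable
""").fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```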


You may like to consider replacing the CASE expression in query 1 with COALESCE, which returns its first non-NULL argument. List r.value first so that the revised value wins whenever it exists. It makes for an easier-to-read query, although any speed difference is likely to be small.


    SELECT o.start_date_time,
        COALESCE(r.value, o.value) AS value
    FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= '".$start."'
    AND o.start_date_time < '".$finish."'
    ORDER BY start_date_time ASC;
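The argument order matters: COALESCE returns its first non-NULL argument, so only `COALESCE(r.value, o.value)` prefers the revised value the way the CASE expression does. The two also differ when a revised row exists but its value is NULL. A small sketch comparing the three expressions side by side, using Python's sqlite3 as a stand-in for MySQL; the rows, including the NULL revision, are assumptions made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2022-01-01 00:00:00', 10.0),   -- never revised
        ('2022-01-01 00:01:00', 99.0),   -- revised to 11.0
        ('2022-01-01 00:02:00', 50.0);   -- revised row exists, but its value is NULL
    INSERT INTO RevisedTable VALUES
        ('2022-01-01 00:01:00', 11.0),
        ('2022-01-01 00:02:00', NULL);
""")

rows = conn.execute("""
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS case_value,
           COALESCE(r.value, o.value) AS revised_first,
           COALESCE(o.value, r.value) AS raw_first
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    ORDER BY o.start_date_time
""").fetchall()

for row in rows:
    print(row)
# ('2022-01-01 00:00:00', 10.0, 10.0, 10.0)
# ('2022-01-01 00:01:00', 11.0, 11.0, 99.0)   raw_first ignores the revision
# ('2022-01-01 00:02:00', None, 50.0, 50.0)   CASE keeps the NULL revision
```

If RevisedTable.value can never be NULL, `COALESCE(r.value, o.value)` and the original CASE expression return the same results.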


