SQL queries over two tables: execution time

I have two tables. Raw data goes into UnRevisedTable, and RevisedTable contains only revised values, which a person inserts when they find a value in UnRevisedTable that they don't like.

In UnRevisedTable there is one entry every minute and in RevisedTable there is one entry for every unwanted entry in UnRevisedTable.

The purpose of the two queries below is the same: show the union of revised and unrevised data, replacing the unrevised value wherever a revised value exists.

I started by writing query2, but it was too slow. Then I wrote query1, and query1 is much, much faster.

My question is: why is query1 faster than query2? Thanks.

query1:
    SELECT o.start_date_time,
        CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
        FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
        WHERE o.start_date_time >= '".$start."' AND o.start_date_time < '".$finish."' ORDER BY start_date_time ASC;

query2:
    select * from(
      select RevisedTable.* from RevisedTable where start_date_time between '".$start."' and '".$finish."' 
          union 
      select UnRevisedTable.* from UnRevisedTable where start_date_time between '".$start."' and '".$finish."'
    ) as t1 group by start_date_time;

As Abhik Chakraborty suggested, here are the EXPLAIN outputs for query1 and query2:

query1:

+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
|  1 | SIMPLE      | o     | range  | PRIMARY       | PRIMARY | 8       | NULL                   |    9 | Using where |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY | 8       | Mydb.o.start_date_time |    1 |             |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+

query2:

+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| id   | select_type  | table          | type  | possible_keys | key     | key_len | ref  | rows | Extra                           |
+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+
|    1 | PRIMARY      | <derived2>     | ALL   | NULL          | NULL    | NULL    | NULL |   14 | Using temporary; Using filesort |
|    2 | DERIVED      | RevisedTable   | range | PRIMARY       | PRIMARY | 8       | NULL |    2 | Using where                     |
|    3 | UNION        | UnRevisedTable | range | PRIMARY       | PRIMARY | 8       | NULL |   10 | Using where                     |
| NULL | UNION RESULT | <union2,3>     | ALL   | NULL          | NULL    | NULL    | NULL | NULL |                                 |
+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+

2 solutions

#1

First, the two queries do not do the same thing.

The first query returns only times from UnRevisedTable. If there are times in RevisedTable that are not in UnRevisedTable, you won't get them.

The first query is using a join. And, if you have an index on the join key, start_date_time, then the index will be used. Aggregation is less likely to use indexes.

The first query will not remove duplicates that might exist within each table. The second does.

The two queries will not necessarily return the same value when there is a match in the two tables.

The second query is doing two aggregations, but no joins. The first aggregation is for removing duplicate values (for the union) and the second for the outer group by.

I would start by revising the second query to be:

select start_date_time, max(value) as value
from ((select start_date_time, value
       from RevisedTable
       where start_date_time between '".$start."' and '".$finish."' 
      ) union all
      (select start_date_time, value
       from UnRevisedTable
       where start_date_time between '".$start."' and '".$finish."'
      )
     ) t1
 group by start_date_time

You will probably also find that this is slower than the join. The MySQL engine does a better job of implementing joins than aggregations.
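
To make the differences listed above concrete, here is a small sketch, not a benchmark. It assumes SQLite through Python's sqlite3 module instead of MySQL, uses made-up sample rows, and replaces the PHP string interpolation with bound parameters; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10.0),
        ('2014-01-01 00:01:00', 99.0),
        ('2014-01-01 00:02:00', 12.0);
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11.0),   -- revises the 99.0 reading
        ('2014-01-01 00:05:00', 13.0);   -- has no unrevised counterpart
""")

# query1 from the question: LEFT JOIN driven by UnRevisedTable.
join_sql = """
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= ? AND o.start_date_time < ?
    ORDER BY o.start_date_time ASC
"""

# The UNION ALL + MAX rewrite of query2 (SQLite does not accept the
# parentheses around each SELECT, so they are dropped here).
union_sql = """
    SELECT start_date_time, MAX(value) AS value
    FROM (SELECT start_date_time, value FROM RevisedTable
          WHERE start_date_time BETWEEN ? AND ?
          UNION ALL
          SELECT start_date_time, value FROM UnRevisedTable
          WHERE start_date_time BETWEEN ? AND ?) t1
    GROUP BY start_date_time
    ORDER BY start_date_time ASC
"""

start, finish = "2014-01-01 00:00:00", "2014-01-01 00:10:00"
join_rows = conn.execute(join_sql, (start, finish)).fetchall()
union_rows = conn.execute(union_sql, (start, finish, start, finish)).fetchall()

print(join_rows)
print(union_rows)
```

With this data the join returns three rows and shows the revised 11.0 at 00:01, while the union version returns four rows (it includes the 00:05 entry that exists only in RevisedTable) and shows 99.0 at 00:01, because MAX picks the larger value, not the revised one.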

#2

To answer your question: the slower query uses UNION, which removes rows that are duplicated between the first and second tables. This requires a sort, which is often expensive. You can see it in query 2's plan as a filesort. You can eliminate this sort by using UNION ALL instead.
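
The difference between UNION and UNION ALL can be seen in a minimal sketch. It assumes SQLite through Python's sqlite3 module (not MySQL) and made-up rows; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: one timestamp where the two tables hold
# different values, and one where they hold an identical row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:01:00', 99.0),
        ('2014-01-01 00:02:00', 5.0);
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11.0),
        ('2014-01-01 00:02:00', 5.0);
""")

template = """
    SELECT start_date_time, value FROM RevisedTable
    {op}
    SELECT start_date_time, value FROM UnRevisedTable
    ORDER BY start_date_time, value
"""

# UNION compares whole rows and drops exact duplicates.
union_rows = conn.execute(template.format(op="UNION")).fetchall()
# UNION ALL simply concatenates both result sets.
union_all_rows = conn.execute(template.format(op="UNION ALL")).fetchall()

print(len(union_rows), union_rows)
print(len(union_all_rows), union_all_rows)
```

UNION keeps three rows here and UNION ALL keeps four. Note that UNION only drops rows that are identical in every column, so a revised row with a different value still survives next to its unrevised counterpart; that is why the outer GROUP BY is needed either way.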

You may like to consider replacing the CASE expression in query 1 with COALESCE, which returns its first non-null argument. It will make for an easier-to-read query and may run faster.

SELECT 
    o.start_date_time,
    COALESCE(r.value, o.value) AS value
FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
WHERE o.start_date_time >= '".$start."' 
AND o.start_date_time < '".$finish."' 
ORDER BY start_date_time ASC;
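
Because COALESCE returns its first non-null argument, the revised value has to be listed first for it to win over the raw one. A quick way to check this is the sketch below, which assumes SQLite through Python's sqlite3 module (not MySQL) and made-up rows; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: 00:01 has a revised value, 00:00 does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10.0),
        ('2014-01-01 00:01:00', 99.0);
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11.0);
""")

template = """
    SELECT o.start_date_time, {expr} AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    ORDER BY o.start_date_time ASC
"""

def run(expr):
    """Run the join with the given value expression and return all rows."""
    return conn.execute(template.format(expr=expr)).fetchall()

case_rows = run("CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END")
revised_first = run("COALESCE(r.value, o.value)")
raw_first = run("COALESCE(o.value, r.value)")

print(case_rows)      # revised value at 00:01
print(revised_first)  # same rows as the CASE version for this data
print(raw_first)      # raw value at 00:01, the revision is ignored
```

One caveat: the two forms only differ when a RevisedTable row exists but its value is NULL. In that case the CASE version returns NULL, while COALESCE(r.value, o.value) falls back to the unrevised value.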
