SQL queries on two tables: time consumed

Time: 2022-07-13 15:21:24

I have two tables. The raw data goes into the table UnRevisedTable, and the table RevisedTable contains only revised values, which a human inserts when he finds a value in UnRevisedTable that he doesn't like.

In UnRevisedTable there is one entry every minute and in RevisedTable there is one entry for every unwanted entry in UnRevisedTable.

The purpose of those two queries is the same: Show me the union of revised and unrevised data replacing the unrevised data when there is revised data.

I started by writing query2, but it was too slow. Then I wrote query1, and query1 is much, much faster.

My question is: why is query1 faster than query2? Thanks.

query1:
    SELECT o.start_date_time,
        CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
        FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
        WHERE o.start_date_time >= '".$start."' AND o.start_date_time < '".$finish."' ORDER BY start_date_time ASC;

query2:
    select * from(
      select RevisedTable.* from RevisedTable where start_date_time between '".$start."' and '".$finish."' 
          union 
      select UnRevisedTable.* from UnRevisedTable where start_date_time between '".$start."' and '".$finish."'
    ) as t1 group by start_date_time;

As Abhik Chakraborty suggested, here are the EXPLAIN outputs for query1 and query2:

query1:

+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                    | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
|  1 | SIMPLE      | o     | range  | PRIMARY       | PRIMARY | 8       | NULL                   |    9 | Using where |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY | 8       | Mydb.o.start_date_time |    1 |             |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+

query2:

+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| id   | select_type  | table          | type  | possible_keys | key     | key_len | ref  | rows | Extra                           |
+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+
|    1 | PRIMARY      | <derived2>     | ALL   | NULL          | NULL    | NULL    | NULL |   14 | Using temporary; Using filesort |
|    2 | DERIVED      | RevisedTable   | range | PRIMARY       | PRIMARY | 8       | NULL |    2 | Using where                     |
|    3 | UNION        | UnRevisedTable | range | PRIMARY       | PRIMARY | 8       | NULL |   10 | Using where                     |
| NULL | UNION RESULT | <union2,3>     | ALL   | NULL          | NULL    | NULL    | NULL | NULL |                                 |
+------+--------------+----------------+-------+---------------+---------+---------+------+------+---------------------------------+

2 solutions

#1


2  

First, the two queries do not do the same thing.

  • The first query returns only times from UnRevisedTable. If there are times in RevisedTable that are not in UnRevisedTable, you won't get them.

  • The first query uses a join. If you have an index on the join key, start_date_time, then the index will be used. Aggregation is less likely to use indexes.

  • The first query will not remove duplicates that might exist within each table. The second does.

  • The two queries will not necessarily return the same value when there is a match in the two tables.
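
The first point is easy to check on a toy data set. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server. The sample rows, the helper names (build_db, join_times, union_times) and the column types are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

def build_db():
    """Create an in-memory database with hypothetical sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value INTEGER);
        CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value INTEGER);
        INSERT INTO UnRevisedTable VALUES
            ('2014-01-01 00:00:00', 10),
            ('2014-01-01 00:01:00', 99),
            ('2014-01-01 00:02:00', 12);
        INSERT INTO RevisedTable VALUES
            ('2014-01-01 00:01:00', 11),   -- corrects the 99 above
            ('2014-01-01 00:03:00', 13);   -- has no raw counterpart
    """)
    return con

# Same shape as query1: driven by UnRevisedTable, revised rows joined in.
JOIN_QUERY = """
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= ? AND o.start_date_time < ?
    ORDER BY o.start_date_time
"""

# Same idea as query2: timestamps collected from both tables.
UNION_QUERY = """
    SELECT start_date_time FROM RevisedTable
    WHERE start_date_time >= ? AND start_date_time < ?
    UNION
    SELECT start_date_time FROM UnRevisedTable
    WHERE start_date_time >= ? AND start_date_time < ?
    ORDER BY start_date_time
"""

def join_times(con, start, finish):
    return [row[0] for row in con.execute(JOIN_QUERY, (start, finish))]

def union_times(con, start, finish):
    params = (start, finish, start, finish)
    return [row[0] for row in con.execute(UNION_QUERY, params)]

con = build_db()
start, finish = "2014-01-01 00:00:00", "2014-01-01 01:00:00"
print("join :", join_times(con, start, finish))
print("union:", union_times(con, start, finish))
```

With these rows, the 00:03:00 timestamp exists only in RevisedTable, so it shows up in the union result but not in the join result. Whether that matters depends on whether revised rows without a raw counterpart can occur in the real data.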

The second query is doing two aggregations, but no joins. The first aggregation removes duplicate rows (for the union) and the second is the outer GROUP BY.

I would start by revising the second query to be:

select start_date_time, max(value) as value
from ((select start_date_time, value
       from RevisedTable
       where start_date_time between '".$start."' and '".$finish."' 
      ) union all
      (select start_date_time, value
       from UnRevisedTable
       where start_date_time between '".$start."' and '".$finish."'
      )
     ) t1
 group by start_date_time

You will probably also find that this is slower than the join. The MySQL engine does a better job of implementing joins than aggregations.
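
One thing worth checking before adopting the aggregate version: MAX(value) keeps the larger of the two values for a timestamp, which is not necessarily the revised one. The sketch below compares it with the join on hypothetical rows, again using Python's sqlite3 as a stand-in for MySQL. The data and the run helper are assumptions for illustration, and the parentheses around the UNION ALL branches are dropped because SQLite does not accept them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value INTEGER);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value INTEGER);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10),
        ('2014-01-01 00:01:00', 99);
    -- The revision lowers the value for 00:01:00 from 99 to 11.
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11);
""")

# Aggregate version: one row per timestamp, keeping the largest value.
MAX_QUERY = """
    SELECT start_date_time, MAX(value) AS value
    FROM (SELECT start_date_time, value FROM RevisedTable
          UNION ALL
          SELECT start_date_time, value FROM UnRevisedTable) t1
    GROUP BY start_date_time
    ORDER BY start_date_time
"""

# Join version: the revised value wins whenever a revised row exists.
JOIN_QUERY = """
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    ORDER BY o.start_date_time
"""

def run(query):
    return con.execute(query).fetchall()

max_rows = run(MAX_QUERY)
join_rows = run(JOIN_QUERY)
print("max :", max_rows)
print("join:", join_rows)
```

On this data the aggregate version keeps 99 for 00:01:00, while the join returns the revised 11. The two agree only when every revision raises the value.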

#2


1  

To answer your question: the slower query uses a UNION, which removes rows that are duplicated between the first and second tables. This requires a sort, which is often expensive. You can see that in query 2's plan as a filesort. You can eliminate this sort by using UNION ALL instead.
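
To make the UNION versus UNION ALL difference concrete, here is a small sketch on hypothetical rows, run through Python's sqlite3 as a stand-in for MySQL. The data and the combined helper are assumptions for illustration. It only shows which rows each operator returns, not how long either takes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value INTEGER);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value INTEGER);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10),
        ('2014-01-01 00:01:00', 99);
    -- First row is identical to the raw row, second one changes the value.
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:00:00', 10),
        ('2014-01-01 00:01:00', 11);
""")

def combined(operator):
    """Combine both tables with the given set operator (UNION or UNION ALL)."""
    sql = (
        "SELECT start_date_time, value FROM RevisedTable "
        f"{operator} "
        "SELECT start_date_time, value FROM UnRevisedTable"
    )
    return con.execute(sql).fetchall()

union_rows = combined("UNION")          # identical rows collapsed into one
union_all_rows = combined("UNION ALL")  # every row from both tables kept

print(len(union_rows), "rows with UNION")
print(len(union_all_rows), "rows with UNION ALL")
```

UNION only collapses rows that match in every column, so a timestamp whose revised value differs from the raw one still appears twice. That is why the outer GROUP BY is still needed in query 2.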

You may like to consider replacing the CASE expression in query 1 with COALESCE, which returns its first non-NULL argument. It makes for an easier-to-read query and may run faster.

SELECT 
    o.start_date_time,
    COALESCE(r.value, o.value) AS value
FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
WHERE o.start_date_time >= '".$start."' 
AND o.start_date_time < '".$finish."' 
ORDER BY start_date_time ASC;
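
Argument order matters with COALESCE: the revised value is preferred only if r.value is listed first. Here is a quick check against the CASE expression from query 1, on hypothetical rows, with Python's sqlite3 as a stand-in for MySQL. The data, the TEMPLATE string and the rows_for helper are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value INTEGER);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value INTEGER);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10),
        ('2014-01-01 00:01:00', 99);
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11);
""")

# Same join as query 1; only the expression that picks the value changes.
TEMPLATE = """
    SELECT o.start_date_time, {expr} AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    ORDER BY o.start_date_time
"""

def rows_for(expr):
    return con.execute(TEMPLATE.format(expr=expr)).fetchall()

case_rows = rows_for(
    "CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END"
)
revised_first = rows_for("COALESCE(r.value, o.value)")  # revised value wins
raw_first = rows_for("COALESCE(o.value, r.value)")      # raw value wins

print("case         :", case_rows)
print("revised first:", revised_first)
print("raw first    :", raw_first)
```

On this data COALESCE(r.value, o.value) matches the CASE expression, while COALESCE(o.value, r.value) never picks up the revision. The two forms can still differ when a matching revised row stores NULL in value: CASE returns NULL there, COALESCE falls back to the raw value.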