I have two tables. UnRevisedTable holds the raw data, and RevisedTable holds only revised values, which a person inserts when they find a value in UnRevisedTable that they don't like.
In UnRevisedTable there is one entry every minute and in RevisedTable there is one entry for every unwanted entry in UnRevisedTable.
The purpose of the two queries below is the same: show me the union of revised and unrevised data, replacing the unrevised value whenever a revised value exists.
I wrote query2 first, but it was too slow. Then I wrote query1, and query1 is much, much faster.
My question is: why is query1 faster than query2? Thanks.
query1:
SELECT o.start_date_time,
CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
WHERE o.start_date_time >= '".$start."' AND o.start_date_time < '".$finish."' ORDER BY start_date_time ASC;
query2:
select * from(
select RevisedTable.* from RevisedTable where start_date_time between '".$start."' and '".$finish."'
union
select UnRevisedTable.* from UnRevisedTable where start_date_time between '".$start."' and '".$finish."'
) as t1 group by start_date_time;
As Abhik Chakraborty suggested, here is the EXPLAIN output for query1 and query2:
query1:
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| 1 | SIMPLE | o | range | PRIMARY | PRIMARY | 8 | NULL | 9 | Using where |
| 1 | SIMPLE | r | eq_ref | PRIMARY | PRIMARY | 8 | Mydb.o.start_date_time | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
query2:
+----+--------------+------------+-------+---------------+---------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+---------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 14 | Using temporary; Using filesort |
| 2 | DERIVED | RevisedTable | range | PRIMARY | PRIMARY | 8 | NULL | 2 | Using where |
| 3 | UNION | UnRevisedTable| range | PRIMARY | PRIMARY | 8 | NULL | 10 | Using where |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+---------------+---------+---------+------+------+---------------------------------+
2 answers
#1
2
First, the two queries do not do the same thing.
- The first query returns only times from UnRevisedTable. If there are times in RevisedTable that are not in UnRevisedTable, you won't get them.
- The first query uses a join. If you have an index on the join key, start_date_time, then the index will be used. Aggregation is less likely to use indexes.
- The first query will not remove duplicates that might exist within each table. The second does.
- The two queries will not necessarily return the same value when there is a match in the two tables.
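The first and last points above can be checked on a toy data set. The following is only a sketch: it uses SQLite through Python's sqlite3 module rather than MySQL, the timestamps and values are invented for illustration, and the union query is written with UNION ALL and MAX(value) so that it is deterministic.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('2014-01-01 00:00:00', 10.0),
        ('2014-01-01 00:01:00', 999.0),
        ('2014-01-01 00:02:00', 12.0);
    INSERT INTO RevisedTable VALUES
        ('2014-01-01 00:01:00', 11.0),   -- revises an existing minute
        ('2014-01-01 00:03:00', 13.0);   -- has no unrevised counterpart
""")

START, FINISH = "2014-01-01 00:00:00", "2014-01-01 00:10:00"

# Same shape as query1: LEFT JOIN driven by UnRevisedTable.
join_rows = conn.execute("""
    SELECT o.start_date_time,
           CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END AS value
    FROM UnRevisedTable o
    LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
    WHERE o.start_date_time >= ? AND o.start_date_time < ?
    ORDER BY o.start_date_time
""", (START, FINISH)).fetchall()

# Union of both tables, collapsed to one row per timestamp.
union_rows = conn.execute("""
    SELECT start_date_time, MAX(value) AS value
    FROM (
        SELECT start_date_time, value FROM RevisedTable
        WHERE start_date_time BETWEEN ? AND ?
        UNION ALL
        SELECT start_date_time, value FROM UnRevisedTable
        WHERE start_date_time BETWEEN ? AND ?
    ) t1
    GROUP BY start_date_time
    ORDER BY start_date_time
""", (START, FINISH, START, FINISH)).fetchall()

print(join_rows)
print(union_rows)
```

With these rows the join returns three timestamps and reports the revised 11.0 for 00:01, while the union version also returns the 00:03 row and reports 999.0 for 00:01, because MAX picks the larger value rather than the revised one. Note also that BETWEEN includes the upper bound, whereas query1 uses a strict `<`.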
The second query is doing two aggregations, but no joins. The first aggregation removes duplicate rows (for the union) and the second is for the outer group by.
I would start by revising the second query to be:
select start_date_time, max(value) as value
from ((select start_date_time, value
from RevisedTable
where start_date_time between '".$start."' and '".$finish."'
) union all
(select start_date_time, value
from UnRevisedTable
where start_date_time between '".$start."' and '".$finish."'
)
) t1
group by start_date_time
You will probably also find that this is slower than the join. The MySQL engine does a better job of implementing joins than aggregations.
#2
1
To answer your question: the slower query uses a UNION, which removes duplicate rows from the combined result. That deduplication is extra work, and the outer GROUP BY then has to run over an unindexed derived table, which appears in query 2's plan as "Using temporary; Using filesort". You can eliminate the deduplication step by using UNION ALL instead.
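To make the difference between the two operators concrete, here is a minimal sketch using SQLite through Python's sqlite3 module (not MySQL), with literal rows invented for illustration. It shows that UNION drops only rows that are identical in every column, while UNION ALL keeps everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three hypothetical rows: two are identical, one shares the timestamp only.
parts = """
    SELECT 't1' AS start_date_time, 11.0 AS value
    {op} SELECT 't1', 11.0
    {op} SELECT 't1', 999.0
    ORDER BY 2
"""

union_rows = conn.execute(parts.format(op="UNION")).fetchall()
union_all_rows = conn.execute(parts.format(op="UNION ALL")).fetchall()

print(union_rows)      # the identical pair collapses to one row
print(union_all_rows)  # all three rows survive
```

A revised row and its unrevised counterpart normally differ in value, so UNION would not merge them anyway; in query2 it is the outer GROUP BY that reduces the result to one row per start_date_time.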
You may like to consider replacing the CASE expression in query 1 with COALESCE, which returns its first non-null argument. Pass r.value first so that the revised value wins whenever one exists. It makes for an easier-to-read query and may run faster. The result differs from the CASE version only if a matching RevisedTable row can hold a NULL value: COALESCE then falls back to the unrevised value, while the CASE returns NULL.
SELECT
o.start_date_time,
COALESCE(r.value, o.value) AS value
FROM UnRevisedTable o LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
WHERE o.start_date_time >= '".$start."'
AND o.start_date_time < '".$finish."'
ORDER BY start_date_time ASC;
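The argument order of COALESCE matters here. The sketch below, again using SQLite through Python's sqlite3 module with invented rows, compares the CASE expression from query1 against both argument orders; the helper name `merged_values` is hypothetical.

```python
import sqlite3

# Hypothetical sample data; the revised row for 00:02 deliberately holds NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UnRevisedTable (start_date_time TEXT PRIMARY KEY, value REAL);
    CREATE TABLE RevisedTable   (start_date_time TEXT PRIMARY KEY, value REAL);
    INSERT INTO UnRevisedTable VALUES
        ('00:00', 10.0), ('00:01', 999.0), ('00:02', 12.0);
    INSERT INTO RevisedTable VALUES
        ('00:01', 11.0), ('00:02', NULL);
""")

def merged_values(value_expr):
    """Run the LEFT JOIN with the given value expression, return the values in time order."""
    sql = f"""
        SELECT o.start_date_time, {value_expr} AS value
        FROM UnRevisedTable o
        LEFT JOIN RevisedTable r ON o.start_date_time = r.start_date_time
        ORDER BY o.start_date_time
    """
    return [row[1] for row in conn.execute(sql)]

case_expr = merged_values(
    "CASE WHEN r.start_date_time IS NULL THEN o.value ELSE r.value END")
revised_first = merged_values("COALESCE(r.value, o.value)")
unrevised_first = merged_values("COALESCE(o.value, r.value)")

print(case_expr)
print(revised_first)
print(unrevised_first)
```

With the unrevised column first, the revised 11.0 never replaces 999.0, because o.value is non-null on every row. With the revised column first, the result matches the CASE expression except for the NULL revision at 00:02, where COALESCE falls back to 12.0.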