I am working on this query, which runs successfully:
select
hash,
SUM(DATE(TIMESTAMP) = CURDATE()) as today,
sum(DATE(TIMESTAMP) between DATE_SUB(CURDATE( ), INTERVAL 7 DAY) and DATE_SUB(CURDATE( ), INTERVAL 1 DAY)) as last_week
from behaviour
group by hash
having last_week > 0 and today > last_week
order by today desc
and I am trying to optimize it.
I tried the following to move the last_week > 0 condition out of the having clause, without any luck. I get an "invalid use of group function" error:
select
hash,
SUM(DATE(TIMESTAMP) = CURDATE()) as today,
sum(DATE(TIMESTAMP) between DATE_SUB(CURDATE( ), INTERVAL 7 DAY) and DATE_SUB(CURDATE( ), INTERVAL 1 DAY)) as last_week
from behaviour
where
and (sum(DATE(TIMESTAMP) between DATE_SUB(CURDATE( ), INTERVAL 4 DAY) and DATE_SUB(CURDATE( ), INTERVAL 1 DAY)) > 0)
group by hash
having today > last_week
order by today desc
How can I optimize it? On a big table it takes about 1 minute to execute.
1 solution
#1
You want to filter before doing the aggregation:
select hash,
sum(DATE(TIMESTAMP) = CURDATE()) as today,
sum(DATE(TIMESTAMP) between DATE_SUB(CURDATE( ), INTERVAL 7 DAY) and DATE_SUB(CURDATE( ), INTERVAL 1 DAY)) as last_week
from behaviour
where timestamp >= curdate() - interval 7 day and
      timestamp < curdate() + interval 1 day
group by hash
having today > last_week and last_week > 0
order by today desc;
This reduces the volume of data that the group by has to process, and that should significantly improve performance. You might be able to improve performance further with an index on (timestamp, hash).
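As a rough sketch of the index suggestion, something like the following could be tried. The index name idx_behaviour_timestamp_hash is made up for illustration, and whether the optimizer actually uses the index depends on your data and MySQL version, so check the plan with EXPLAIN rather than assuming it helps.

```sql
-- Hypothetical index name; the column order follows the suggestion (timestamp, hash).
-- The range filter on `timestamp` can use the leading column, and having `hash`
-- in the index may let MySQL answer the query from the index alone.
CREATE INDEX idx_behaviour_timestamp_hash
    ON behaviour (`timestamp`, `hash`);

-- Inspect the plan to see whether the index is chosen for the filtered query.
EXPLAIN
SELECT `hash`,
       SUM(DATE(`timestamp`) = CURDATE()) AS today,
       SUM(DATE(`timestamp`) BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY)
                                 AND DATE_SUB(CURDATE(), INTERVAL 1 DAY)) AS last_week
FROM behaviour
WHERE `timestamp` >= CURDATE() - INTERVAL 7 DAY
  AND `timestamp` <  CURDATE() + INTERVAL 1 DAY
GROUP BY `hash`
HAVING today > last_week AND last_week > 0
ORDER BY today DESC;
```

If hash is a long text column, the index may need a prefix length; that detail is not stated in the question, so treat it as something to verify against your schema.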
You still need the having clause because you want additional filters on the aggregated results. The performance gain comes from filtering before the aggregation, though.
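One possible follow-up, offered as a sketch and not as part of the original answer: once the where clause restricts rows to the last 7 days plus today, the two conditional sums only need to split the rows at midnight today. This assumes timestamp is a DATETIME or TIMESTAMP column, and it avoids calling DATE() on every row.

```sql
-- Assumes `timestamp` is a DATETIME/TIMESTAMP column.
-- The WHERE clause already limits rows to [today - 7 days, tomorrow),
-- so each remaining row is either from today or from the previous 7 days.
SELECT `hash`,
       SUM(`timestamp` >= CURDATE()) AS today,      -- rows from today
       SUM(`timestamp` <  CURDATE()) AS last_week   -- rows from the 7 days before today
FROM behaviour
WHERE `timestamp` >= CURDATE() - INTERVAL 7 DAY
  AND `timestamp` <  CURDATE() + INTERVAL 1 DAY
GROUP BY `hash`
HAVING today > last_week AND last_week > 0   -- aggregate filters must stay in HAVING
ORDER BY today DESC;
```

The conditions on today and last_week stay in having because they compare aggregate values, which do not exist yet when where is evaluated. That is also why putting sum(...) in where raises "invalid use of group function".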