By way of introduction...
I've run into this question: Difference between 2 adjacent fields - Date - PHP MYSQL and was trying to achieve the goal, i.e iterate through dates and get diff, with pure MySQL.
Another question there (Subtracting one row of data from another in SQL) helped me to understand how to make something similar with MySQL. It did not solve the problem, as the solutions are still depandant either on fixed values or on assumed order of data, but it did help me to understand the methodology.
There is one other question (How to get next/previous record in MySQL?) with answers describing how to get values from next/previous row. It's still dependand on some fixed values, but I learned how to use the technique.
通过介绍...我遇到了这个问题:2个相邻字段之间的差异 - 日期 - PHP MYSQL并试图实现目标,即迭代日期并获得差异,使用纯MySQL。那里的另一个问题(在SQL中从另一行中减去一行数据)帮助我理解了如何使用类似的东西。它没有解决问题,因为解决方案仍然对固定值或假定的数据顺序不利,但它确实帮助我理解了方法。还有一个问题(如何在MySQL中获取下一个/上一个记录?),其中的答案描述了如何从下一行/上一行获取值。它仍然依赖于某些固定值,但我学会了如何使用该技术。
Say I have this table foo
:
说我有这个表foo:
CREATE TABLE `foo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dateof` date NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
id | dateof
-----+------------
1 | 2012-01-01
2 | 2012-01-02
11 | 2012-01-04
12 | 2012-01-01
13 | 2012-01-02
14 | 2012-01-09
111 | 2012-01-01
112 | 2012-01-01
113 | 2012-01-01
There are two assumptions:
有两个假设:
- Primary key (
id
) ordered ascending and "holes" allowed. - 主键(id)按升序排序,允许“孔”。
- Every date in
dateof
column is valid, in the meaning: noNULL
s and no defaults (0000-00-00
). I want to iterate through every row and calculate number of days passed with previous entry, to receive this: - 列中dateof中的每个日期都有效,其含义为:no NULLs,no defaults(0000-00-00)。我想迭代每一行并计算上一次输入的天数,以便接收:
id | date | days_diff
-----+------------+-----------
1 | 2012-01-01 | 0
2 | 2012-01-02 | 1
11 | 2012-01-04 | 2
12 | 2012-01-01 | -3
13 | 2012-01-02 | 1
14 | 2012-01-09 | 7
111 | 2012-01-01 | -8
112 | 2012-01-01 | 0
113 | 2012-01-01 | 30
With all I have learned I came to this solution (say solution 1, as there is another):
凭借我所学到的一切,我来到了这个解决方案(说解决方案1,因为还有另一个解决方案):
SELECT
f.id,
DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
(SELECT DATEDIFF(f.dateof, f2.dateof)
FROM foo f2
WHERE f2.id = (
SELECT MAX(f3.id) FROM foo f3 WHERE f3.id < f.id
)
) AS days_diff
FROM foo f;
(example fiddle here: http://sqlfiddle.com/#!2/099fc/3).
(例如这里的小提琴:http://sqlfiddle.com/#!2 / 099fc / 3)。
This works like a charm... until there are just a couple of entries in db. It gets worse when more:
这就像一个魅力...直到db中只有几个条目。更多时候会变得更糟:
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY f ALL NULL NULL NULL NULL 17221
2 DEPENDENT SUBQUERY f2 eq_ref PRIMARY PRIMARY 4 func 1 Using where
3 DEPENDENT SUBQUERY f3 index PRIMARY PRIMARY 4 NULL 17221 Using where; Using index
18031 rows: duration: 8.672 sec. Fetch: 228.515 sec.
18031行:持续时间:8.672秒。获取:228.515秒。
I thought of adding index on dateof
column:
我想在dateof列上添加索引:
CREATE TABLE `foo` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dateof` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dateof` (`dateof`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
...and gained tiny improvement:
...并获得了微小的改进:
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY f index NULL dateof 4 NULL 18369 Using index
2 DEPENDENT SUBQUERY f2 eq_ref PRIMARY PRIMARY 4 func 1 Using where
3 DEPENDENT SUBQUERY f3 index PRIMARY dateof 4 NULL 18369 Using where; Using index
18031 rows: duration: 8.406 sec. Fetch: 219.281 sec.
18031行:持续时间:8.406秒。取:219.281秒。
I recalled reading somewhere about advantages of MyISAM over InnoDB, in some cases. So I changed the to MyISAM:
在某些情况下,我回想起在某处了解MyISAM优于InnoDB的优势。所以我改为MyISAM:
ALTER TABLE `foo` ENGINE = MyISAM;
18031 rows: duration: 5.671 sec. Fetch: 151.610 sec.
18031行:持续时间:5.671秒。获取:151.610秒。
Sure it's better but still slow.
当然它更好,但仍然很慢。
I tried with another algorithm (solution 2):
我尝试了另一种算法(解决方案2):
SELECT
f.id,
DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
(SELECT DATEDIFF(f.dateof, f2.dateof)
FROM foo f2
WHERE f2.id < f.id
ORDER BY f2.id DESC
LIMIT 1
) AS days_diff
FROM foo f;
...but it was even slower:
......但它甚至更慢:
18031 rows: duration: 15.609 sec. Fetch: 184.656 sec.
18031行:持续时间:15.609秒。获取:184.656秒。
Are there any other ways to optimize this query or data structure in order to have this task performed faster?
有没有其他方法来优化此查询或数据结构,以便更快地执行此任务?
1 个解决方案
#1
5
It is no surprise that your approaches are very slow even for a moderately sized table.
即使对于中等大小的桌子,您的方法也非常慢,这并不奇怪。
It should theoretically be possible to calculate the result in O(n) time using the LAG
analytical function, which unfortunately is not supported in MySQL. However you can emulate LAG
in MySQL using variables:
理论上应该可以使用LAG分析函数在O(n)时间内计算结果,遗憾的是MySQL不支持。但是,您可以使用变量在MySQL中模拟LAG:
SELECT
id,
DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
DATEDIFF(dateof, @prev) AS days_diff,
@prev := dateof
FROM FOO, (SELECT @prev := NULL) AS vars
ORDER BY id
This should be several orders of magnitude faster than what you are trying to do.
这应该比您尝试的速度快几个数量级。
#1
5
It is no surprise that your approaches are very slow even for a moderately sized table.
即使对于中等大小的桌子,您的方法也非常慢,这并不奇怪。
It should theoretically be possible to calculate the result in O(n) time using the LAG
analytical function, which unfortunately is not supported in MySQL. However you can emulate LAG
in MySQL using variables:
理论上应该可以使用LAG分析函数在O(n)时间内计算结果,遗憾的是MySQL不支持。但是,您可以使用变量在MySQL中模拟LAG:
SELECT
id,
DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
DATEDIFF(dateof, @prev) AS days_diff,
@prev := dateof
FROM FOO, (SELECT @prev := NULL) AS vars
ORDER BY id
This should be several orders of magnitude faster than what you are trying to do.
这应该比您尝试的速度快几个数量级。