MySQL日期差异迭代查询 - 简化查询或优化数据结构

时间:2022-05-06 01:27:39

By way of introduction...
I've run into this question: Difference between 2 adjacent fields - Date - PHP MYSQL and was trying to achieve the goal, i.e iterate through dates and get diff, with pure MySQL.
Another question there (Subtracting one row of data from another in SQL) helped me to understand how to make something similar with MySQL. It did not solve the problem, as the solutions are still depandant either on fixed values or on assumed order of data, but it did help me to understand the methodology.
There is one other question (How to get next/previous record in MySQL?) with answers describing how to get values from next/previous row. It's still dependand on some fixed values, but I learned how to use the technique.

通过介绍...我遇到了这个问题:2个相邻字段之间的差异 - 日期 - PHP MYSQL并试图实现目标,即迭代日期并获得差异,使用纯MySQL。那里的另一个问题(在SQL中从另一行中减去一行数据)帮助我理解了如何使用类似的东西。它没有解决问题,因为解决方案仍然对固定值或假定的数据顺序不利,但它确实帮助我理解了方法。还有一个问题(如何在MySQL中获取下一个/上一个记录?),其中的答案描述了如何从下一行/上一行获取值。它仍然依赖于某些固定值,但我学会了如何使用该技术。

Say I have this table foo:

说我有这个表foo:

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dateof` date NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  id | dateof
-----+------------
   1 | 2012-01-01
   2 | 2012-01-02
  11 | 2012-01-04
  12 | 2012-01-01
  13 | 2012-01-02
  14 | 2012-01-09
 111 | 2012-01-01
 112 | 2012-01-01
 113 | 2012-01-01

There are two assumptions:

有两个假设:

  1. Primary key (id) ordered ascending and "holes" allowed.
  2. 主键(id)按升序排序,允许“孔”。
  3. Every date in dateof column is valid, in the meaning: no NULLs and no defaults (0000-00-00). I want to iterate through every row and calculate number of days passed with previous entry, to receive this:
  4. 列中dateof中的每个日期都有效,其含义为:no NULLs,no defaults(0000-00-00)。我想迭代每一行并计算上一次输入的天数,以便接收:
  id | date       | days_diff
-----+------------+-----------
   1 | 2012-01-01 |     0
   2 | 2012-01-02 |     1
  11 | 2012-01-04 |     2
  12 | 2012-01-01 |    -3
  13 | 2012-01-02 |     1
  14 | 2012-01-09 |     7
 111 | 2012-01-01 |    -8
 112 | 2012-01-01 |     0
 113 | 2012-01-01 |    30

With all I have learned I came to this solution (say solution 1, as there is another):

凭借我所学到的一切,我来到了这个解决方案(说解决方案1,因为还有另一个解决方案):

SELECT
    f.id,
    DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
    (SELECT DATEDIFF(f.dateof, f2.dateof)
        FROM foo f2
        WHERE f2.id = (
            SELECT MAX(f3.id) FROM foo f3 WHERE f3.id < f.id
        )
    ) AS days_diff
FROM foo f;

(example fiddle here: http://sqlfiddle.com/#!2/099fc/3).

(例如这里的小提琴:http://sqlfiddle.com/#!2 / 099fc / 3)。

This works like a charm... until there are just a couple of entries in db. It gets worse when more:

这就像一个魅力...直到db中只有几个条目。更多时候会变得更糟:

EXPLAIN:
id select_type        table type   possible_keys key     key_len ref    rows  Extra
1  PRIMARY            f     ALL    NULL          NULL    NULL    NULL   17221   
2  DEPENDENT SUBQUERY f2    eq_ref PRIMARY       PRIMARY 4       func   1     Using where
3  DEPENDENT SUBQUERY f3    index  PRIMARY       PRIMARY 4       NULL   17221 Using where; Using index

18031 rows: duration: 8.672 sec. Fetch: 228.515 sec.

18031行:持续时间:8.672秒。获取:228.515秒。

I thought of adding index on dateof column:

我想在dateof列上添加索引:

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dateof` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `dateof` (`dateof`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

...and gained tiny improvement:

...并获得了微小的改进:

EXPLAIN:
id select_type        table type   possible_keys key     key_len ref  rows  Extra
1  PRIMARY            f     index  NULL          dateof  4       NULL 18369 Using index
2  DEPENDENT SUBQUERY f2    eq_ref PRIMARY       PRIMARY 4       func 1     Using where
3  DEPENDENT SUBQUERY f3    index  PRIMARY       dateof  4       NULL 18369 Using where; Using index

18031 rows: duration: 8.406 sec. Fetch: 219.281 sec.

18031行:持续时间:8.406秒。取:219.281秒。

I recalled reading somewhere about advantages of MyISAM over InnoDB, in some cases. So I changed the to MyISAM:

在某些情况下,我回想起在某处了解MyISAM优于InnoDB的优势。所以我改为MyISAM:

ALTER TABLE `foo` ENGINE = MyISAM;

18031 rows: duration: 5.671 sec. Fetch: 151.610 sec.

18031行:持续时间:5.671秒。获取:151.610秒。

Sure it's better but still slow.

当然它更好,但仍然很慢。

I tried with another algorithm (solution 2):

我尝试了另一种算法(解决方案2):

SELECT
  f.id,
  DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
  (SELECT DATEDIFF(f.dateof, f2.dateof)
    FROM foo f2
    WHERE f2.id < f.id
    ORDER BY f2.id DESC
    LIMIT 1
  ) AS days_diff
FROM foo f;

...but it was even slower:

......但它甚至更慢:

18031 rows: duration: 15.609 sec. Fetch: 184.656 sec.

18031行:持续时间:15.609秒。获取:184.656秒。


Are there any other ways to optimize this query or data structure in order to have this task performed faster?

有没有其他方法来优化此查询或数据结构,以便更快地执行此任务?

1 个解决方案

#1


5  

It is no surprise that your approaches are very slow even for a moderately sized table.

即使对于中等大小的桌子,您的方法也非常慢,这并不奇怪。

It should theoretically be possible to calculate the result in O(n) time using the LAG analytical function, which unfortunately is not supported in MySQL. However you can emulate LAG in MySQL using variables:

理论上应该可以使用LAG分析函数在O(n)时间内计算结果,遗憾的是MySQL不支持。但是,您可以使用变量在MySQL中模拟LAG:

SELECT
    id,
    DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
    DATEDIFF(dateof, @prev) AS days_diff,
    @prev := dateof
FROM FOO, (SELECT @prev := NULL) AS vars
ORDER BY id

This should be several orders of magnitude faster than what you are trying to do.

这应该比您尝试的速度快几个数量级。

#1


5  

It is no surprise that your approaches are very slow even for a moderately sized table.

即使对于中等大小的桌子,您的方法也非常慢,这并不奇怪。

It should theoretically be possible to calculate the result in O(n) time using the LAG analytical function, which unfortunately is not supported in MySQL. However you can emulate LAG in MySQL using variables:

理论上应该可以使用LAG分析函数在O(n)时间内计算结果,遗憾的是MySQL不支持。但是,您可以使用变量在MySQL中模拟LAG:

SELECT
    id,
    DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
    DATEDIFF(dateof, @prev) AS days_diff,
    @prev := dateof
FROM FOO, (SELECT @prev := NULL) AS vars
ORDER BY id

This should be several orders of magnitude faster than what you are trying to do.

这应该比您尝试的速度快几个数量级。