为什么MySQL优化器不使用所有列索引?

时间:2022-04-05 00:09:57

Percona MySQL 5.7

Percona MySQL 5.7

table scheeme:

表scheeme:

CREATE TABLE Developer.Rate (
  ID bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (ID),
  INDEX IDX_TIME (TIME),
  UNIQUE INDEX IDX_UK (BASE, QUOTE, TIME)
)
ENGINE = INNODB
ROW_FORMAT = COMPRESSED;

I try to make request for latests data before selected period. The optimazer use no-complete unique key, only 2 columns of 3.

我尝试在选定的时间段之前请求获取最新的数据。optimazer使用不完整的唯一键,只有2列3。

If I do request in common way:

如我以共同的方式提出要求:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1
;

"Explain" shows that only 2 first columns of index are used: BASE, QUOTE

“Explain”表示只使用了索引的前两列:BASE, QUOTE

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "ref",
        "possible_keys": [
          "IDX_UK",
          "IDX_TIME"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE"
        ],
        "key_length": "22",
        "ref": [
          "const",
          "const"
        ],
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 22983231,
        "filtered": "50.00",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "4596646.20",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "1G"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate`.`BASE` <=> 'EUR') and (`Developer`.`Rate`.`QUOTE` <=> 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

But if you force the optimizer to use IDX_UK, MySQL uses all 3 columns in the request:

但是如果您强迫优化器使用IDX_UK,那么MySQL将使用请求中的所有3列:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate FORCE INDEX(IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10231052.40"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate",
        "access_type": "range",
        "possible_keys": [
          "IDX_UK"
        ],
        "key": "IDX_UK",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 45966462,
        "rows_produced_per_join": 15320621,
        "filtered": "100.00",
        "index_condition": "((`Developer`.`Rate`.`BASE` = 'EUR') and (`Developer`.`Rate`.`QUOTE` = 'USD') and (`Developer`.`Rate`.`TIME` <= <cache>((now() - interval 1 month))))",
        "cost_info": {
          "read_cost": "1037760.00",
          "eval_cost": "3064124.31",
          "prefix_cost": "10231052.40",
          "data_read_per_join": "818M"
        },
        "used_columns": [
          "ID",
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ]
      }
    }
  }
}

Why the optimizer don't use all 3 columns without explicit declaration of index?

为什么优化器不使用所有3列而没有显式的索引声明?

Added:

补充道:

A'm I understanding right, I should to use request like this?

A我理解对了吗?我应该使用这样的请求吗?

Reuest example:

活的例子:

EXPLAIN FORMAT=JSON
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  BASE DESC, QUOTE DESC, TIME DESC
LIMIT 1

If I understand it right, the output of Explain vouldn't be better. There are still only 2 columns are used without TIME

如果我理解正确的话,解释的输出就不会更好。仍然只有2列没有时间使用

Explain Output

解释输出

{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "10384642.20" }, "ordering_operation": { "using_filesort": false, "table": { "table_name": "Rate", "access_type": "ref", "possible_keys": [ "IDX_UK", "IDX_TIME" ], "key": "IDX_UK", "used_key_parts": [ "BASE", "QUOTE" ], "key_length": "22", "ref": [ "const", "const" ], "rows_examined_per_scan": 46734411, "rows_produced_per_join": 23367205, "filtered": "50.00", "index_condition": "((Developer.Rate.BASE <=> 'EUR') and (Developer.Rate.QUOTE <=> 'USD') and (Developer.Rate.TIME <= ((now() - interval 1 month))))", "cost_info": { "read_cost": "1037760.00", "eval_cost": "4673441.10", "prefix_cost": "10384642.20", "data_read_per_join": "1G" }, "used_columns": [ "ID", "TIME", "BASE", "QUOTE", "BID" ] } } } }


" {" " per_7_block ": "select_id": 1, " _cost_info ": {"query_cost": "10384642.20"}, "ordering_operation": {"using_filesort": false, "table": " table_example ": "Rate", "access_type": " _ref "基础<=> 'EUR')和(Developer.Rate)。报价<=> 'USD')和(Developer.Rate)。时间<=(现在(现在()-间隔1个月))”、“cost_info”:{“read_cost”:“1037760.00”,“eval_cost”:“4673441.10”,“prefix_cost”:“10384642.20”,“data_read_per_join”:“1 g”},“used_columns”:[“ID”、“时间”、“基地”,“引用”、“收购”)} } } }


Added 2:

补充2:

I made these 4 requests:

我提出了以下4个要求:

— 1 —

- 1 -


<code>FLUSH STATUS;
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';</code>

— 2 —

- 2

<code>FLUSH STATUS;
SELECT
  BID
FROM 
  Rate FORCE INDEX (IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';
</code>

— 3 —

- 3 -

<code>FLUSH STATUS;
SELECT
  BID
FROM 
  Rate
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';</code>

— 4 —

- 4 -

<code>
FLUSH STATUS;
SELECT
  BID
FROM 
  Rate FORCE INDEX (IDX_UK)
WHERE 
  BASE = 'EUR' 
  AND QUOTE = 'USD' 
  AND `TIME` <= (NOW() - INTERVAL 1 MONTH) 
ORDER BY 
  `TIME` DESC 
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler%';</code>

The output of session_status is the same in all requests except request 3. In output of request 3: Handler_read_prev = 486474; In output of all ather requests: Handler_read_prev = 0;

在所有请求中,session_status的输出都是相同的,除了请求3。在请求的输出中:Handler_read_prev = 486474;在所有通气请求的输出中:Handler_read_prev = 0;

为什么MySQL优化器不使用所有列索引?

Added 3:

添加3:

I made a copy of the table, removed Id field, promoted UNIQUE key as PRIMARY.

我复制了表,删除了Id字段,将唯一键提升为主键。

The scheme:

的方案:

CREATE TABLE Developer.Rate2 (
  TIME datetime NOT NULL,
  BASE varchar(3) NOT NULL,
  QUOTE varchar(3) NOT NULL,
  BID double NOT NULL,
  ASK double NOT NULL,
  PRIMARY KEY (BASE, QUOTE, TIME),
  INDEX IDX_BID_ASK (BID, ASK)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 26
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = COMPRESSED;

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "9673452.20"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "Rate2",
        "access_type": "range",
        "possible_keys": [
          "PRIMARY"
        ],
        "key": "PRIMARY",
        "used_key_parts": [
          "BASE",
          "QUOTE",
          "TIME"
        ],
        "key_length": "27",
        "rows_examined_per_scan": 48023345,
        "rows_produced_per_join": 16006180,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "68783.20",
          "eval_cost": "3201236.12",
          "prefix_cost": "9673452.20",
          "data_read_per_join": "732M"
        },
        "used_columns": [
          "TIME",
          "BASE",
          "QUOTE",
          "BID"
        ],
        "attached_condition": "((`Developer`.`Rate2`.`BASE` = 'EUR') and (`Developer`.`Rate2`.`QUOTE` = 'USD') and (`Developer`.`Rate2`.`TIME` <= <cache>((now() - interval 1 month))))"
      }
    }
  }
}

Now the request really works and Explain shows all 3 columns are used. This variant works.

现在这个请求确实有效,Explain显示了所有3列都被使用。这种变体。

2 个解决方案

#1


1  

Get rid of ID, it is of no use. Promote your UNIQUE key to be PRIMARY. Now, magically, the query will be faster, and the Question you posed will become moot. (You may also need the DESC trick that lorraine suggested.)

去掉ID就没用了。提升你独特的钥匙为主要。现在,神奇的是,查询将会更快,您提出的问题将变得毫无意义。(你也可能需要洛林建议的DESC技巧。)

Here's another technique to compare performance:

下面是另一种比较性能的技术:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

I would be interested to see the output from the SHOW for with and without the DESC trick. And with/without the FORCE INDEX you alluded to.

我很想看看这个节目的输出有和没有DESC技巧。没有你提到的力指数。

Why faster? Your query was using a secondary index, but it needed bid, which was not 'covered' by the index. To get bid, the PRIMARY KEY needed to be drilled down in the 'data'. By changing it so that the PK is used, this extra drill-down is obviated.

为什么快?您的查询使用的是辅助索引,但它需要投标,而索引没有“覆盖”该投标。为了获得投标,需要在“数据”中钻取主键。通过改变它,使PK被使用,这个额外的钻取被排除。

#2


1  

The behavior you describe (ref access instead of range access over more columns) reminds me of Bug#81341 and Bug#87613. These bugs were fixed in MySQL 5.7.17 and 5.7.21, respectively. Which version are you using?

您描述的行为(引用访问而不是更多列上的范围访问)使我想起了Bug#81341和Bug#87613。这些错误分别在MySQL 5.7.17和5.7.21中被修复。你用的是哪个版本?

#1


1  

Get rid of ID, it is of no use. Promote your UNIQUE key to be PRIMARY. Now, magically, the query will be faster, and the Question you posed will become moot. (You may also need the DESC trick that lorraine suggested.)

去掉ID就没用了。提升你独特的钥匙为主要。现在,神奇的是,查询将会更快,您提出的问题将变得毫无意义。(你也可能需要洛林建议的DESC技巧。)

Here's another technique to compare performance:

下面是另一种比较性能的技术:

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

I would be interested to see the output from the SHOW for with and without the DESC trick. And with/without the FORCE INDEX you alluded to.

我很想看看这个节目的输出有和没有DESC技巧。没有你提到的力指数。

Why faster? Your query was using a secondary index, but it needed bid, which was not 'covered' by the index. To get bid, the PRIMARY KEY needed to be drilled down in the 'data'. By changing it so that the PK is used, this extra drill-down is obviated.

为什么快?您的查询使用的是辅助索引,但它需要投标,而索引没有“覆盖”该投标。为了获得投标,需要在“数据”中钻取主键。通过改变它,使PK被使用,这个额外的钻取被排除。

#2


1  

The behavior you describe (ref access instead of range access over more columns) reminds me of Bug#81341 and Bug#87613. These bugs were fixed in MySQL 5.7.17 and 5.7.21, respectively. Which version are you using?

您描述的行为(引用访问而不是更多列上的范围访问)使我想起了Bug#81341和Bug#87613。这些错误分别在MySQL 5.7.17和5.7.21中被修复。你用的是哪个版本?