Why does the MySQL query optimizer choose a secondary index over the clustered primary index?

Date: 2021-10-08 00:09:25

Why does the MySQL optimizer choose the secondary index when doing a 'select * from lookup' with no ORDER BY clause?

Is it just a fluke, or is this a behind-the-scenes optimization that assumes that, since you added a secondary index, it is more important than the primary key?

I would expect the results to be ordered by primary key, as a scan of all the leaf nodes can provide all the data necessary to answer this query.

To reproduce, I create a simple key/value pair table (note: not auto_increment):

create table lookup (
id int not null,
primary key (id),
name varchar(25),
unique k_name (name)
) engine=innodb;

Insert some data in random, non-alphabetical order:

insert into lookup values(1, "Zebra"),(2, "Aardvark"),(3, "Fish"),(4,"Dog"),(5,"Cat"),(6,"Mouse");

Query the data (this is where I would expect the data to be returned in primary key order):

mysql> select * from lookup;
+----+----------+
| id | name     |
+----+----------+
|  2 | Aardvark |
|  5 | Cat      |
|  4 | Dog      |
|  3 | Fish     |
|  6 | Mouse    |
|  1 | Zebra    |
+----+----------+
6 rows in set (0.00 sec)

It is not, however. It appears that a scan of the k_name leaf nodes has been done, as shown here:

mysql> explain select * from lookup;
+----+-------------+--------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key    | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+--------+---------+------+------+-------------+
|  1 | SIMPLE      | lookup | index | NULL          | k_name | 28      | NULL |    6 | Using index |
+----+-------------+--------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)

To me this says MySQL is using k_name as a covering index to return the data. If I drop the k_name index, the data is returned in primary key order. If I add another un-indexed column, the data is also returned in primary key order.

Some basic information about my setup:

mysql> show table status like 'lookup'\G
*************************** 1. row ***************************
           Name: lookup
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 6
 Avg_row_length: 2730
    Data_length: 16384
Max_data_length: 0
   Index_length: 16384
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-11-15 10:42:35
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

 mysql> select version();
 +------------+
 | version()  |
 +------------+
 | 5.5.15-log |
 +------------+
 1 row in set (0.00 sec)

4 answers

#1


4  

In reality, InnoDB only falls back to a hidden clustered index (aka gen_clust_index), populated in row ID order, when a table has no PRIMARY KEY or suitable unique index. Your table has a user-defined PRIMARY KEY, so its clustered index is ordered by id.

In InnoDB, the records in nonclustered indexes (also called secondary indexes) contain the primary key columns for the row that are not in the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

The index that is scanned governs the order of the rows returned. Each secondary index entry also stores the primary key value that leads to the correct row. Also, think of the covering index scenario you mentioned for k_name.

Now, let's switch gears for a moment and discuss the PRIMARY KEY and k_name:

QUESTION: Which has more of the columns requested by your original query, the PRIMARY KEY or k_name?

ANSWER: k_name, because it has both name and id in it (id is stored internally because it is the PRIMARY KEY). The covering index k_name fulfills the query better than the primary key.
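As a rough way to check the covering-index explanation, the sketch below repeats the experiment described in the question: once the table has a column that k_name does not contain, SELECT * can no longer be answered from k_name alone. It assumes the lookup table from the question; the column name note is hypothetical and used only for illustration.

```sql
-- Assumes the lookup table from the question.
-- The column name `note` is hypothetical, added only for this experiment.
ALTER TABLE lookup ADD COLUMN note VARCHAR(25);

-- SELECT * now needs `note`, which k_name does not contain,
-- so k_name cannot act as a covering index for this query.
EXPLAIN SELECT * FROM lookup;

-- Selecting only id and name can still be answered from k_name alone.
EXPLAIN SELECT id, name FROM lookup;

-- Undo the experiment.
ALTER TABLE lookup DROP COLUMN note;
```

Comparing the key and Extra columns of the two EXPLAIN results should show whether the optimizer still treats k_name as covering.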

Now, if the query were SELECT * FROM lookup ORDER BY id, your EXPLAIN plan should look like this:

mysql> explain select * from lookup order by id;
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+
|  1 | SIMPLE      | lookup | index | NULL          | PRIMARY | 4       | NULL |    6 |       |
+----+-------------+--------+-------+---------------+---------+---------+------+------+-------+

1 row in set (0.00 sec)

Without a specified order, the MySQL Query Optimizer picks the index that best fulfills your query. Of course, k_name has an unfair advantage because:

  • every column in the table is individually indexed
  • every column in the table is a candidate key
  • k_name is a unique index, which makes it a candidate key just like the PRIMARY KEY (InnoDB still stores it as a secondary index)
  • user-defined clustered indexes cannot have their row order altered once established

You cannot manipulate the order of the rows at all. Here is proof of that:

mysql> alter table lookup order by name;
Query OK, 6 rows affected, 1 warning (0.23 sec)
Records: 6  Duplicates: 0  Warnings: 1

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------+
| Level   | Code | Message                                                                           |
+---------+------+-----------------------------------------------------------------------------------+
| Warning | 1105 | ORDER BY ignored as there is a user-defined clustered index in the table 'lookup' |
+---------+------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> alter table lookup order by id;
Query OK, 6 rows affected, 1 warning (0.19 sec)
Records: 6  Duplicates: 0  Warnings: 1

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------+
| Level   | Code | Message                                                                           |
+---------+------+-----------------------------------------------------------------------------------+
| Warning | 1105 | ORDER BY ignored as there is a user-defined clustered index in the table 'lookup' |
+---------+------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

#2


1  

Well, either index is equally efficient in terms of getting the data for that query, so I'm guessing the optimiser just dropped out with a "this'll do".

Try adding another unique index. It might be that, as they are all equally efficient, some "FindBestIndex" routine drops out with the last one it read.

It's not the behaviour I'd expect either. Then again, if I cared about the order, I'd add an ORDER BY id and then let the optimiser choose the primary key instead of going two-pass and doing a sort.
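A minimal sketch of that suggestion, assuming the lookup table from the question. The second statement uses MySQL's FORCE INDEX hint as an alternative way to steer the optimizer; it is included only for comparison and is not something the question itself tried.

```sql
-- The only guaranteed way to get primary key order is to ask for it.
SELECT * FROM lookup ORDER BY id;

-- An index hint steers the optimizer towards the clustered index,
-- but without ORDER BY the result order is still not guaranteed.
SELECT * FROM lookup FORCE INDEX (PRIMARY);
```

Only the ORDER BY form is a contract; the hinted form merely tends to return rows in id order as a side effect of the scan.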

#3


1  

This is because InnoDB secondary indexes also include the primary key column. MySQL can therefore fetch all the relevant data directly from the secondary index without touching the data rows, which saves disk I/O.

#4


0  

I think you misread the type column. A type of 'index' means a full index scan. When that is the case and the Extra column shows 'Using index', MySQL can get all the data required for the query from the index and does not need to read the actual table rows. So here the engine, instead of going to the rows (which is usually costly), uses the index that has all the data required by the query. Secondary indexes store the primary key (id, in your case) as their data. That is, if you look up a key in the secondary index, you get the primary key of the matching table record. Since you asked for all the values, it is enough to iterate through the secondary index to get what you need.

If the engine chose to iterate over the primary key, it would be reading the actual table rows, because the primary key leads directly to them. MySQL tries to avoid that because it is usually less efficient: rows usually contain more data than index entries do, so you potentially have to do more I/O.
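One way to see the size difference for yourself, as a rough sketch: on servers that keep InnoDB persistent statistics, the number of pages per index can be read from mysql.innodb_index_stats. This assumes MySQL 5.6 or later, so it would not work on the 5.5.15 server in the question.

```sql
-- Assumes MySQL 5.6 or later with InnoDB persistent statistics enabled;
-- mysql.innodb_index_stats is not available on the 5.5.15 server in the question.
SELECT index_name,
       stat_value AS pages
FROM   mysql.innodb_index_stats
WHERE  database_name = DATABASE()
  AND  table_name    = 'lookup'
  AND  stat_name     = 'size';
```

On a six-row table both indexes will most likely fit in a single page each, so the difference only becomes visible on larger tables with wider rows.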

http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
