Does the order of columns in a MySQL multi-column index matter?

Date: 2022-12-20 21:28:13

I know the importance of indexes and how order of joins can change performance. I've done a bunch of reading related to multi-column indexes and haven't found the answer to my question.

I'm curious whether, when I create a multi-column index, the order in which the columns are specified matters at all. My guess is that it would not, and that the engine would treat them as a group, where ordering doesn't matter. But I wish to verify.

For example, from MySQL's website (http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html):

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name,first_name)
);

Are there any cases where the following would be better, or is it equivalent?

CREATE TABLE test (
    id         INT NOT NULL,
    last_name  CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (first_name,last_name)
);

Specifically:

INDEX name (last_name,first_name)

vs

INDEX name (first_name,last_name)

3 answers

#1 (26 votes)

When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which "column" is first. Searches fall into a few categories:

  1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.

  2. If you look up people whose first name is John, the telephone book doesn't help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.

  3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.

If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.
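
The three phone-book cases can be sketched in Python. This assumes a composite index behaves like a list of key tuples kept in sorted order, which is a simplification of a real B-tree. The rows and the helper names `seek_prefix` and `scan_where` are hypothetical and only for illustration:

```python
from bisect import bisect_left

# Hypothetical rows (last_name, first_name). The "phone book" index on
# (last_name, first_name) is modeled as these tuples in sorted order.
ROWS = [
    ("Smith", "John"), ("Jones", "John"), ("Smith", "Anna"),
    ("Brown", "Zoe"), ("Smith", "Mark"), ("Adams", "John"),
]


def seek_prefix(index, prefix):
    """Binary-search to the first entry >= prefix, then read forward while the
    leading columns still equal prefix (like an index range scan)."""
    start = bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end][: len(prefix)] == prefix:
        end += 1
    return index[start:end]


def scan_where(index, column, value):
    """No usable leading column: every entry has to be examined (a full scan)."""
    return [entry for entry in index if entry[column] == value]


phone_book = sorted(ROWS)  # ordered by last_name, then first_name

print(seek_prefix(phone_book, ("Smith",)))         # case 1: last name only
print(scan_where(phone_book, 1, "John"))            # case 2: first name only
print(seek_prefix(phone_book, ("Smith", "John")))  # case 3: both columns
```

Cases 1 and 3 jump straight to a contiguous block of entries, while case 2 has no usable leading column and must look at everything.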

That explains cases for looking up exact values, but what if you're looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with 'S' (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under 'J' within each last name, but if you want all Johns for all last names starting with 'S', the Johns are not grouped together. They're scattered again, so you end up having to scan through all the names with last name starting with 'S'. Whereas if the telephone book were organized by first name then by last name, you'd find all the Johns together, then within the Johns, all the 'S' last names would be grouped together.
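
The range example can be quantified with a simplified model. Each index is represented as a list of key tuples kept in sorted order, standing in for a B-tree. The data and the `range_scan` helper are hypothetical. The sketch counts how many index entries each column order has to read for `first_name = 'John'` combined with a last name starting with 'S':

```python
from bisect import bisect_left

# Hypothetical rows (last_name, first_name).
ROWS = [
    ("Smith", "John"), ("Saunders", "Amy"), ("Staunton", "John"),
    ("Sherman", "Bea"), ("Smith", "Carl"), ("Adams", "John"),
    ("Sherman", "John"), ("Taylor", "John"),
]


def range_scan(index, lo_key, in_range):
    """Seek to lo_key, then read entries while in_range(entry) holds.
    Returns the entries read, i.e. the ones that had to be examined."""
    i = bisect_left(index, lo_key)
    examined = []
    while i < len(index) and in_range(index[i]):
        examined.append(index[i])
        i += 1
    return examined


# Index on (last_name, first_name): only the 'S' prefix narrows the range,
# so every S-surname entry is read and then filtered on first_name.
idx_lf = sorted(ROWS)
read_lf = range_scan(idx_lf, ("S",), lambda e: e[0].startswith("S"))
johns_lf = [e for e in read_lf if e[1] == "John"]

# Index on (first_name, last_name): the Johns are contiguous and, within them,
# the S-surnames are contiguous too, so only matching entries are read.
idx_fl = sorted((first, last) for last, first in ROWS)
read_fl = range_scan(idx_fl, ("John", "S"),
                     lambda e: e[0] == "John" and e[1].startswith("S"))

print(len(read_lf), len(johns_lf))  # entries read vs. matches, last-name-first order
print(len(read_fl))                 # entries read, first-name-first order
```

With this sample data, the last-name-first order reads all six 'S' surnames to find three Johns. The first-name-first order reads only the three matching entries.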

So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.
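
Why different query types can want different column orders comes down to the leftmost-prefix rule: an index only helps as far as its leading columns are constrained. The `usable_prefix` helper below is a hypothetical sketch that handles equality predicates only. Range conditions are ignored for simplicity. (In MySQL, a range on one column can still be used, but it stops any later columns from narrowing the search.)

```python
def usable_prefix(index_columns, equality_columns):
    """Return how many leading index columns equality predicates can use.

    Walk the index definition left to right and stop at the first column
    the query does not constrain; later columns cannot narrow the search.
    """
    used = 0
    for column in index_columns:
        if column not in equality_columns:
            break
        used += 1
    return used


# The two column orders from the question.
INDEXES = [("last_name", "first_name"), ("first_name", "last_name")]

# Sets of columns that hypothetical queries filter on with '='.
QUERIES = [{"last_name"}, {"first_name"}, {"last_name", "first_name"}]

for constrained in QUERIES:
    for columns in INDEXES:
        print(sorted(constrained), columns, usable_prefix(columns, constrained))
```

A query on `last_name` alone can use one column of the first index but none of the second. The reverse holds for `first_name`, and both indexes serve a query on both columns.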

You can read my presentation "How to Design Indexes, Really" for more information.

#2 (4 votes)

The two indexes are different. This is true in MySQL and in other databases. MySQL's documentation does a pretty good job of explaining the difference.

Consider the two indexes:

create index idx_lf on test(last_name, first_name);
create index idx_fl on test(first_name, last_name);

Both of these should work equally well on:

where last_name = 'XXX' and first_name = 'YYY'

idx_lf will be optimal for the following conditions:

where last_name = 'XXX'
where last_name like 'X%'
where last_name = 'XXX' and first_name like 'Y%'
where last_name = 'XXX' order by first_name
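
A simplified Python model shows why the `idx_lf` conditions above suit that column order. It uses sorted key tuples in place of a B-tree, and the data and the `read_range` helper are hypothetical. Each condition maps to one contiguous run of index entries, and within `last_name = ...` the run is already in `first_name` order, so no extra sort is needed:

```python
from bisect import bisect_left

# Hypothetical rows, modeled as idx_lf: tuples sorted by (last_name, first_name).
ROWS = [("Smith", "Mark"), ("Jones", "Ann"), ("Smith", "Anna"),
        ("Smith", "John"), ("Smyth", "Eve"), ("Adams", "Bob")]
idx_lf = sorted(ROWS)


def read_range(index, start_key, in_range):
    """Seek to start_key by binary search, then read forward while in_range holds."""
    i = bisect_left(index, start_key)
    out = []
    while i < len(index) and in_range(index[i]):
        out.append(index[i])
        i += 1
    return out


# where last_name = 'Smith'  (the run also satisfies "order by first_name")
smiths = read_range(idx_lf, ("Smith",), lambda e: e[0] == "Smith")

# where last_name like 'Sm%'  (a prefix match is one contiguous key range)
sm = read_range(idx_lf, ("Sm",), lambda e: e[0].startswith("Sm"))

# where last_name = 'Smith' and first_name like 'J%'
smith_j = read_range(idx_lf, ("Smith", "J"),
                     lambda e: e[0] == "Smith" and e[1].startswith("J"))

print([first for _, first in smiths])  # already in first_name order
print(sm)
print(smith_j)
```

Swapping the tuple order to (first_name, last_name) models `idx_fl` and its list of conditions in the same way.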

idx_fl will be optimal for the following:

where first_name = 'YYY'
where first_name like 'Y%'
where first_name = 'YYY' and last_name like 'X%'
where first_name = 'YYY' order by last_name

For many of these cases, both indexes could possibly be used, but one is optimal. For instance, consider idx_lf with the query:

where first_name = 'YYY' order by last_name

MySQL could read the entire table in idx_lf order, so that the rows are already sorted by last_name, and then apply the first_name filter. I don't think this is an optimization option in practice (for MySQL), but it can happen in other databases.
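
That plan can be sketched with a simplified model in which idx_lf is a list of (last_name, first_name) tuples kept in sorted order. The data and the `full_index_scan` helper are hypothetical. The sketch walks the whole index in key order and keeps only the matching first names. The output needs no sort because it is already ordered by last_name, but every entry has to be read:

```python
# Hypothetical rows; idx_lf is modeled as tuples sorted by (last_name, first_name).
ROWS = [("Smith", "John"), ("Brown", "Ann"), ("Adams", "John"),
        ("Young", "John"), ("Brown", "John"), ("Clark", "Eve")]
idx_lf = sorted(ROWS)


def full_index_scan(index, first_name):
    """Answer `where first_name = ? order by last_name` by reading the whole
    index in key order and filtering; returns (matches, entries_examined)."""
    matches = []
    examined = 0
    for last, first in index:
        examined += 1
        if first == first_name:
            matches.append((last, first))
    return matches, examined


matches, examined = full_index_scan(idx_lf, "John")
print(matches)   # ordered by last_name without a separate sort step
print(examined)  # every index entry was read
```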

#3 (2 votes)

The general rule is that you want to put the most selective column (that is, the one that will give you the fewest results) first. So if you are creating a multiple-column index on a table with a status column that has, say, 10 possible values, and also a dateAdded column, and you're typically writing queries like

SELECT * FROM myTable WHERE status='active' and dateAdded='2010-10-01'

...then you'd want dateAdded first, because that would limit the scan to just a few rows rather than 10% (or whatever proportion are 'active') of your rows.
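
The "most selective first" heuristic can be made concrete by measuring selectivity, taken here as distinct values divided by total rows. The sketch uses a hypothetical 1,000-row table in which status has 10 values and dateAdded has 500 distinct dates. The numbers illustrate the answer's example only and are not measurements from a real database:

```python
from datetime import date, timedelta

# Hypothetical table: status cycles through 10 values ('active' is one of them),
# dateAdded advances one day every two rows (500 distinct dates).
STATUSES = ["active"] + [f"other{k}" for k in range(1, 10)]
rows = [
    {"status": STATUSES[i % 10], "dateAdded": date(2010, 10, 1) + timedelta(days=i // 2)}
    for i in range(1000)
]


def selectivity(table, column):
    """Distinct values / total rows; values closer to 1.0 are more selective."""
    return len({row[column] for row in table}) / len(table)


def matching_rows(table, column, value):
    """How many rows a lookup on this column alone would have to visit."""
    return sum(1 for row in table if row[column] == value)


for column in ("status", "dateAdded"):
    print(column, selectivity(rows, column))

print(matching_rows(rows, "status", "active"))              # 10% of the rows
print(matching_rows(rows, "dateAdded", date(2010, 10, 1)))  # just a few rows
```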

This takes a fair bit of thought and tuning; you should check out the Lahdenmaki and Leach book, Relational Database Index Design and the Optimizers.
