MySQL indexes: what are the best practices?

I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any fields that I will be searching or selecting using a WHERE clause but sometimes it doesn't seem so black and white.

What are the best practices for MySQL indexes?

Example situations/dilemmas:

If a table has six columns and all of them are searchable, should I index all of them or none of them?

What are the negative performance impacts of indexing?

If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?

7 Answers

#1

199

You should definitely spend some time reading up on indexing. There's a lot written about it, and it's important to understand what's going on.

Broadly speaking, an index imposes an ordering on the rows of a table.

For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.

Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data and number the rows in sequential order.

Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table, checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!

Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.

Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look at N rows (where N is the number of rows), the binary search only requires that you look at about log2(N) index entries, in the very worst case. Wow, that's sure a lot easier!
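
Here is a rough sketch of the same idea in MySQL itself. The table and index names (`demo_rows`, `idx_third_col`) and the column types are made up for illustration, and on a very small table the optimizer may still prefer a scan, so treat the EXPLAIN output as something to inspect rather than a guaranteed result.

```sql
-- Hypothetical table standing in for the CSV file in the analogy.
CREATE TABLE demo_rows (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    first_col  VARCHAR(50),
    second_col VARCHAR(50),
    third_col  CHAR(1)
);

-- No index on third_col yet: every row has to be checked (a table scan).
EXPLAIN SELECT * FROM demo_rows WHERE third_col = 'M';

-- Build the index of values in the third column.
CREATE INDEX idx_third_col ON demo_rows (third_col);

-- Same query again: compare the `type` and `key` columns of the EXPLAIN
-- output with the first run to see whether idx_third_col is picked up.
EXPLAIN SELECT * FROM demo_rows WHERE third_col = 'M';
```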

Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.

So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.

On the other hand, reads become a lot faster.

Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).

Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
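
To see the difference for yourself, something along these lines should work. The `articles` table and `idx_title` index are hypothetical, and the exact EXPLAIN output depends on your MySQL version and data.

```sql
-- Hypothetical table; only `title` is indexed.
CREATE TABLE articles (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    INDEX idx_title (title)
);

-- Fixed prefix 'foo': the index can narrow the search to titles starting
-- with "foo"; the remaining rows are then filtered for "bar".
EXPLAIN SELECT * FROM articles WHERE title LIKE 'foo%bar%';

-- Leading wildcard: there is no fixed prefix to look up, so the index
-- cannot be used to narrow the search.
EXPLAIN SELECT * FROM articles WHERE title LIKE '%bar%';
```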

Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff: essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1, because that condition skips the leading column a.

Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.

#2

Check out presentations like More Mastering the Art of Indexing.

Update 12/2012: I have posted a new presentation of mine: How to Design Indexes, Really. I presented this in October 2012 at ZendCon in Santa Clara, and in December 2012 at Percona Live London.

Designing the best indexes is a process that has to match the queries you run in your app.

It's hard to recommend any general-purpose rules about which columns are best to index, or whether you should index all columns, no columns, which indexes should span multiple columns, etc. It depends on the queries you need to run.

Yes, there is some overhead so you shouldn't create indexes needlessly. But you should create the indexes that give benefit to the queries you need to run quickly. The overhead of an index is usually far outweighed by its benefit.

For a column that is VARCHAR(2500), you probably want to use a FULLTEXT index or a prefix index:

CREATE INDEX i ON SomeTable(longVarchar(100));

Note that a conventional index can't help if you're searching for words that may be in the middle of that long varchar. For that, use a fulltext index.
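
A minimal sketch of the fulltext option, reusing SomeTable and longVarchar from the example above. The index name `ft_long` and the search word are made up. Note that InnoDB only supports FULLTEXT indexes from MySQL 5.6 onward (before that it was MyISAM only), and that stopword and minimum word length settings affect what gets matched.

```sql
-- Assumes SomeTable.longVarchar from the example above;
-- ft_long is a made-up index name.
CREATE FULLTEXT INDEX ft_long ON SomeTable (longVarchar);

-- Natural-language search for a word anywhere in the column.
SELECT *
FROM SomeTable
WHERE MATCH(longVarchar) AGAINST('indexing');
```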

#3

I won't repeat some of the good advice in other answers, but will add:

Compound Indices

You can create compound indices - an index that includes multiple columns. MySQL can use these from left to right. So if you have:

Table A
Id
Name
Category
Age
Description

If you have a compound index that includes Name/Category/Age in that order, these WHERE clauses would use the index:

WHERE Name='Eric' and Category='A'

WHERE Name='Eric' and Category='A' and Age > 18

but

WHERE Category='A' and Age > 18

would not use that index, because the columns have to be used from left to right, starting with Name.
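
Spelled out as statements, the example might look like the sketch below. The column types and index names are my assumptions (the original only lists the column names), so adjust them to your real schema.

```sql
-- Table A from the example; column types are assumed for illustration.
CREATE TABLE A (
    Id          INT AUTO_INCREMENT PRIMARY KEY,
    Name        VARCHAR(100),
    Category    CHAR(1),
    Age         INT,
    Description TEXT
);

-- Compound index; column order matters.
CREATE INDEX idx_name_category_age ON A (Name, Category, Age);

-- These start with the leftmost column (Name), so they can use the index.
EXPLAIN SELECT * FROM A WHERE Name = 'Eric' AND Category = 'A';
EXPLAIN SELECT * FROM A WHERE Name = 'Eric' AND Category = 'A' AND Age > 18;

-- This skips Name, so the compound index above does not help.
EXPLAIN SELECT * FROM A WHERE Category = 'A' AND Age > 18;

-- If that query matters, a second index starting with Category would serve it.
CREATE INDEX idx_category_age ON A (Category, Age);
```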

Explain

Use EXPLAIN (or EXPLAIN EXTENDED on older versions; EXTENDED was deprecated in MySQL 5.7 and removed in 8.0) to understand what indices are available to MySQL and which one it actually selects. MySQL will usually use only one index per table in a query, although the index merge optimization can sometimes combine several.

EXPLAIN SELECT * FROM SomeTable WHERE Something = 'ABC';

Slow Query Log

Turn on the slow query log to see which queries are running slow.
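
One way to switch it on at runtime is sketched below. The 1 second threshold is an arbitrary example value, the statements need a privileged account, and settings made with SET GLOBAL are lost on restart unless you also put them in the server configuration file.

```sql
-- Enable the slow query log at runtime.
SET GLOBAL slow_query_log = 'ON';

-- Log statements that take longer than 1 second (example threshold;
-- the new global value applies to connections opened afterwards).
SET GLOBAL long_query_time = 1;

-- Optionally also log queries that do not use any index.
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Find out where the log file is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';
```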

Wide Columns

If you have a wide column where MOST of the distinction happens in the first several characters, you can index only the first N characters. Example: We have a ReferenceNumber column defined as varchar(255), but in 97% of the cases the reference number is 10 characters or less. I changed the index to only look at the first 10 characters and improved performance quite a bit.
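
A sketch of how you might check and apply this. The table name `orders` and the index name are hypothetical; only the ReferenceNumber column comes from the example above. If the two counts in the first query are close, a 10 character prefix distinguishes rows almost as well as the full column.

```sql
-- Compare distinct values of the full column with distinct 10-character prefixes.
SELECT COUNT(DISTINCT ReferenceNumber)           AS distinct_full,
       COUNT(DISTINCT LEFT(ReferenceNumber, 10)) AS distinct_prefix_10
FROM orders;

-- Index only the first 10 characters of the column.
ALTER TABLE orders ADD INDEX idx_refnum_10 (ReferenceNumber(10));
```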

#4

If a table has six columns and all of them are searchable, should I index all of them or none of them?

Are you searching on a field by field basis or are some searches using multiple fields? Which fields are searched on most? What are the field types? (Indexes work better on INTs than on VARCHARs, for example.) Have you tried using EXPLAIN on the queries that are being run?

What are the negative performance impacts of indexing?

UPDATEs and INSERTs will be slower. There are also the extra storage space requirements, but that's usually unimportant these days.

If I have a VARCHAR(2500) column which is searchable from parts of my site, should I index it?

No, unless it's UNIQUE (which means it's already indexed) or you only search for exact matches on that field (not using LIKE or MySQL's fulltext search).

Generally I put an index on any fields that I will be searching or selecting using a WHERE clause

I'd normally index the fields that are the most queried, and then INTs/BOOLEANs/ENUMs rather than fields that are VARCHARs. Don't forget, often you need to create an index on combined fields, rather than an index on an individual field. Use EXPLAIN, and check the slow log.

#5

Load Data Efficiently: Indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a row requires writing not only the data row, it requires changes to any indexes as well. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. Most tables receive many reads and few writes, but for a table with a high percentage of writes, the cost of index updating might be significant.

Avoid Indexes: If you don’t need a particular index to help queries perform better, don’t create it.

Disk Space: An index takes up disk space, and multiple indexes take up correspondingly more space. This might cause you to reach a table size limit more quickly than if there are no indexes. Avoid indexes wherever possible.

Takeaway: Don't over-index.
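
If you suspect a table is already over-indexed, something like the sketch below can help you look. The table and index names (`orders`, `idx_rarely_used`) are hypothetical, and the `sys.schema_unused_indexes` view is only available where the sys schema is installed (MySQL 5.7 and later by default). It reflects usage since the last server start, so don't drop anything based on a short observation window.

```sql
-- List the indexes currently defined on a table.
SHOW INDEX FROM orders;

-- Indexes with no recorded use since the server started.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;

-- Remove an index you have confirmed is not needed.
DROP INDEX idx_rarely_used ON orders;
```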

#6

In general, indices help speed up database searches, with the disadvantage of using extra disk space and slowing INSERT / UPDATE / DELETE queries. Use EXPLAIN and read the results to find out when MySQL uses your indices.

If a table has six columns and all of them are searchable, should I index all of them or none of them?

Indexing all six columns isn't always the best practice.

(a) Are you going to use any of those columns when searching for specific information?

(b) What is the selectivity of those columns (how many distinct values are stored, compared to the total number of records in the table)?

MySQL uses a cost-based optimizer, which tries to find the "cheapest" path when performing a query. And fields with low selectivity aren't good candidates.
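
A quick way to estimate selectivity yourself is sketched below. The `customers` table and its `email` and `gender` columns are hypothetical stand-ins. A ratio close to 1 means almost every row has its own value (a good candidate), while a ratio close to 0 means a handful of values repeated many times.

```sql
-- Ratio of distinct values to total rows, per column.
SELECT COUNT(DISTINCT email)  / COUNT(*) AS email_selectivity,
       COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity
FROM customers;

-- For indexes that already exist, the Cardinality column shows the
-- optimizer's own estimate of the number of distinct values.
SHOW INDEX FROM customers;
```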

What are the negative performance impacts of indexing?

Already answered: extra disk space, lower performance during INSERT, UPDATE and DELETE.

If I have a VARCHAR(2500) column which is searchable from parts of my site, should I index it?

Try the FULLTEXT Index.

#7

1/2) Indexes speed up certain SELECT operations but they slow down other operations like INSERT, UPDATE and DELETE. It can be a fine balance.

3) Use a full-text index or perhaps Sphinx.
