SQL Server Clustered Index - Order of Index Question

I have a table like so:

keyA keyB data

keyA and keyB together are unique, are the primary key of my table and make up a clustered index.

There are 5 possible values of keyB but an unlimited number of possible values of keyA. keyB generally increments.

For example, the following data can be ordered in 2 ways depending on which key column is ordered first:

keyA keyB data
A    1    X
B    1    X
A    3    X
B    3    X
A    5    X
B    5    X
A    7    X
B    7    X

or

keyA keyB data
A    1    X
A    3    X
A    5    X
A    7    X
B    1    X
B    3    X
B    5    X
B    7    X

Do I need to tell the clustered index which of the key columns has fewer possible values to allow it to order the data by that value first? Or does it not matter in terms of performance which is ordered first?

9 answers

#1

You should order your composite clustered index with the most selective column first, that is, the column with the most distinct values relative to the total row count.

"B*TREE Indexes improve the performance of queries that select a small percentage of rows from a table." http://www.akadia.com/services/ora_index_selectivity.html?

This article is for Oracle, but still relevant.
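
As a rough way to see what "selective" means for the table in the question, the ratio of distinct values to total rows can be computed per column. This is only a sketch: the table name dbo.MyTable is an assumption, since the question never names the table.

```sql
-- Assumes a hypothetical table dbo.MyTable(keyA, keyB, data).
-- A ratio close to 1 means nearly every row has its own value (highly selective);
-- a ratio close to 0 means few distinct values (keyB, with about 5 values, would be low).
SELECT
    COUNT(*)                                         AS total_rows,
    COUNT(DISTINCT keyA)                             AS distinct_keyA,
    COUNT(DISTINCT keyB)                             AS distinct_keyB,
    COUNT(DISTINCT keyA) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity_keyA,
    COUNT(DISTINCT keyB) * 1.0 / NULLIF(COUNT(*), 0) AS selectivity_keyB
FROM dbo.MyTable;
```

The NULLIF guard simply avoids a divide-by-zero error on an empty table.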

Also, if you have a query that runs constantly and returns few fields, you may consider creating a composite index that contains all the fields - it will not have to access the base table, but will instead pull data from the index.

ligget78's point about making sure your queries reference the first column of a composite index is important to remember.

#2

If you create an index (clustered or not) on (keyA, keyB), then that is how the values will be ordered, i.e. first by keyA, then by keyB (this is the second case in your question). If you want it the other way around, you need to specify (keyB, keyA).

It could matter performance-wise, depending on your queries of course. For example, if you have a (keyA, keyB) index and the query looks like WHERE keyB = ... (without mentioning keyA), then the index can't be used for a seek; at best it will be scanned.
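
A minimal sketch of that point follows. The table name dbo.MyTable and the column types are assumptions made for illustration; only the column names come from the question.

```sql
-- Hypothetical table; the name dbo.MyTable and the column types are assumptions.
CREATE TABLE dbo.MyTable
(
    keyA varchar(10) NOT NULL,
    keyB int         NOT NULL,
    data varchar(50) NULL,
    -- Key order (keyA, keyB): rows are ordered by keyA first, then by keyB.
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (keyA, keyB)
);

-- These predicates include the leading column keyA, so an index seek is possible.
SELECT data FROM dbo.MyTable WHERE keyA = 'A' AND keyB = 3;
SELECT data FROM dbo.MyTable WHERE keyA = 'A';

-- This predicate skips the leading column, so the index cannot be used for a seek
-- on keyB; the optimizer typically falls back to a scan.
SELECT data FROM dbo.MyTable WHERE keyB = 3;
```

To get the other ordering, the constraint would list the columns as (keyB, keyA) instead.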

#3

As others have said, the ordering is based on how you specify it in the index creation script (or PK constraint). One thing about clustered indexes though is that there is a lot to keep in mind.

You may get better overall performance by using your clustered index on something other than the PK. For example, if you are writing a financial system and reports are almost always based on date and time of an activity (all activity for the past year, etc.) then a clustered index on that date column might be better. As HLGEM says, sorting can also be affected by your selection of clustered index.
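
One way this could look is sketched below. The table dbo.Activity and all of its columns are hypothetical; they only illustrate separating the primary key from the clustered index.

```sql
-- Hypothetical financial table; all names and types are illustrative assumptions.
CREATE TABLE dbo.Activity
(
    ActivityId   int IDENTITY(1,1) NOT NULL,
    ActivityDate datetime2(0)      NOT NULL,
    Amount       decimal(18, 2)    NOT NULL,
    -- Keep the primary key, but declare it NONCLUSTERED so it does not
    -- claim the table's single clustered index.
    CONSTRAINT PK_Activity PRIMARY KEY NONCLUSTERED (ActivityId)
);

-- Cluster on the date so date-range reports read a contiguous range of rows.
CREATE CLUSTERED INDEX CIX_Activity_ActivityDate
    ON dbo.Activity (ActivityDate);

-- Example report query: all activity for the past year.
SELECT ActivityId, ActivityDate, Amount
FROM dbo.Activity
WHERE ActivityDate >= DATEADD(YEAR, -1, SYSDATETIME());
```

Whether this beats clustering on the primary key depends on the workload, so it is worth testing against real queries.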

Clustered indexes can also affect inserts more than other indexes. If you have a high volume of inserts and your clustered index is on something like an IDENTITY column then there could be contention problems for that particular part of the disk since all of the new rows are being inserted into the same place.

For small look-up tables I always just put the clustered index on the PK. For high-impact tables though it's a good idea to spend the time thinking about (and testing) various possible clustered indexes before choosing the best one.

#4

I believe that SQL Server orders it exactly the way you tell it. It assumes that you know best how to access your index.

In any case, I would say it's a good idea where possible to specify what you want exactly rather than hoping the database will figure it out.

You can also try it both ways, run a bunch of representative queries and then compare the generated execution plans to determine which is best for you.
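
A simple way to run such a comparison is sketched here, assuming a hypothetical table dbo.MyTable(keyA, keyB, data). The queries are placeholders for whatever is representative of your workload.

```sql
-- Assumes a hypothetical table dbo.MyTable(keyA, keyB, data).
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the same representative queries once per candidate key order,
-- i.e. once with the index on (keyA, keyB) and once with (keyB, keyA).
SELECT data FROM dbo.MyTable WHERE keyA = 'A';
SELECT data FROM dbo.MyTable WHERE keyB = 3;
SELECT keyA, keyB, data FROM dbo.MyTable ORDER BY keyB, keyA;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The read counts and timings appear in the messages output, and they can be compared alongside the actual execution plans.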

#5

Just in case this isn't obvious: the sort order of your index does not promise much about the sort order of the results of a query.

In your queries, you must still add an

ORDER BY KeyA, KeyB

or

ORDER BY KeyB, KeyA

The optimizer may be pleased to find the data already physically ordered in the index as desired and save some time, but every query that is supposed to deliver data in a particular order must have an ORDER BY clause at the end of it. Without an ORDER BY, SQL Server makes no promises about the order of a result set, or even that rows will come back in the same order from query to query.

#6

The best thing you can do is to try both solutions and measure the execution time.

In my experience, index tuning is anything but an exact science.

Maybe having keyB before keyA in the index column order would be better.

#7

You specify the columns in the order in which you would normally want them sorted in reports and queries.

I would be wary of creating a multicolumn clustered index though. Depending on how wide it is, you could have a huge impact on the size of any other indexes you create, because all nonclustered indexes contain the clustered index key in them. Also, the rows have to be re-ordered if the values change frequently, and in my experience non-surrogate keys tend to change more frequently. Therefore creating this as a clustered rather than a nonclustered index could consume far more server resources if you have values that are likely to change. I'm not saying you shouldn't do this, as I don't know what type of data your columns actually contain (although I suspect they are more complex than A1, A2, etc.); I'm saying you need to think about the ramifications of doing it. It would probably be a good idea to thoroughly read BOL (Books Online) on clustered versus nonclustered indexes before committing to this.

#8

Remember that the clustered index is the physical order in which the table is stored on disk.

So if your clustered index is defined as (ColA, ColB), queries will be faster when they order by the same columns in the same order as the clustered index. If SQL Server has to order by B, A, it will need a sort step after retrieving the rows to achieve the correct order.

My suggestion is to add a second non-clustered index on (B, A). Also, depending on the size of your data column, consider adding it with INCLUDE (read up on included columns) to prevent the need for key lookups. That is, of course, provided that this table is not heavily inserted into, as you must always balance query speed against write speed.
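
As a sketch of that suggestion, assuming a hypothetical table dbo.MyTable(keyA, keyB, data) clustered on (keyA, keyB); the table and index names are made up for illustration:

```sql
-- Assumes a hypothetical table dbo.MyTable(keyA, keyB, data)
-- whose clustered index is on (keyA, keyB).
CREATE NONCLUSTERED INDEX IX_MyTable_keyB_keyA
    ON dbo.MyTable (keyB, keyA)   -- key order reversed relative to the clustered index
    INCLUDE (data);               -- carry data in the leaf level to avoid key lookups

-- This query filters on keyB and can be answered from the new index alone.
SELECT keyA, data
FROM dbo.MyTable
WHERE keyB = 3;
```

Including data makes the index wider and every insert or update more expensive, which is the trade-off mentioned above.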

Realistically, your clustered index should represent the order in which the data is most likely to be accessed, while maintaining a delicate balance with insert/update IO cost. If your clustered index is such that you are constantly inserting into the middle of pages, you may suffer performance losses there.

Like others have said, without knowing the table length, column sizes, etc. there is no correct answer. Trial and error with a heavy dose of testing is your best bet.

#9

Yes, you should. Normally the query engine tries to find the best execution plan and the index to use; however, sometimes it is better to force the query engine to use a specific index. There are other considerations when planning an index as well as when using the index in your query, for example the column ordering in the index and the column ordering in the WHERE clause. You can refer to the following link to learn more:

http://ashishkhandelwal.arkutil.com/sql-server/quick-and-short-database-indexes/

Best practices for using indexes
How to get the best performance from indexes
Clustered index considerations
Nonclustered index considerations

I am sure this will help you when planning your indexes.
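
For reference, forcing a specific index in SQL Server is done with a table hint. The sketch below assumes a hypothetical table dbo.MyTable and a hypothetical nonclustered index named IX_MyTable_keyB_keyA; neither name comes from the question.

```sql
-- Assumes dbo.MyTable(keyA, keyB, data) and a nonclustered index named
-- IX_MyTable_keyB_keyA already exist; both names are hypothetical.
SELECT keyA, data
FROM dbo.MyTable WITH (INDEX(IX_MyTable_keyB_keyA))  -- table hint: use this index
WHERE keyB = 3;
```

A hint like this overrides the optimizer's own choice, so it is usually worth confirming with execution plans that the forced index really is better before keeping it.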
