Indexed view: how to choose the clustered index?

I'm going to create an indexed view based on three tables, with inner and outer joins between them (SQL Server 2005). I will run all kinds of queries against this view, so I wonder what the best way is to choose which index should be clustered. What are the criteria, and are there any tools to help me?

(Sorry if my question is dull, I don't have a lot of experience in designing databases).

Thanks in advance!

EDIT: I should clarify that the tables used in the view are very heavily used, so any overhead I take on for maintaining the indexes has to pay off.

3 Solutions

#1

Since it's an index, you have to pick a column (or set of columns) that is guaranteed to be non-null and unique in all cases. That's the biggest and most stringent criterion: anything that might be NULL or duplicated is out of the question right from the get-go.
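
For an indexed view in particular, SQL Server requires the view to be schema-bound and its first index to be a unique clustered index. Here is a minimal sketch of that requirement, assuming two hypothetical tables, dbo.Customers and dbo.Orders (none of these names come from the question). Note that an indexed view accepts inner joins only, so any outer join would have to stay in the query that reads the view:

```sql
-- Session settings that SQL Server requires when creating indexed views.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO  -- GO is the batch separator used by SSMS and sqlcmd

-- Hypothetical base tables, for illustration only.
CREATE TABLE dbo.Customers (
    CustomerID int          NOT NULL PRIMARY KEY,
    Name       varchar(100) NOT NULL
);
CREATE TABLE dbo.Orders (
    OrderID    int      NOT NULL PRIMARY KEY,
    CustomerID int      NOT NULL REFERENCES dbo.Customers (CustomerID),
    OrderDate  datetime NOT NULL
);
GO

-- The view must use SCHEMABINDING, two-part table names, and an explicit column list.
CREATE VIEW dbo.vCustomerOrders
WITH SCHEMABINDING
AS
SELECT o.OrderID, o.OrderDate, c.CustomerID, c.Name
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;
GO

-- Each order joins to exactly one customer, so OrderID is unique in the view
-- and makes a narrow (4-byte) clustering key.
CREATE UNIQUE CLUSTERED INDEX CIX_vCustomerOrders
    ON dbo.vCustomerOrders (OrderID);
```

If the chosen key turned out not to be unique in the view's result, the CREATE UNIQUE CLUSTERED INDEX statement would fail, which is a quick way to check a candidate key.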

Depending on the type of queries you'll be running on this indexed view, you might also want to see if you have any columns (e.g. a DATE or something) that you'll be running range queries against. That might make an interesting candidate for a clustering key.
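
As a sketch of the range-query case, the clustering key can lead with the date column and append an ID column to keep the key unique. The tables dbo.Carriers and dbo.Shipments and all column names below are hypothetical, chosen only for illustration:

```sql
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO

-- Hypothetical base tables.
CREATE TABLE dbo.Carriers (
    CarrierID   int         NOT NULL PRIMARY KEY,
    CarrierName varchar(50) NOT NULL
);
CREATE TABLE dbo.Shipments (
    ShipmentID int      NOT NULL PRIMARY KEY,
    CarrierID  int      NOT NULL REFERENCES dbo.Carriers (CarrierID),
    ShippedOn  datetime NOT NULL
);
GO

CREATE VIEW dbo.vShipments
WITH SCHEMABINDING
AS
SELECT s.ShipmentID, s.ShippedOn, c.CarrierName
FROM dbo.Shipments AS s
INNER JOIN dbo.Carriers AS c ON c.CarrierID = s.CarrierID;
GO

-- Date first, so rows in a date range are stored next to each other;
-- ShipmentID second, so the key stays unique when dates repeat.
CREATE UNIQUE CLUSTERED INDEX CIX_vShipments_ShippedOn
    ON dbo.vShipments (ShippedOn, ShipmentID);
GO

-- A range query that can be answered by scanning one slice of the clustered index.
-- NOEXPAND asks the optimizer to use the view's own index.
SELECT ShipmentID, ShippedOn, CarrierName
FROM dbo.vShipments WITH (NOEXPAND)
WHERE ShippedOn >= '20090101' AND ShippedOn < '20090201';
```

The trade-off is key width: this key is 12 bytes (datetime plus int) rather than 4, and the clustering key is carried into every non-clustered index built on the view.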

But the main thing is: your clustering key must be unique and non-null under all circumstances. In my personal experience, to reduce index size (and thus increase the number of entries per page), I'd try to use as small a key as possible. A single INT is best, or a combination of two INTs, or possibly a GUID, but don't use VARCHAR(500) fields in your clustering key!

UPDATE: to all those posters who keep telling us clustered indexes don't need to be unique: check out what the "Queen of Indexing", Kimberly Tripp, has to say on the topic:

Let's start with the key things that I look for in a clustering key:
* Unique
* Narrow
* Static

Why Unique?

A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes. Take for example an index in the back of a book - if you need to find the data that an index entry points to - that entry (the index entry) must be unique otherwise, which index entry would be the one you're looking for? So, when you create the clustered index - it must be unique. But, SQL Server doesn't require that your clustering key is created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique then SQL Server will “uniquify” it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustered index rebuild (which because of the poor choice for the clustering key is now more likely).

Source: http://www.sqlskills.com/blogs/kimberly/post/Ever-increasing-clustering-key-the-Clustered-Index-Debateagain!.aspx
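
One rough way to observe the extra space described in the quote is to compare record sizes for a non-unique and a unique clustered index. This is only a sketch: the table names dbo.DupKey and dbo.UniqKey are hypothetical, and the query assumes permission to read sys.dm_db_index_physical_stats (available from SQL Server 2005 onward):

```sql
-- Two hypothetical tables with identical columns.
CREATE TABLE dbo.DupKey  (k int NOT NULL, filler char(20) NOT NULL);
CREATE TABLE dbo.UniqKey (k int NOT NULL, filler char(20) NOT NULL);

CREATE CLUSTERED INDEX CIX_DupKey ON dbo.DupKey (k);            -- non-unique key
CREATE UNIQUE CLUSTERED INDEX CIX_UniqKey ON dbo.UniqKey (k);   -- unique key
GO

-- Load 1000 rows into each table: the same key value every time for DupKey,
-- a different key value per row for UniqKey.
DECLARE @i int;
SET @i = 1;
WHILE @i <= 1000
BEGIN
    INSERT INTO dbo.DupKey  (k, filler) VALUES (1,  'x');
    INSERT INTO dbo.UniqKey (k, filler) VALUES (@i, 'x');
    SET @i = @i + 1;
END;
GO

-- Compare leaf-level record sizes of the two clustered indexes (index_id = 1).
DECLARE @db int, @dup int, @uniq int;
SET @db   = DB_ID();
SET @dup  = OBJECT_ID('dbo.DupKey');
SET @uniq = OBJECT_ID('dbo.UniqKey');

SELECT 'DupKey' AS table_name, record_count, avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(@db, @dup, 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT 'UniqKey', record_count, avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(@db, @uniq, 1, NULL, 'DETAILED')
WHERE index_level = 0;
```

If the quote holds, the rows with the duplicated key should come out larger on average than the rows with the unique key.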

#2

Rule of thumb: the columns you are likely to use most in your queries (in WHERE, GROUP BY, etc.) are good candidates for non-clustered indexes. A column (or group of columns) that makes each row unique is a good candidate for the clustered index.

As marc mentioned, a clustered index imposes a unique constraint, so the column you select definitely must not contain any nulls or duplicates.
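
Applied to an indexed view, this rule of thumb might look like the sketch below: the row-identifying column becomes the unique clustered index, and a frequently filtered column gets a non-clustered index. The tables dbo.Authors and dbo.Books are hypothetical, and treating AuthorID as the most-used filter column is an assumption made for the example:

```sql
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO

-- Hypothetical base tables.
CREATE TABLE dbo.Authors (
    AuthorID   int          NOT NULL PRIMARY KEY,
    AuthorName varchar(100) NOT NULL
);
CREATE TABLE dbo.Books (
    BookID      int      NOT NULL PRIMARY KEY,
    AuthorID    int      NOT NULL REFERENCES dbo.Authors (AuthorID),
    PublishedOn datetime NOT NULL
);
GO

CREATE VIEW dbo.vBooks
WITH SCHEMABINDING
AS
SELECT b.BookID, b.PublishedOn, a.AuthorID, a.AuthorName
FROM dbo.Books AS b
INNER JOIN dbo.Authors AS a ON a.AuthorID = b.AuthorID;
GO

-- The column that makes each view row unique becomes the clustered index.
-- On a view, this index has to exist before any other index can be created.
CREATE UNIQUE CLUSTERED INDEX CIX_vBooks ON dbo.vBooks (BookID);

-- A column used often in WHERE or GROUP BY gets a non-clustered index.
CREATE NONCLUSTERED INDEX IX_vBooks_AuthorID ON dbo.vBooks (AuthorID);
```

Because the base tables are heavily used, every extra index on the view adds work to each INSERT, UPDATE, and DELETE on those tables, so it is worth adding non-clustered indexes only for queries that actually need them.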

#3

A clustered index does not have to be unique. The columns in it can even be nullable. For example, this will run without an error:

-- Temp table with a non-unique clustered index on a nullable column
create table #test (col1 int identity, col2 int)
create clustered index ix_test on #test (col2)
insert into #test (col2) values (1)
insert into #test (col2) values (1) -- Duplicate in clustered index
insert into #test (col2) values (null) -- NULL in clustered index

A clustered index is part of the table structure on disk. As such, a clustered index uses no additional disk space.

By default, SQL Server clusters on the primary key, which is usually a good choice. You can change that if you have intensive queries with a lot of table lookups. Changing which index is clustered can eliminate table lookups.
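
A small sketch of overriding that default on an ordinary table: the primary key is declared NONCLUSTERED and the clustered index goes on a different column. The table dbo.Invoices and the choice of InvoiceDate as the hot query column are hypothetical:

```sql
-- Hypothetical table: the primary key is kept, but not as the clustered index.
CREATE TABLE dbo.Invoices (
    InvoiceID   int      NOT NULL,
    InvoiceDate datetime NOT NULL,
    Amount      money    NOT NULL,
    CONSTRAINT PK_Invoices PRIMARY KEY NONCLUSTERED (InvoiceID)
);

-- Cluster on the column the intensive queries filter by.
-- This clustered index is not unique, which SQL Server allows on a table.
CREATE CLUSTERED INDEX CIX_Invoices_InvoiceDate
    ON dbo.Invoices (InvoiceDate);

-- The clustered index holds the full rows, so this query needs no extra
-- lookup to fetch Amount.
SELECT InvoiceID, InvoiceDate, Amount
FROM dbo.Invoices
WHERE InvoiceDate >= '20090101' AND InvoiceDate < '20090201';
```

This applies to tables. On an indexed view the clustered index still has to be created as UNIQUE.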
