How to choose the clustered index in SQL Server?

Time: 2021-09-21 02:47:55

Usually the clustered index is created in SQL Server Management Studio by setting the primary key; however, my recent question about PK <-> clustered index (Meaning of Primary Key to Microsoft SQL Server 2008) has shown that the PK and the clustered index do not have to be the same.

So how should we choose clustered indexes then? Let's have the following example:

create table Customers (ID int, ...)
create table Orders (ID int, CustomerID int)

We would usually create the PK/CI on both ID columns, but I thought about creating it for Orders on CustomerID instead. Is that the best choice?

3 solutions

#1


11  

According to The Queen Of Indexing - Kimberly Tripp - what she looks for in a clustered index is primarily:

  • Unique
  • Narrow
  • Static

And if you can also guarantee:

  • Ever-increasing pattern

then you're pretty close to having your ideal clustering key!

Check out her entire blog post here, and another really interesting one about clustering key impacts on table operations here: The Clustered Index Debate Continues.

Anything like an INT (esp. an INT IDENTITY), or possibly an INT combined with a DATETIME, is an ideal candidate. GUIDs, on the other hand, aren't good candidates at all - so you might have a GUID as your PK, but don't cluster your table on it - it'll be fragmented beyond recognition and performance will suffer.
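
As a minimal sketch of that advice (the table dbo.Documents, its columns, and the constraint and index names are hypothetical, not from the question), the GUID can stay the primary key while the table is clustered on a narrow, static, ever-increasing INT IDENTITY:

```sql
-- Hypothetical table: the GUID remains the primary key, but it is declared
-- NONCLUSTERED so that it does not determine the physical row order.
CREATE TABLE dbo.Documents
(
    DocumentGuid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Documents_DocumentGuid DEFAULT NEWID(),
    DocumentID   INT IDENTITY(1, 1) NOT NULL,
    Title        NVARCHAR(200) NOT NULL,
    CONSTRAINT PK_Documents PRIMARY KEY NONCLUSTERED (DocumentGuid)
);

-- Cluster on the unique, narrow, static, ever-increasing INT IDENTITY instead.
CREATE UNIQUE CLUSTERED INDEX CIX_Documents_DocumentID
    ON dbo.Documents (DocumentID);
```

New rows are then appended at the end of the clustered index, while lookups by GUID go through the nonclustered primary key.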

#2


6  

The best candidate for a CLUSTERED index is the key you use to refer to your records most often.

Usually, this is the PRIMARY KEY, since it's what is used in searches and/or FOREIGN KEY relationships.

In your case, Orders.ID will most probably participate in the searches and references, so it is the best candidate for being a clustering expression.

If you create the CLUSTERED index on Orders.CustomerID, the following things will happen:

  1. CustomerID is not unique. To ensure uniqueness, a special hidden 32-bit column known as the uniquifier will be added to records whose CustomerID duplicates an existing one.

  2. Records in the table will be stored according to this pair of columns (CustomerID, uniquifier).

  3. A secondary index on Orders.ID will be created, with (CustomerID, uniquifier) as the record pointers.

  4. Queries like this:

    SELECT  *
    FROM    Orders
    WHERE   ID = 1234567
    

    will have to do an additional operation, a Clustered Seek (a key lookup), since not all columns are stored in the index on ID. To retrieve all columns, the record is first found in the index on ID and must then be located in the clustered index.

This additional operation requires as many page reads as a simple Clustered Seek, namely IndexDepth, where IndexDepth is O(log(n)) in the total number of records in your table.
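
The layout described in the list above can be sketched as follows. This is only an illustration: the OrderDate column and the constraint and index names are assumptions added to make the example complete, and the customer number 42 is arbitrary.

```sql
-- Orders clustered on the non-unique CustomerID; the primary key on ID
-- becomes a nonclustered (secondary) index.
CREATE TABLE dbo.Orders
(
    ID         INT NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate  DATETIME NOT NULL,  -- hypothetical extra column
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (ID)
);

-- Not declared UNIQUE, so SQL Server adds the hidden uniquifier as needed.
CREATE CLUSTERED INDEX CIX_Orders_CustomerID
    ON dbo.Orders (CustomerID);

-- Seek on PK_Orders, followed by a lookup into the clustered index
-- to fetch the columns that PK_Orders does not contain.
SELECT  *
FROM    dbo.Orders
WHERE   ID = 1234567;

-- Served by a single range seek on the clustered index.
SELECT  *
FROM    dbo.Orders
WHERE   CustomerID = 42;
```

Comparing the actual execution plans of the two queries is a quick way to see the extra lookup for the search by ID.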

#3


1  

If you're concerned about clustering, it's usually to help improve data retrieval. In your example, you're probably going to want all records for a given customer at once. Clustering on CustomerID will keep those rows together on the same or adjacent physical pages rather than scattered across many pages in your file.

Rule of thumb: cluster on what you want to show a collection of. Line items in a purchase order are the classic example.
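
A rough sketch of the purchase-order case, assuming a hypothetical dbo.PurchaseOrderLines table (none of these names come from the question): clustering on (PurchaseOrderID, LineNumber) keeps the lines of one order physically together and is still unique.

```sql
-- Hypothetical line-item table clustered on the parent order first,
-- so all lines of one purchase order are stored next to each other.
CREATE TABLE dbo.PurchaseOrderLines
(
    PurchaseOrderID INT NOT NULL,
    LineNumber      INT NOT NULL,
    ProductID       INT NOT NULL,
    Quantity        INT NOT NULL,
    CONSTRAINT PK_PurchaseOrderLines
        PRIMARY KEY CLUSTERED (PurchaseOrderID, LineNumber)
);

-- Fetching a whole order is one range seek on the clustered index,
-- and the rows already come back in line order.
SELECT  LineNumber, ProductID, Quantity
FROM    dbo.PurchaseOrderLines
WHERE   PurchaseOrderID = 1001
ORDER BY LineNumber;
```

Because the composite key is unique, no uniquifier is needed here, unlike clustering Orders on CustomerID alone.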
