SQL Server 2008非聚簇索引是否包含聚簇索引字段?

时间:2022-06-22 04:18:57

OK, I need this spelled out one more time. I've read the articles on-line, and I still haven't found a definitive answer.

好的,我需要再次拼出这个。我已经在线阅读了这些文章,但我还没有找到明确的答案。

In SQL Server 2008, I have a "core" table with about 50k records and lots of read activity that is used in the same way in all queries. This data is updated once a month, and read hundreds of times a second.

在SQL Server 2008中,我有一个“核心”表,其中包含大约50,000条记录和大量读取活动,在所有查询中以相同的方式使用。此数据每月更新一次,每秒读取数百次。

The data has a clustered index on the fields as they are frequently accessed. Let's say that the clustered index is:

数据在字段上具有聚簇索引,因为它们经常被访问。假设聚集索引是:

CLUSTERED INDEX

集群指数

Field1 int
Field2 int
Field3 int
Field4 int
Field5 int

Now, there is not a whole lot more data than that, so it would make sense to just put the extra couple of columns into "Included Columns", but SQL Server doesn't allow included columns on the Clustered Index.

现在,没有比这更多的数据,所以将额外的几列放入“包含的列”是有意义的,但SQL Server不允许在聚集索引上包含列。

So, we have a second index with essentially the same fields as the Clustered Index, with the other columns as "Included Columns". However, from what I've read, I believe this may be redundant?

因此,我们有第二个索引,其字段与Clustered Index基本相同,其他列为“Included Columns”。但是,从我读过的内容来看,我认为这可能是多余的?

COVERING INDEX (non-clustered)

覆盖索引(非群集)

Field1 int
Field2 int
Field3 int
Field4 int
Field5 int

INCLUDED COLUMNS

包括栏目

Field6 varchar(96)
Field7 varchar(96)

Does the non-clustered Index ALREADY have the columns from the clustered index defined in it?

非聚集索引ALREADY是否具有其中定义的聚簇索引的列?

If so, how could this second index be created with NO columns at all (besides what is already in the clustered index)? In other words, I'd like to say "This index is exactly the same as the clustered index... with a couple of Included Columns".

如果是这样,那么如何使用NO列创建第二个索引(除了已经在聚簇索引中的内容)?换句话说,我想说“这个索引与聚集索引完全相同......带有几个包含的列”。

Or, would it be better to just put ALL of the columns into the clustered index (including the two that don't identify the record)? The varchar columns do get updated more frequently (a few times a day instead of once a month), so I would have liked to keep them out of the clustered index, but I think that they are deep enough that they won't affect the index tree enough to cause any rebalancing when a change occurs.

或者,将所有列放入聚集索引(包括不识别记录的两个列)会更好吗? varchar列确实更频繁地更新(每天几次而不是每月一次),所以我希望将它们从聚集索引中删除,但我认为它们足够深,它们不会影响索引树足以在发生更改时导致任何重新平衡。

So, is there an efficient way to set up these indexes so that all of the columns of this table are available through the index without going back to the table?

那么,是否有一种有效的方法来设置这些索引,以便该表的所有列都可以通过索引获得而无需返回到表中?

4 个解决方案

#1


4  

A clustered index does not need includes. Includes means stored extra data at the lowest level of the index tree. This is the data in a clustered index. So you don't need an overlapping index

聚簇索引不需要包含。包括在索引树的最低级别存储额外数据的方法。这是聚簇索引中的数据。所以你不需要重叠索引

However if memory footprint is your concern, then you need to shrink the table. With 50k rows I would consider a smallint surrogate key starting at -32768. Then, you remove the overhead of the C key in every NC index. This means you can have a covering index as mentioned in your question.

但是,如果您关注内存占用,则需要缩小表。有50k行我会考虑从-32768开始的smallint代理键。然后,删除每个NC索引中C键的开销。这意味着您可以在问题中提到覆盖索引。

Note that once your execution plans are cached and the data is in cache, then your queries will come from memory. Your usage means it will stay in cache for some time. Lack of updates means you won't get statistics-driven recompiles.

请注意,一旦您的执行计划被缓存并且数据在缓存中,那么您的查询将来自内存。您的使用意味着它将在缓存中保留一段时间。缺少更新意味着您将无法获得统计驱动的重新编译。

However, if your data is almost static, then why call SQL Server at all if performance is a concern? Cache it. Remove the network round trip which is probably your biggest overhead based on my caching comments. We outsource some lookups and caching to our clients to reduce server load (we have 50k writes in around 20 seconds at peak load)

但是,如果您的数据几乎是静态的,那么如果性能受到关注,为什么要调用SQL Server呢?缓存它。根据我的缓存注释删除网络往返,这可能是您最大的开销。我们将一些查找和缓存外包给我们的客户端以减少服务器负载(我们在峰值负载下大约20秒内有50k写入)

#2


5  

Yes - a NonClustered Index accesses the data in the table via the clustered key (when the table has a clustered key, and the Row ID when it does not), so it will include the clustered index fields automatically. This is also a reason why a change to the clustered index, forces a rebuild of all non-clustered indexes.

是 - NonClustered Index通过聚簇键访问表中的数据(当表具有聚簇键时,以及当没有聚簇键时访问行ID),因此它将自动包含聚簇索引字段。这也是为什么更改聚簇索引,强制重建所有非聚簇索引的原因。

The additional NC index with the 2 included fields could be valid if that index satisfies a large number of queries, but I'm not sure that is solving the right problem.

如果该索引满足大量查询,则包含2个字段的附加NC索引可能有效,但我不确定是否正在解决正确的问题。

Including 2 more fields within the Clustered Key is not ideal, now it is confirmed within the NC index, you can see every index on that table includes the clustered key within it bulking each index out.

在Clustered Key中包含2个以上的字段并不理想,现在在NC索引中确认,您可以看到该表上的每个索引都包含其中的每个索引的群集密钥。

This is the main reason you want your clustered key to be as narrow as possible, if anything, you should examine your clustered key as ask why you are choosing a 5 field clustered key, and will that choice lead to fragmentation?

这是您希望群集密钥尽可能地缩小的主要原因,如果有的话,您应该检查群集密钥,并询问为什么要选择5字段群集密钥,并且该选择是否会导致碎片化?

You may be better off with an artificial value (Identity) for the clustered key, and use a unique NC index to enforce the uniqueness you have with the 5 field clustered key.

使用聚类键的人工值(Identity)可能会更好,并使用唯一的NC索引来强制使用5字段聚簇键的唯一性。

#3


1  

it would make sense to just put the extra couple of columns into "Included Columns", but SQL Server doesn't allow included columns on the Clustered Index

将额外的列添加到“包含的列”中是有意义的,但SQL Server不允许在聚簇索引上包含列

Including extra columns is impossible, because a clustered index already contains all columns. That's why the index is called clustered.

包含额外列是不可能的,因为聚簇索引已包含所有列。这就是索引被称为集群的原因。

So, we have a second index with essentially the same fields as the Clustered Index, with the other columns as "Included Columns". However, from what I've read, I believe this may be redundant?

因此,我们有第二个索引,其字段与Clustered Index基本相同,其他列为“Included Columns”。但是,从我读过的内容来看,我认为这可能是多余的?

Yeah, it's probably redundant. There are some rare exceptions where the clustered index does not fit into memory.

是的,这可能是多余的。有一些罕见的例外情况,聚集索引不适合内存。

Does the non-clustered Index ALREADY have the columns from the clustered index defined in it?

非聚集索引ALREADY是否具有其中定义的聚簇索引的列?

Probably: a non-clustered index contains a pointer to the clustered index. If the clustered index is unique, this pointer consists of all clustered index fields. (In most situations, these fields correspond with the primary key.)

可能:非聚集索引包含指向聚簇索引的指针。如果聚簇索引是唯一的,则此指针由所有聚簇索引字段组成。 (在大多数情况下,这些字段与主键对应。)

So, is there an efficient way to set up these indexes so that all of the columns of this table are available through the index without going back to the table?

那么,是否有一种有效的方法来设置这些索引,以便该表的所有列都可以通过索引获得而无需返回到表中?

In the example you post it looks like the clustered index is sufficient, and you do not need any other indexes to avoid a table lookup. You can verify this by running queries and looking for "key lookup" or "rid lookup" operations.

在示例中,您发布它看起来像聚簇索引就足够了,并且您不需要任何其他索引来避免表查找。您可以通过运行查询并查找“键查找”或“查找查找”操作来验证这一点。

#4


1  

I think you need a better understanding of CLUSTERED and NONCLUSTERED indexes. The clustered index is a balanced-tree (B-tree) where each node contains the key column(s) for the index. Normally, and frequently the best option, one column is the key column for the index. all data for each row is stored at the leaf level (i.e., the bottom level) of the clustered index. This is why, you can't have included columns in a clustered index; all columns are included by definition.

我认为你需要更好地理解CLUSTERED和NONCLUSTERED索引。聚簇索引是平衡树(B树),其中每个节点包含索引的键列。通常,通常是最佳选项,一列是索引的关键列。每行的所有数据都存储在聚集索引的叶级(即底层)。这就是为什么你不能在聚簇索引中包含列;根据定义包含所有列。

a non-clustered index is also a B-tree structure. Each node contains the key column(s) for the index. The leaf level for non-clustered indexes contains any included columns. Note the difference between a key column and the included column is that the key column values appear at each level of the index and included columns only appear at the leaf level. The leaf level also contains the key column(s) from the clustered index, which are used to link the index to the table data.

非聚集索引也是B树结构。每个节点都包含索引的键列。非聚集索引的叶级别包含任何包含的列。请注意,键列和包含列之间的区别在于键列值出现在索引的每个级别,并且包含的​​列仅出现在叶级别。叶级别还包含聚簇索引中的键列,用于将索引链接到表数据。

The more columns you include in any index, the larger the index becomes. And, this can degrade performance.

您在任何索引中包含的列越多,索引就越大。而且,这会降低性能。

So, for a clustered index, you don't need to include all columns, or even many columns, as keys in the index. The data is already part of the index.

因此,对于聚簇索引,您不需要将所有列甚至许多列都包含在索引中作为键。数据已经是索引的一部分。

#1


4  

A clustered index does not need includes. Includes means stored extra data at the lowest level of the index tree. This is the data in a clustered index. So you don't need an overlapping index

聚簇索引不需要包含。包括在索引树的最低级别存储额外数据的方法。这是聚簇索引中的数据。所以你不需要重叠索引

However if memory footprint is your concern, then you need to shrink the table. With 50k rows I would consider a smallint surrogate key starting at -32768. Then, you remove the overhead of the C key in every NC index. This means you can have a covering index as mentioned in your question.

但是,如果您关注内存占用,则需要缩小表。有50k行我会考虑从-32768开始的smallint代理键。然后,删除每个NC索引中C键的开销。这意味着您可以在问题中提到覆盖索引。

Note that once your execution plans are cached and the data is in cache, then your queries will come from memory. Your usage means it will stay in cache for some time. Lack of updates means you won't get statistics-driven recompiles.

请注意,一旦您的执行计划被缓存并且数据在缓存中,那么您的查询将来自内存。您的使用意味着它将在缓存中保留一段时间。缺少更新意味着您将无法获得统计驱动的重新编译。

However, if your data is almost static, then why call SQL Server at all if performance is a concern? Cache it. Remove the network round trip which is probably your biggest overhead based on my caching comments. We outsource some lookups and caching to our clients to reduce server load (we have 50k writes in around 20 seconds at peak load)

但是,如果您的数据几乎是静态的,那么如果性能受到关注,为什么要调用SQL Server呢?缓存它。根据我的缓存注释删除网络往返,这可能是您最大的开销。我们将一些查找和缓存外包给我们的客户端以减少服务器负载(我们在峰值负载下大约20秒内有50k写入)

#2


5  

Yes - a NonClustered Index accesses the data in the table via the clustered key (when the table has a clustered key, and the Row ID when it does not), so it will include the clustered index fields automatically. This is also a reason why a change to the clustered index, forces a rebuild of all non-clustered indexes.

是 - NonClustered Index通过聚簇键访问表中的数据(当表具有聚簇键时,以及当没有聚簇键时访问行ID),因此它将自动包含聚簇索引字段。这也是为什么更改聚簇索引,强制重建所有非聚簇索引的原因。

The additional NC index with the 2 included fields could be valid if that index satisfies a large number of queries, but I'm not sure that is solving the right problem.

如果该索引满足大量查询,则包含2个字段的附加NC索引可能有效,但我不确定是否正在解决正确的问题。

Including 2 more fields within the Clustered Key is not ideal, now it is confirmed within the NC index, you can see every index on that table includes the clustered key within it bulking each index out.

在Clustered Key中包含2个以上的字段并不理想,现在在NC索引中确认,您可以看到该表上的每个索引都包含其中的每个索引的群集密钥。

This is the main reason you want your clustered key to be as narrow as possible, if anything, you should examine your clustered key as ask why you are choosing a 5 field clustered key, and will that choice lead to fragmentation?

这是您希望群集密钥尽可能地缩小的主要原因,如果有的话,您应该检查群集密钥,并询问为什么要选择5字段群集密钥,并且该选择是否会导致碎片化?

You may be better off with an artificial value (Identity) for the clustered key, and use a unique NC index to enforce the uniqueness you have with the 5 field clustered key.

使用聚类键的人工值(Identity)可能会更好,并使用唯一的NC索引来强制使用5字段聚簇键的唯一性。

#3


1  

it would make sense to just put the extra couple of columns into "Included Columns", but SQL Server doesn't allow included columns on the Clustered Index

将额外的列添加到“包含的列”中是有意义的,但SQL Server不允许在聚簇索引上包含列

Including extra columns is impossible, because a clustered index already contains all columns. That's why the index is called clustered.

包含额外列是不可能的,因为聚簇索引已包含所有列。这就是索引被称为集群的原因。

So, we have a second index with essentially the same fields as the Clustered Index, with the other columns as "Included Columns". However, from what I've read, I believe this may be redundant?

因此,我们有第二个索引,其字段与Clustered Index基本相同,其他列为“Included Columns”。但是,从我读过的内容来看,我认为这可能是多余的?

Yeah, it's probably redundant. There are some rare exceptions where the clustered index does not fit into memory.

是的,这可能是多余的。有一些罕见的例外情况,聚集索引不适合内存。

Does the non-clustered Index ALREADY have the columns from the clustered index defined in it?

非聚集索引ALREADY是否具有其中定义的聚簇索引的列?

Probably: a non-clustered index contains a pointer to the clustered index. If the clustered index is unique, this pointer consists of all clustered index fields. (In most situations, these fields correspond with the primary key.)

可能:非聚集索引包含指向聚簇索引的指针。如果聚簇索引是唯一的,则此指针由所有聚簇索引字段组成。 (在大多数情况下,这些字段与主键对应。)

So, is there an efficient way to set up these indexes so that all of the columns of this table are available through the index without going back to the table?

那么,是否有一种有效的方法来设置这些索引,以便该表的所有列都可以通过索引获得而无需返回到表中?

In the example you post it looks like the clustered index is sufficient, and you do not need any other indexes to avoid a table lookup. You can verify this by running queries and looking for "key lookup" or "rid lookup" operations.

在示例中,您发布它看起来像聚簇索引就足够了,并且您不需要任何其他索引来避免表查找。您可以通过运行查询并查找“键查找”或“查找查找”操作来验证这一点。

#4


1  

I think you need a better understanding of CLUSTERED and NONCLUSTERED indexes. The clustered index is a balanced-tree (B-tree) where each node contains the key column(s) for the index. Normally, and frequently the best option, one column is the key column for the index. all data for each row is stored at the leaf level (i.e., the bottom level) of the clustered index. This is why, you can't have included columns in a clustered index; all columns are included by definition.

我认为你需要更好地理解CLUSTERED和NONCLUSTERED索引。聚簇索引是平衡树(B树),其中每个节点包含索引的键列。通常,通常是最佳选项,一列是索引的关键列。每行的所有数据都存储在聚集索引的叶级(即底层)。这就是为什么你不能在聚簇索引中包含列;根据定义包含所有列。

a non-clustered index is also a B-tree structure. Each node contains the key column(s) for the index. The leaf level for non-clustered indexes contains any included columns. Note the difference between a key column and the included column is that the key column values appear at each level of the index and included columns only appear at the leaf level. The leaf level also contains the key column(s) from the clustered index, which are used to link the index to the table data.

非聚集索引也是B树结构。每个节点都包含索引的键列。非聚集索引的叶级别包含任何包含的列。请注意,键列和包含列之间的区别在于键列值出现在索引的每个级别,并且包含的​​列仅出现在叶级别。叶级别还包含聚簇索引中的键列,用于将索引链接到表数据。

The more columns you include in any index, the larger the index becomes. And, this can degrade performance.

您在任何索引中包含的列越多,索引就越大。而且,这会降低性能。

So, for a clustered index, you don't need to include all columns, or even many columns, as keys in the index. The data is already part of the index.

因此,对于聚簇索引,您不需要将所有列甚至许多列都包含在索引中作为键。数据已经是索引的一部分。