Index with included columns: what is the difference?

Posted: 2023-01-25 10:52:41

I've never really understood the difference between these two indexes. Can someone please explain what the difference is (performance-wise, what the index structure looks like in the database, storage-wise, etc.)?

I understand this question is broad, so please bear with me; I don't really know how to scope it down. Perhaps if you start explaining what you know, I'll get pointers in the right direction that let me narrow the question.

Index with included columns

CREATE NONCLUSTERED INDEX IX_Address_PostalCode  
ON Person.Address (PostalCode) 
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID); 

'Normal' index

CREATE NONCLUSTERED INDEX IX_Address_PostalCode  
ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID);

3 answers

#1


10  

The internal storage of indexes uses a B-Tree structure and consists of "index pages" (the root and all intermediate pages) and "index data pages" (the leaf pages only).

Note: do not confuse "index data pages" with "data pages" (the leaf pages of clustered indexes), which store most of the columns of actual data.

  • Only the index key columns are stored on the index pages.
  • By placing some columns in the INCLUDE section, less data per index key is stored on each index page.
  • This means fewer pages are needed to hold the index keys, which makes it easier to keep these frequently used pages cached in memory for longer.
  • It may also mean fewer levels in the tree. In that case the performance benefit can be much bigger, because every level traversed is another page read, and potentially another disk access.
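
One way to check the page and level counts for yourself is sketched below. It assumes the AdventureWorks sample database and creates both variants under hypothetical names (IX_Address_PostalCode_Incl and IX_Address_PostalCode_Wide, neither of which appears in the question) so that they can coexist. sys.dm_db_index_physical_stats in DETAILED mode returns one row per B-tree level.

```sql
-- Sketch only: the index names below are hypothetical, chosen so both variants can coexist.
CREATE NONCLUSTERED INDEX IX_Address_PostalCode_Incl
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);

CREATE NONCLUSTERED INDEX IX_Address_PostalCode_Wide
ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID);

-- One row per index level: level 0 is the leaf, higher levels are intermediate/root pages.
SELECT i.name AS index_name,
       ps.index_level,
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Person.Address'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE i.name IN (N'IX_Address_PostalCode_Incl', N'IX_Address_PostalCode_Wide')
ORDER BY i.name, ps.index_level DESC;
```

On a small table both indexes may well have the same depth. The difference should then show up in the page counts above the leaf level, while the leaf levels should be similar in size because both variants store all five columns there.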

When an index is used, the index key is used to navigate through the index pages to the correct index data page.

  • If the index has INCLUDE columns, that data is immediately available should the query need it.
  • If the query requires columns that are in neither the index keys nor the INCLUDE columns, an additional "bookmark lookup" is required to reach the correct row in the clustered index (or in the heap if no clustered index is defined).
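
The two cases above can be compared with a sketch like this one. It assumes the INCLUDE index from the question exists, that Person.Address has a ModifiedDate column (as in the AdventureWorks sample schema), and it uses a placeholder postal code. The plan the optimizer actually picks depends on your data and on the other indexes on the table.

```sql
-- Sketch only: assumes the INCLUDE index from the question exists on Person.Address.
-- The literal N'98011' is a placeholder value.
SET STATISTICS IO ON;

-- Covered: every referenced column is an index key or an INCLUDE column,
-- so the index leaf pages can answer the query on their own.
SELECT AddressLine1, AddressLine2, City, StateProvinceID
FROM Person.Address
WHERE PostalCode = N'98011';

-- Not covered: ModifiedDate is in neither the key nor the INCLUDE list,
-- so each matching row needs a lookup into the clustered index (or heap),
-- unless the optimizer decides a scan is cheaper.
SELECT AddressLine1, City, ModifiedDate
FROM Person.Address
WHERE PostalCode = N'98011';

SET STATISTICS IO OFF;
```

Comparing the logical reads reported for the two statements, together with the actual execution plans, gives a rough idea of what the lookup costs.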

Some things to note that will hopefully address some of your confusion:

  • If the index keys and the filters in your query are not selective enough, the index is likely to be ignored (regardless of what's in your INCLUDE columns).
  • Every index you create adds overhead to INSERT and UPDATE statements, more so for "bigger" indexes. (Bigger applies to INCLUDE columns as well.)
  • So while you could in theory create a multitude of big indexes with INCLUDE columns to match every permutation of access path, doing so would be very counter-productive.
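
To get a feel for whether an index earns its maintenance cost, a query along these lines may help. It is a sketch that compares read activity with write activity per index using sys.dm_db_index_usage_stats. The counters only cover the period since they were last reset (for example at an instance restart), so treat the numbers as indicative.

```sql
-- Sketch only: reads vs. writes per index on Person.Address.
SELECT i.name AS index_name,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates   -- how often data modifications had to maintain this index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.object_id   = i.object_id
 AND us.index_id    = i.index_id
 AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'Person.Address')
ORDER BY us.user_updates DESC;
```

An index with many updates and few or no seeks and scans is a candidate for review, though a single rarely run but important query can still justify keeping it.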

It's worth noting that before INCLUDE columns were added as a feature:

  • It was a common index tuning 'trick' to expand the keys of an index to include columns that weren't needed for seeking or filtering. (Known as a covering index.)
  • These columns were commonly required as output columns or as reference columns for joins to other tables.
  • This avoided the infamous "bookmark lookups", but had the disadvantage of making the index 'wider' than strictly necessary.
  • In fact, very often the earlier columns in the index would already identify a unique row, meaning the extra columns would be completely redundant if not for the "avoiding bookmark lookups" benefit.
  • INCLUDE columns basically provide the same benefit more efficiently.

NB: Something very important to point out. You generally get zero benefit from INCLUDE columns in your indexes if you're in the lazy habit of always writing your queries as SELECT * .... By returning all columns you're basically ensuring that a bookmark lookup is required in any case.

#2


4  

In the first index, only PostalCode is a key column on the index pages; AddressLine1, AddressLine2, City, and StateProvinceID are stored in the leaf nodes to avoid a key/RID lookup.

I would prefer the first index when the table is always filtered on PostalCode, and any of the columns AddressLine1, AddressLine2, City, StateProvinceID appear only in the select list and not in the filter:

select AddressLine1, AddressLine2, City, StateProvinceID
from Person.Address 
Where PostalCode=  

In the second index, the index pages hold five key columns: PostalCode, AddressLine1, AddressLine2, City, StateProvinceID.

I would prefer the second index when I may need to filter the data like this:

Where PostalCode = And AddressLine1 = 

or

Where PostalCode = And AddressLine2 = 

or

Where PostalCode = And AddressLine1  = and AddressLine2 = 

and so on.

In any case, the first column of the index should be part of the filter for the index to be used.
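
The leading-column rule can be illustrated with a few queries. This is a sketch that assumes the five-column key index from the question; all literals are placeholders, and the actual plan depends on the optimizer and on whatever other indexes exist on the table.

```sql
-- Sketch only: assumes the five-column key index from the question; literals are placeholders.

-- The leading columns PostalCode and AddressLine1 are both filtered:
-- a seek can use both predicates.
SELECT City
FROM Person.Address
WHERE PostalCode = N'98011' AND AddressLine1 = N'1970 Napa Ct.';

-- AddressLine1 is skipped: the seek can only use PostalCode;
-- AddressLine2 is then checked row by row within that range.
SELECT City
FROM Person.Address
WHERE PostalCode = N'98011' AND AddressLine2 = N'Suite 1';

-- The leading column is missing from the filter: no seek on this index
-- is possible; at best the whole index is scanned.
SELECT PostalCode
FROM Person.Address
WHERE City = N'Bothell';
```

Note that in the second query the extra key columns help less than they might appear to: only predicates on a contiguous prefix of the key narrow the seek range.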

#3


1  

In the first example, only the index key column, PostalCode, is stored in the index tree; all the other columns are stored only at the leaf level of the index. This makes the index smaller and is useful if you use only PostalCode, and none of the other columns, in WHERE, JOIN, or GROUP BY clauses.

In the second index, the data for all the columns is stored in the index tree. This makes the index much bigger, but it is useful if you use any of the columns in WHERE, JOIN, GROUP BY, or ORDER BY clauses.

Included columns make it faster to retrieve data when those columns are specified in the select list.

For example, if you are running:

SELECT PostalCode, AddressLine1, AddressLine2, City, StateProvinceID 
FROM Person.Address 
Where PostalCode= 'A1234'

This will benefit from an index on PostalCode that includes all the other columns.

On the other hand, if you are running:

SELECT PostalCode, AddressLine1, AddressLine2, City, StateProvinceID 
FROM Person.Address 
Where PostalCode= 'A1234' or City = 'London' or StateProvinceID = 1 or AddressLine1 = 'street A' or AddressLine2 = 'StreetB'

This would benefit more from having all the columns in the index

Have a look at the links below; they might help with your question.

Index with Included Column: https://msdn.microsoft.com/en-us/library/ms190806(v=sql.105).aspx

Table and Index Organization: https://msdn.microsoft.com/en-us/library/ms189051(v=sql.105).aspx
