While studying for the 70-433 exam I noticed you can create a covering index in one of the following two ways.
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
-- OR --
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
The INCLUDE clause is new to me. Why would you use it and what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?
7 Answers
#1
303
Use INCLUDE if the column is not in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list of the SELECT clause.
The INCLUDE clause adds the data at the lowest (leaf) level, rather than in the index tree. This makes the index smaller, because the included columns are not part of the tree.
INCLUDE columns are not key columns in the index, so they are not ordered. This means they are not really useful for predicates, sorting, etc., as I mentioned above. However, they may be useful if you have a residual lookup in a few rows from the key column(s).
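A minimal sketch of the rule above, reusing MyTable and Col1..Col3 from the question. The column types, the Id column, and the two queries are assumptions made up for illustration.

```sql
-- Hypothetical table; only the names MyTable, Col1, Col2, Col3 come from the question.
CREATE TABLE dbo.MyTable (
    Id   int IDENTITY(1,1) PRIMARY KEY,  -- becomes the clustered index by default
    Col1 int         NOT NULL,
    Col2 varchar(50) NOT NULL,
    Col3 datetime    NOT NULL
);

-- Col1 is filtered on, so it is a key column.
-- Col2 and Col3 are only returned, so they go in INCLUDE.
CREATE NONCLUSTERED INDEX idx1
    ON dbo.MyTable (Col1)
    INCLUDE (Col2, Col3);

-- Can be answered from idx1 alone: seek on Col1, read Col2/Col3 from the leaf.
SELECT Col2, Col3
FROM dbo.MyTable
WHERE Col1 = 42;

-- Still covered, but idx1 does not keep Col2 in order,
-- so expect a Sort operator unless Col2 is made a key column.
SELECT Col2, Col3
FROM dbo.MyTable
WHERE Col1 = 42
ORDER BY Col2;
```

If the ORDER BY query is the common one, that is the case where (Col1, Col2) as the key with Col3 included would be worth testing instead.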
Another MSDN article with a worked example
#2
183
You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.
Imagine you need to query for an employee's ID, department ID, and lastname.
SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5
If you happen to have a non-clustered index on (EmployeeID, DepartmentID), then once you find the employees for a given department, you have to do a "bookmark lookup" to get the full employee record, just to get the LastName column. That can get pretty expensive in terms of performance if you find a lot of employees.
If you had included that lastname in your index:
CREATE NONCLUSTERED INDEX NC_EmpDep
ON Employee(EmployeeID, DepartmentID)
INCLUDE (Lastname)
then all the information you need is available in the leaf level of the non-clustered index. Just by seeking in the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary, so you save a lot of time.
Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.
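One caveat worth adding to the example above: an index whose leading key column is EmployeeID cannot be seeked with a predicate on DepartmentID alone, so SQL Server would have to scan it. A sketch with DepartmentID as the leading key might look like this. The table shape, the extra HireDate column, and the index name NC_Emp_Dept are assumptions; it also assumes EmployeeID is the clustered primary key.

```sql
-- Assumed table shape; only EmployeeID, DepartmentID and LastName come from the answer.
CREATE TABLE dbo.Employee (
    EmployeeID   int          NOT NULL PRIMARY KEY,  -- clustered by default
    DepartmentID int          NOT NULL,
    LastName     nvarchar(50) NOT NULL,
    HireDate     datetime     NULL
);

-- DepartmentID leads the key, so WHERE DepartmentID = 5 can seek.
-- EmployeeID is the clustering key, which every non-clustered index row
-- on this table already carries, so it does not need to be listed.
CREATE NONCLUSTERED INDEX NC_Emp_Dept
    ON dbo.Employee (DepartmentID)
    INCLUDE (LastName);

-- Covered: no lookup into the clustered index is needed for LastName.
SELECT EmployeeID, DepartmentID, LastName
FROM dbo.Employee
WHERE DepartmentID = 5;
```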
#3
17
This discussion is missing an important point: the question is not whether the non-key columns are better added as index (key) columns or as included columns.
The question is how expensive it is to use the INCLUDE mechanism to add columns that are not really needed in the index (typically not part of WHERE clauses, but often selected). So your dilemma is always:
- Use an index on id1, id2 ... idN alone, or
- Use an index on id1, id2 ... idN plus include col1, col2 ... colN
Where: id1, id2 ... idN are columns often used in restrictions, and col1, col2 ... colN are columns often selected but typically not used in restrictions.
(The option to include all of these columns as part of the index key is just always silly, unless they are also used in restrictions, because it would always be more expensive to maintain: the index must be updated and re-sorted even when the "keys" have not changed.)
So use option 1 or 2?
Answer: If your table is rarely updated (mostly inserted into or deleted from), then it is relatively inexpensive to use the INCLUDE mechanism to add some "hot columns" (often used in selects, but not often used in restrictions). Inserts and deletes require the index to be updated and sorted anyway, so little extra overhead is associated with storing a few extra columns while the index is already being updated. The overhead is the extra memory and CPU used to store redundant information in the index.
If the columns you are considering as included columns are often updated (without the index key columns being updated), or if there are so many of them that the index becomes close to a copy of your table, I'd suggest option 1. Also, if adding certain included columns turns out to make no performance difference, you might want to skip adding them :) Verify that they are useful!
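One way to do that verification is to compare how often each index is read against how often it has to be maintained. A sketch using the index usage DMV follows; dbo.MyTable is a placeholder name, and the counters only cover the period since they were last reset (for example at instance restart).

```sql
-- Reads (seeks/scans/lookups) versus writes (updates) per index on one table.
-- dbo.MyTable is a placeholder; substitute the table you are tuning.
SELECT  i.name AS index_name,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
```

An index with high user_updates and few seeks or scans is paying its maintenance cost without giving much back. NULL counters mean the index has not been touched since the counters were reset.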
The average number of rows per distinct key value (id1, id2 ... idN) can be of some importance as well.
Note that if a column added as an included column is used in a restriction, then as long as the index itself can be used (based on restrictions against the index key columns), SQL Server matches the column restriction against the index (leaf-node values) instead of going the expensive way around via the table itself.
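To make that last point concrete, here is a small sketch. The Orders table, its columns, and the index name are all hypothetical.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE dbo.Orders (
    OrderID    int     NOT NULL PRIMARY KEY,
    CustomerID int     NOT NULL,
    Status     tinyint NOT NULL,
    Total      money   NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Orders_Customer
    ON dbo.Orders (CustomerID)
    INCLUDE (Status, Total);

-- CustomerID = 17 is the seek predicate (it uses the key order).
-- Status = 2 is checked against the leaf rows the seek returns,
-- so no lookup into the base table is needed.
SELECT OrderID, Total
FROM dbo.Orders
WHERE CustomerID = 17
  AND Status = 2;
```

In the execution plan, the Index Seek operator should list CustomerID under "Seek Predicates" and Status under "Predicate".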
#4
16
Basic index columns are sorted, but included columns are not sorted. This saves resources in maintaining the index, while still making it possible to provide the data in the included columns to cover a query. So, if you want to cover queries, you can put the search criteria to locate rows into the sorted columns of the index, but then "include" additional, unsorted columns with non-search data. It definitely helps with reducing the amount of sorting and fragmentation in index maintenance.
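If you want to see which columns of an existing index are sorted key columns and which are merely included, the catalog views can tell you. A sketch, with dbo.MyTable as a placeholder table name:

```sql
-- Lists every index column on one table, flagging included (unsorted) columns.
SELECT  i.name  AS index_name,
        c.name  AS column_name,
        ic.key_ordinal,          -- position in the key; 0 for included columns
        ic.is_included_column    -- 1 when the column was added via INCLUDE
FROM    sys.indexes       AS i
JOIN    sys.index_columns AS ic
        ON ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
JOIN    sys.columns       AS c
        ON c.object_id  = ic.object_id
       AND c.column_id  = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
```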
#5
5
The reasons why (including the data in the leaf level of the index) have been nicely explained. The reason you should care about this is that when you run your query without the additional columns included (a new feature in SQL Server 2005), SQL Server has to go to the clustered index to get those columns. That takes more time and adds more load to the SQL Server service, the disks, and the memory (the buffer cache, to be specific), as new data pages are loaded into memory, potentially pushing other, more often needed data out of the buffer cache.
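A simple way to see that extra load is to compare logical reads for the same query before and after adding the included column. The sketch below assumes the Employee table and query from the earlier answer.

```sql
-- Prints per-table read counts to the Messages tab for each statement.
SET STATISTICS IO ON;

-- Run once without the included column and once with it,
-- then compare the "logical reads" figures reported for Employee.
SELECT EmployeeID, DepartmentID, LastName
FROM dbo.Employee
WHERE DepartmentID = 5;

SET STATISTICS IO OFF;
```

Fewer logical reads means fewer pages pulled through the buffer cache, which is exactly the pressure described above.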
#6
4
An additional consideration that I have not seen in the answers already given is that included columns can be of data types that are not allowed as index key columns, such as varchar(max).
This allows you to include such columns in a covering index. I recently had to do this to provide an NHibernate-generated query, which had a lot of columns in the SELECT, with a useful index.
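A short sketch of that difference; the Documents table and index names are hypothetical.

```sql
-- Hypothetical table with a large-value column.
CREATE TABLE dbo.Documents (
    DocumentID int          NOT NULL PRIMARY KEY,
    OwnerID    int          NOT NULL,
    Body       varchar(max) NULL
);

-- Fails if uncommented: varchar(max) is not a valid type for an index key column.
-- CREATE NONCLUSTERED INDEX IX_Documents_Body ON dbo.Documents (Body);

-- Works: the large column is stored at the leaf level only.
CREATE NONCLUSTERED INDEX IX_Documents_Owner
    ON dbo.Documents (OwnerID)
    INCLUDE (Body);
```

Keep in mind that including a large column duplicates its data in the index, so the storage and maintenance cost grows accordingly.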
#7
2
There is a limit to the total size of all columns inlined into the index definition. That said, I have never had to create an index that wide. To me, the bigger advantage is that you can cover more queries with one index that has included columns, as they don't have to be defined in any particular order. Think about it as an index within the index. One example would be StoreID (where StoreID has low selectivity, meaning that each store is associated with a lot of customers) and then customer demographics data (LastName, FirstName, DOB): if you just inline those columns in this order (StoreID, LastName, FirstName, DOB), you can only efficiently search for customers for which you know StoreID and LastName.
On the other hand, defining the index on StoreID and including the LastName, FirstName, and DOB columns would in essence let you do two things: seek on StoreID and then apply a predicate on any of the included columns. This would let you cover all possible search permutations, as long as the search starts with StoreID.
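A sketch of that setup follows. The Customer table shape, the CustomerID column, the index name, and the literal values are assumptions. Note that because included columns are not ordered, the second condition in each query is applied as a filter over all of the store's rows in the index rather than as a true seek, so how well this performs depends on how many customers a store has.

```sql
-- Hypothetical table; StoreID, LastName, FirstName and DOB come from the answer.
CREATE TABLE dbo.Customer (
    CustomerID int          NOT NULL PRIMARY KEY,
    StoreID    int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL,
    DOB        datetime     NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Customer_Store
    ON dbo.Customer (StoreID)
    INCLUDE (LastName, FirstName, DOB);

-- Seek on StoreID, then filter on FirstName inside the index leaf rows.
SELECT CustomerID, LastName, FirstName, DOB
FROM dbo.Customer
WHERE StoreID = 3
  AND FirstName = N'Ann';

-- Same index, different included column in the filter.
SELECT CustomerID, LastName, FirstName, DOB
FROM dbo.Customer
WHERE StoreID = 3
  AND DOB >= '19800101'
  AND DOB <  '19810101';
```

Both queries are covered by the one index, which is the advantage being described.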