What is the difference between a Table Scan and a Clustered Index Scan?

Date: 2021-12-27 02:46:53

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

As an example - what's the performance difference between the following when there are many records?:

-- Case 1: no clustered index, so the table variable is a heap
declare @temp table(
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

-----------------------------

-- Case 2: the primary key gives the table variable a clustered index
declare @temp table(
    RowID int not null identity(1,1) primary key,
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

3 answers

#1


73  

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

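To check which kind of table you actually have, a minimal sketch like the following can help. The table names `dbo.ScanDemoHeap` and `dbo.ScanDemoClustered` are hypothetical and only mirror the two declarations in the question; `sys.indexes` reports a heap as `index_id = 0` and a clustered index as `index_id = 1`.

```sql
-- Hypothetical demo tables; the names are illustrative, not from the original post.
CREATE TABLE dbo.ScanDemoHeap (
    SomeColumn varchar(50)
);

CREATE TABLE dbo.ScanDemoClustered (
    RowID int NOT NULL IDENTITY(1,1) PRIMARY KEY,  -- PRIMARY KEY defaults to a clustered index here
    SomeColumn varchar(50)
);

-- index_id = 0 / type_desc = 'HEAP'      : table has no clustered index
-- index_id = 1 / type_desc = 'CLUSTERED' : table is stored as a clustered index
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.index_id,
       i.type_desc
FROM sys.indexes AS i
WHERE i.object_id IN (OBJECT_ID(N'dbo.ScanDemoHeap'),
                      OBJECT_ID(N'dbo.ScanDemoClustered'))
  AND i.index_id IN (0, 1);

-- Clean up the demo tables.
DROP TABLE dbo.ScanDemoHeap;
DROP TABLE dbo.ScanDemoClustered;
```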

A clustered table, however, has its data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

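One way to see the range-query difference for yourself is to compare logical reads with `SET STATISTICS IO`. This is only a sketch: the temp tables `#HeapDemo` and `#ClusteredDemo`, the 100,000-row size, and the row generator are assumptions made for illustration, and the actual read counts depend on your server and version.

```sql
-- Hypothetical temp tables holding the same 100,000 rows.
CREATE TABLE #HeapDemo (
    Id int NOT NULL,
    SomeColumn varchar(50)
);

CREATE TABLE #ClusteredDemo (
    Id int NOT NULL PRIMARY KEY CLUSTERED,
    SomeColumn varchar(50)
);

-- Generate Ids 1..100000 by numbering rows from a catalog-view cross join.
INSERT INTO #HeapDemo (Id, SomeColumn)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       'SomeVal'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

INSERT INTO #ClusteredDemo (Id, SomeColumn)
SELECT Id, SomeColumn
FROM #HeapDemo;

-- Report logical reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Heap with no index: expected to read every data page (Table Scan).
SELECT * FROM #HeapDemo WHERE Id BETWEEN 1 AND 100;

-- Clustered table: expected to seek to Id = 1 and read only the pages covering the range.
SELECT * FROM #ClusteredDemo WHERE Id BETWEEN 1 AND 100;

SET STATISTICS IO OFF;

DROP TABLE #HeapDemo;
DROP TABLE #ClusteredDemo;
```

If the answer's reasoning holds, the heap query should report far more logical reads than the clustered one.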

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

So:

  • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

  • For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

  • For a query that is not satisfied by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

  • For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):

  • INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
  • UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
  • DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
  • single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
  • range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
  • concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.
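
The page-split cost mentioned in the last bullet can be observed, roughly, with `sys.dm_db_index_physical_stats`. The sketch below is not taken from the whitepaper: the table `dbo.SplitDemo`, the random `uniqueidentifier` key, and the row counts are assumptions chosen so that the second load lands in the middle of pages that are already full.

```sql
-- Hypothetical table with a random (non-sequential) clustered key.
CREATE TABLE dbo.SplitDemo (
    Id uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED,
    Payload char(200) NOT NULL
);

-- First load into the empty table.
INSERT INTO dbo.SplitDemo (Id, Payload)
SELECT TOP (50000) NEWID(), 'x'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Second load: random keys fall between existing rows, which is expected to split pages.
INSERT INTO dbo.SplitDemo (Id, Payload)
SELECT TOP (50000) NEWID(), 'x'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Inspect the leaf level: fragmentation and how full the pages are.
SELECT index_type_desc,
       page_count,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.SplitDemo'), NULL, NULL, 'DETAILED')
WHERE index_level = 0;

DROP TABLE dbo.SplitDemo;
```

High fragmentation and partly empty pages after the second load would be consistent with the page splits the bullet describes; a heap has no key order to maintain, so it avoids that particular cost.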

#2


4  

http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx

The Clustered Index Scan logical and physical operator scans the clustered index specified in the Argument column. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows' output be returned in the order in which the clustered index has sorted them. If the ORDERED clause is not present, the storage engine will scan the index in the optimal way (not guaranteeing the output to be sorted).

http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx

The Table Scan logical and physical operator retrieves all rows from the table specified in the Argument column. If a WHERE:() predicate appears in the Argument column, only those rows that satisfy the predicate are returned.

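To see the two operators and the Argument column described above, you can ask for the estimated plan with `SET SHOWPLAN_ALL`. This is a sketch with hypothetical table names (`dbo.PlanDemoHeap`, `dbo.PlanDemoClustered`) that mirror the question; `GO` is the batch separator used by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical demo tables mirroring the two declarations in the question.
CREATE TABLE dbo.PlanDemoHeap (
    SomeColumn varchar(50)
);
CREATE TABLE dbo.PlanDemoClustered (
    RowID int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    SomeColumn varchar(50)
);
GO

-- SET SHOWPLAN_ALL must be the only statement in its batch.
-- While it is ON, statements are compiled but not executed.
SET SHOWPLAN_ALL ON;
GO

SELECT * FROM dbo.PlanDemoHeap;       -- PhysicalOp is expected to be Table Scan
SELECT * FROM dbo.PlanDemoClustered;  -- PhysicalOp is expected to be Clustered Index Scan
GO

SET SHOWPLAN_ALL OFF;
GO

-- Clean up the demo tables.
DROP TABLE dbo.PlanDemoHeap;
DROP TABLE dbo.PlanDemoClustered;
GO
```

The result set includes the PhysicalOp, LogicalOp, and Argument columns, so you can check whether an ORDERED clause or a WHERE:() predicate appears for each scan.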

#3


-2  

A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.

