Why/when/how is a whole clustered index scan chosen rather than a full table scan?

Time: 2020-12-18 02:47:21

IMO (please correct me if I am wrong), the leaf level of a clustered index contains the real table rows, so the full clustered index, with its intermediate levels, contains much more data than the full table(?)
Why/when/how is a whole clustered index scan ever chosen over a full table scan?

How is the clustered index on the CUSTOMER_ID column used in a SELECT query that does not reference that column in either the SELECT list or the WHERE condition [1]?

Update:
Should I understand that a full clustered index scan is faster than a full table scan because "Each data page contains pointers to the next and previous leaf node page so the scan does not need to use the higher level pages in the index"?
Are there any other reasons, for example that a clustered index which does not otherwise participate in the query is used for sorting?

Update2:
As an afterthought: consecutive access cannot give a performance boost, whereas loading a table through IAM pointers can be parallelized.
Does a clustered index scan imply consecutive page reading?
Does a clustered table imply the absence of IAM pointers (and so the impossibility of a full table scan)?
Why can a clustered table not be full table scanned?
I still do not understand how/why a full clustered index scan can be "better" than a full table scan.
Does it mean that having a clustered index can make performance worse?

The question is about a clustered table, not a heap (non-indexed) table.

Update3:
Is "full clustered index scan" really a synonym for "full table scan"?
What are the differences?

[1] Index Covering Boosts SQL Server Query Performance
http://www.devx.com/dbzone/Article/29530

3 answers

#1


0  

Please first read my answer under "No direct access to data row in clustered table - why?".

"the leaf of clustered index contains the real table row, so full clustered index, with intermediate leaves, contain much more data than the full table(?)"

See, you are mixing up "table" with storage structures. In the context of your question, e.g. when thinking about the size of the CI as opposed to the "table", you must think about the CI minus the leaf level (which is the data rows). The CI, index portion only, is tiny. The intermediate levels (as in any B-Tree) contain partial (not full) key entries; they exclude the lowest level, where the full key entry sits in the row itself and is not duplicated.

The table (full CI) may be 100GB. The CI, index portion only, may be 10MB. There is an awful lot that can be determined from the 10MB without having to go to the 100GB.
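
To get a feel for why the index portion is small next to the leaf level, here is a rough back-of-the-envelope sketch. The 8 KB page size and the fanout of 500 entries per non-leaf page are assumptions picked for illustration only, so the result will not reproduce the 10MB figure above; the point is only the order-of-magnitude gap between the levels.

```python
import math

def btree_level_pages(leaf_pages, fanout):
    """Page counts of the non-leaf levels, from just above the leaves up to the root."""
    levels = []
    pages = leaf_pages
    while pages > 1:
        # Each non-leaf page holds `fanout` entries, one per page of the level below.
        pages = math.ceil(pages / fanout)
        levels.append(pages)
    return levels

PAGE_BYTES = 8 * 1024                          # assumed page size
FANOUT = 500                                   # assumed entries per non-leaf page
leaf_pages = (100 * 1024**3) // PAGE_BYTES     # 100GB worth of leaf (data) pages

levels = btree_level_pages(leaf_pages, FANOUT)
non_leaf_pages = sum(levels)

print("leaf pages:    ", leaf_pages)
print("non-leaf pages:", non_leaf_pages, "per level:", levels)
print("non-leaf share: {:.3%}".format(non_leaf_pages / leaf_pages))
```

Under these assumptions the non-leaf levels come to a fraction of a percent of the leaf level; a narrower key (higher fanout) shrinks them further, a wider key grows them.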

For understanding: the equivalent NCI on the same table (CI) may be 22MB; the equivalent NCI on the same table if you removed the CI may be 21.5MB (assuming the CI key is reasonable, not excessively wide).

"Why/when/how is ever whole clustered index scan chosen over the full table scan?"

Quite often. Again, the context is that we are talking about the CI-minus-Leaf levels. For queries that use only the columns in the CI, the presence of those columns in the CI (any index, actually) allows the query to be a "covered query", which means it can be serviced wholly from the index, with no need to go to the data rows. Think range scans on partial keys: BETWEEN x AND y; x <= y; etc.
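
As a small illustration of a "covered query", the sketch below uses SQLite (through Python's standard `sqlite3` module) purely as a stand-in, not SQL Server, and a nonclustered index rather than a CI, in line with the "any index, actually" remark. The `orders` table and the `ix_orders_customer` index are hypothetical names made up for the example. It asks the engine for its plan for a range query that needs only the indexed column, and for one that also needs a column stored only in the row.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the detail strings reported by SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL    NOT NULL
    );
    CREATE INDEX ix_orders_customer ON orders (customer_id);
""")

# Needs only customer_id: the index alone can answer it.
covered = query_plan(
    conn, "SELECT customer_id FROM orders WHERE customer_id BETWEEN 10 AND 20")

# Needs amount as well, which is stored only in the data row.
not_covered = query_plan(
    conn, "SELECT amount FROM orders WHERE customer_id BETWEEN 10 AND 20")

print(covered)
print(not_covered)
```

SQLite labels the first plan as using a covering index; the second plan carries no such label because the `amount` values have to be fetched from the rows. The exact plan wording varies between SQLite versions.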

(There is always the chance that the optimiser will choose a table scan when you think it should choose an index scan, but that is a different story.)

"I still do not understand how/why clustered index full scan can be "better" over full table scan."

(The terms used by MS are less precise than my answers here.) For any query that can be answered from the 10MB CI, I would much rather churn 10MB through the data cache, than 100GB. For the same queries, bounded by a range on the CI key, that's a fraction of the 10MB.

For queries that require a "full table scan", well yes, you must read all the Leaf pages of the CI, which is the 100GB.

#2


2  

The clustered index, or more precisely its leaf level, IS the table data, so a clustered index scan really is the same as a table scan (for a table with a clustered index).

If you don't have a clustered index, then your table is a heap - obviously, in this case, if you need to look at all the data, you cannot do a clustered index scan since there is no clustered index, so you'll end up with a table scan which just touches all data pages for that heap table.

#3


2  

The leaf level of a clustered index is the table. "Table Scan" refers to a heap without a clustered index.

Each data page contains pointers to the next and previous leaf node page so the scan does not need to use the higher level pages in the index.
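
A toy sketch of that idea, not how any real engine lays out pages: leaf pages hold the rows in key order and are doubly linked, so a full scan can start at the first leaf and follow the `next` pointers without touching any higher-level page. `LeafPage`, `build_leaf_chain` and `scan_leaf_level` are names invented for this illustration.

```python
class LeafPage:
    """A leaf (data) page: rows in key order plus links to its neighbours."""
    def __init__(self, rows):
        self.rows = rows      # list of (key, payload) tuples, sorted by key
        self.prev = None      # previous leaf page in key order
        self.next = None      # next leaf page in key order

def build_leaf_chain(rows, rows_per_page):
    """Sort rows by key, pack them into pages, link the pages, return the first page."""
    rows = sorted(rows)
    pages = [LeafPage(rows[i:i + rows_per_page])
             for i in range(0, len(rows), rows_per_page)]
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left
    return pages[0] if pages else None

def scan_leaf_level(first_page):
    """Yield every row by walking the leaf chain; no upper index levels are used."""
    page = first_page
    while page is not None:
        yield from page.rows
        page = page.next

rows = [(5, "e"), (1, "a"), (3, "c"), (2, "b"), (4, "d")]
first = build_leaf_chain(rows, rows_per_page=2)
scanned = list(scan_leaf_level(first))
print(scanned)
```

The scan returns every row exactly once, in key order, which is why a full clustered index scan reads the same pages a "table scan" of that table would.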
