Why is there a non-clustered index scan when counting all rows in a table?

Time: 2021-12-27 02:47:11

As far as I understand it, each transaction sees its own version of the database, so the system cannot get the total number of rows from a single counter and therefore has to scan an index. But I expected that to be the clustered index on the primary key, not one of the additional indexes. And if I had more than one additional index, which one would be chosen?

While digging into this, I noticed another strange thing. Suppose there are two otherwise identical tables, Articles and Articles2, each with three columns: Id, View_Count, and Title. The first has only the clustered index on the primary key, while the second also has a non-clustered, non-unique index on View_Count. A SELECT COUNT(1) query runs twice as fast against Articles2, the table with the additional index, as against Articles.

1 solution

#1


9  

SQL Server will optimize your query - if it needs to count the rows in a table, it will choose the smallest possible set of data to do so.

Consider your clustered index first: its leaf level is the actual data pages, holding every column of every row, which can be several thousand bytes per row. Reading all those bytes just to count the rows would be wasteful, if only in terms of disk I/O.
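To put rough numbers on this, here is a minimal Python sketch that estimates leaf-level page counts for the two structures. It assumes 8 KB pages with about 8,060 bytes usable for row data, a made-up per-row overhead, a hypothetical 1,000,000-row Articles table, and an assumed average Title size of 2,000 bytes. None of these figures come from the question; they only illustrate the scale of the difference.

```python
import math

# Assumed figures for illustration only: SQL Server uses 8 KB pages, of which
# roughly 8,060 bytes can hold row data. The per-row overhead below (row header
# plus slot-array entry) is a rough assumption, not an exact on-disk layout.
PAGE_DATA_BYTES = 8060
ROW_OVERHEAD_BYTES = 11


def leaf_pages(row_count: int, row_bytes: int) -> int:
    """Estimate how many leaf-level pages a structure with fixed-size rows needs."""
    rows_per_page = max(1, PAGE_DATA_BYTES // (row_bytes + ROW_OVERHEAD_BYTES))
    return math.ceil(row_count / rows_per_page)


# Hypothetical Articles table: Id (int, 4 bytes), View_Count (int, 4 bytes),
# Title (assumed to average 2,000 bytes).
rows = 1_000_000
clustered = leaf_pages(rows, 4 + 4 + 2000)  # clustered leaf level = full data rows
nonclustered = leaf_pages(rows, 4 + 4)      # NC leaf level = View_Count + Id (clustering key)

print(f"clustered index leaf pages:     {clustered:,}")
print(f"non-clustered index leaf pages: {nonclustered:,}")
print(f"ratio: {clustered / nonclustered:.0f}x")
```

The real ratio depends on the actual row widths, and elapsed time also depends on caching and other factors, so it will not track the page ratio exactly.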

Therefore, if there is a non-clustered index that is not filtered or restricted in any way (so it has an entry for every row), SQL Server will pick that structure to count, because a non-clustered index basically contains only the columns you put into it, plus the clustered index key. That is much less data to load just to count the rows. If several such indexes exist, it will go for the one that is cheapest to scan, i.e. the smallest.
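The selection rule described above can be modeled in a few lines. This is a simplified Python sketch, not how the SQL Server optimizer is actually implemented: the real optimizer is cost-based and weighs more than page counts. The index names and page counts below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexInfo:
    name: str
    leaf_pages: int
    has_filter: bool = False  # a filtered index (WHERE clause) does not cover every row


def pick_index_for_count(indexes: list[IndexInfo]) -> Optional[IndexInfo]:
    """Simplified model: choose the unfiltered index with the fewest leaf pages.

    Filtered indexes are skipped because they cannot see every row, so
    counting over them would give the wrong answer.
    """
    candidates = [ix for ix in indexes if not ix.has_filter]
    return min(candidates, key=lambda ix: ix.leaf_pages, default=None)


# Hypothetical indexes on Articles2; names and page counts are made up.
indexes = [
    IndexInfo("PK_Articles2 (clustered)", leaf_pages=333_334),
    IndexInfo("IX_Articles2_View_Count", leaf_pages=2_359),
    IndexInfo("IX_Articles2_Popular (filtered)", leaf_pages=120, has_filter=True),
]

chosen = pick_index_for_count(indexes)
print(chosen.name if chosen else "no index")
```

The filtered index is the smallest structure here, but it is excluded because it cannot see every row. That leaves the narrow View_Count index as the cheapest complete structure.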