Adding a non-clustered index to a table to improve performance

Date: 2020-12-18 02:47:27

I have the table structure below:

CREATE TABLE [dbo].[AIRQUALITYTS2]
(
    [FeatureID] [nvarchar](20) NOT NULL,
    [ParameterID] [nvarchar](20) NOT NULL,
    [MeasurementDateTime] [datetime2](7) NOT NULL,
    [ParameterValue] [numeric](38, 8) NULL,
    [Remarks] [nvarchar](150) NULL,

    CONSTRAINT [PK_AIRQUALITYTS2] 
        PRIMARY KEY CLUSTERED ([FeatureID] ASC, [ParameterID] ASC, [MeasurementDateTime] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

When I execute this query:

set statistics io on

SELECT 
    COUNT(featureid), featureid 
FROM
    AIRQUALITYTS2 
WHERE
    FeatureID LIKE 'AS%' 
    AND ParameterID = 'AP2' 
    AND YEAR(MeasurementDateTime) = 2015
GROUP BY 
    FeatureID
ORDER BY 
    FeatureID

I see 101871 logical reads, and the query execution plan is:

[Image: execution plan showing a Clustered Index Scan]

But when I add a non-clustered index on this table:

 CREATE NONCLUSTERED INDEX non_fidpidmdate
     ON [dbo].[AIRQUALITYTS2] ([ParameterID], [FeatureID])
     INCLUDE ([MeasurementDateTime])

When I execute the same query, I see only 4636 logical reads, the query is very fast, and the execution plan is:

[Image: execution plan showing an Index Seek (NonClustered)]

Question 1: Why are there fewer logical reads for the second query?

Question 2: Why does the first query use a Clustered Index Scan (first image), even though the table has a clustered index on FeatureID, ParameterID and MeasurementDateTime, while after adding the non-clustered index it uses an Index Seek (NonClustered) (second image)?

Note: I have changed the WHERE clause to

MeasurementDateTime >= '2004-01-01 00:00:00' 
and MeasurementDateTime <= '2004-12-31 00:00:00' 

to make it sargable, but the results are still the same.
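
For reference, a sargable form of the year filter can be sketched as a half-open date range. This assumes the intent is the whole calendar year 2015, as in the original query; the alias MeasurementCount is only illustrative. An upper bound written as <= '2004-12-31 00:00:00' leaves out everything after midnight on 31 December, which the half-open form avoids.

```sql
-- Sketch: sargable replacement for YEAR(MeasurementDateTime) = 2015.
-- Assumes the whole calendar year 2015 is wanted.
SELECT
    FeatureID,
    COUNT(*) AS MeasurementCount   -- same result as COUNT(FeatureID): the column is NOT NULL
FROM dbo.AIRQUALITYTS2
WHERE FeatureID LIKE 'AS%'
  AND ParameterID = 'AP2'
  AND MeasurementDateTime >= '20150101'   -- inclusive lower bound
  AND MeasurementDateTime <  '20160101'   -- exclusive upper bound
GROUP BY FeatureID
ORDER BY FeatureID;
```

In the non-clustered index shown above, MeasurementDateTime is only an included column, not a key column, so the date predicate is applied as a filter on the rows found by the seek rather than as a seek key. That may be why the reads did not change after the rewrite.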

2 Answers

#1


1  

  1. In your original CREATE TABLE statement, the PRIMARY KEY CLUSTERED constraint specifies the columns to cluster on, in the order in which they are stored:
[FeatureID]
[ParameterID]
[MeasurementDateTime]

If you run a query with a WHERE clause that includes a specific FeatureID, it can seek to that part of the index.

But you haven't done that in the query. You've used WHERE FeatureID LIKE 'AS%' ...

The query engine cannot seek to a single FeatureID here, because the LIKE with a trailing wildcard % matches a whole range of values: it has to scan across all the FeatureIDs that start with the letters AS and then, within each of them, check whether there are records that match ParameterID = 'AP2' AND YEAR(MeasurementDateTime) = 2015.

  2. In your non-clustered index, you've used a different column order:
[ParameterID]
[FeatureID]

When you run the same query, it can seek because you've specified an exact ParameterID in the WHERE clause.

Ordering is important! SQL indexes are essentially B-tree data structures, and you can't physically store (or traverse) them in different orderings without creating multiple indexes. Too many indexes add overhead to the database, so create the ones that help the majority of your queries, but don't create too many. Mostly this involves knowing which queries are frequently run against your database and tuning accordingly.
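
One way to check which indexes the workload actually uses is the index usage DMV. The query below is a minimal sketch, assuming you have permission to read dynamic management views; the column aliases are illustrative. The counters are reset when the SQL Server instance restarts, so they only reflect activity since then.

```sql
-- Sketch: seek/scan/update counts per index on the table from the question.
SELECT
    i.name          AS IndexName,
    i.type_desc     AS IndexType,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates   -- maintenance cost: writes that had to touch this index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.AIRQUALITYTS2');
```

An index with many user_updates but few seeks or scans is a candidate for review, since it costs writes without helping reads.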

#2


2  

For question 1: since your index is covering (it contains all the columns the query needs for filtering, grouping, ordering and output), the query can run entirely against the index and use a seek, which reads far fewer pages than scanning the whole table with all its data (a clustered index scan is effectively a table scan).
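
To compare the two access paths side by side, one option is to force each index with a table hint and read the STATISTICS IO output. This is a sketch for experimentation only, using the index names from the question; hints like these are usually not something to leave in production code.

```sql
SET STATISTICS IO ON;

-- Run against the clustered index (the whole table, including ParameterValue and Remarks).
SELECT COUNT(FeatureID), FeatureID
FROM dbo.AIRQUALITYTS2 WITH (INDEX(PK_AIRQUALITYTS2))
WHERE FeatureID LIKE 'AS%'
  AND ParameterID = 'AP2'
  AND YEAR(MeasurementDateTime) = 2015
GROUP BY FeatureID
ORDER BY FeatureID;

-- Run against the narrower, covering non-clustered index.
SELECT COUNT(FeatureID), FeatureID
FROM dbo.AIRQUALITYTS2 WITH (INDEX(non_fidpidmdate))
WHERE FeatureID LIKE 'AS%'
  AND ParameterID = 'AP2'
  AND YEAR(MeasurementDateTime) = 2015
GROUP BY FeatureID
ORDER BY FeatureID;

SET STATISTICS IO OFF;
```

The Messages tab then reports the logical reads for each statement, and the actual execution plans show whether each index was accessed with a seek or a scan.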

I'm not sure what you mean by your question #2.
