Why is a specific index not used in the query?

Time: 2021-04-23 00:09:44

I have a table named Workflow with 37M rows. Its primary key is on the ID column (int) plus one additional column, and ID is the first column in the index.

If I execute the following query, the PK is not used (unless I add an index hint):

Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID

If I execute this query instead, with a literal value, the PK is used:

Select Distinct(SubID) From Workflow Where ID >= 786400000

I suspect the problem is the use of a parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?

3 answers

#1


3  

Please post the execution plan(s), as well as the exact table definition, including all indexes.

When you use a variable, the optimizer does not know what selectivity the query will have: @LastSeenWorkflowID may filter out all but the very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which a range seek over the clustered index becomes more expensive than a full scan of a non-clustered index, simply because the clustered index is much wider (its leaf level includes every column) and therefore has many more pages to read. The plan generated for an unknown value of @LastSeenWorkflowID likely crosses that threshold when it estimates the cost of the clustered index seek, so the optimizer chooses the scan of the non-clustered index instead.
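
One way to see this on your own system is to compare the row estimates the optimizer produces for the literal and for the local variable. The sketch below assumes the Workflow table from the question and reuses the literal 786400000 from it; SET STATISTICS PROFILE ON returns each executed plan as an extra result set with per-operator EstimateRows and Rows columns.

```sql
-- Sketch only: compare estimated vs. actual rows for a literal and a local variable.
-- Assumes the Workflow table (ID int, SubID, ...) described in the question.
SET STATISTICS PROFILE ON;  -- return each executed plan with EstimateRows and Rows per operator

DECLARE @LastSeenWorkflowID int = 786400000;

-- Literal: the optimizer can look the value up in the statistics histogram on ID.
SELECT DISTINCT SubID FROM Workflow WHERE ID >= 786400000;

-- Local variable: the batch is compiled before the value is assigned,
-- so the optimizer has to fall back on a fixed selectivity guess.
SELECT DISTINCT SubID FROM Workflow WHERE ID >= @LastSeenWorkflowID;

SET STATISTICS PROFILE OFF;
```

If EstimateRows on the operator that reads Workflow is much larger in the second plan than in the first, that inflated guess is the likely reason the non-clustered index scan wins on cost.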

You could provide a narrow index that is aimed specifically at this query:

CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);

or:

CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);

Such an index is too good to pass up for your query, no matter the value of @LastSeenWorkflowID.
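
To check the width argument against your own data, you can compare how many pages each index on the table occupies. This is a sketch that assumes the table lives in the dbo schema; it reads the sys.dm_db_partition_stats and sys.indexes catalog views.

```sql
-- Sketch: page and row counts per index on Workflow (the dbo schema is an assumption).
SELECT  i.name                  AS index_name,
        i.type_desc             AS index_type,   -- CLUSTERED / NONCLUSTERED
        SUM(ps.used_page_count) AS used_pages,   -- 8 KB pages in use, summed over partitions
        SUM(ps.row_count)       AS total_rows
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.Workflow')
GROUP BY i.name, i.type_desc
ORDER BY used_pages DESC;
```

A narrow index such as WorkflowSubId should report far fewer used_pages than the clustered index for the same number of rows, which is what makes a range seek on it cheap even when the estimate is large.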

#2


2  

Assuming your PK is an identity column, or is always greater than 0, you could try this:

Select Distinct(SubID) 
From Workflow 
Where ID >= @LastSeenWorkflowID
    And ID > 0

Adding the second condition may cause the optimizer to use an index seek.

#3


0  

This is a classic example of a local variable producing a suboptimal plan.

You should use OPTION (RECOMPILE) so that the query is compiled with the actual runtime value of @LastSeenWorkflowID.
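
Applied to the query from the question, the hint goes at the end of the statement. This is a sketch assuming @LastSeenWorkflowID is a local variable declared in the same batch (the value assigned here is only an example); the trade-off is that the statement is compiled again on every execution, which costs some CPU each time.

```sql
-- Sketch: the assigned value is an example; in practice it comes from your application.
DECLARE @LastSeenWorkflowID int = 786400000;

SELECT DISTINCT SubID
FROM   Workflow
WHERE  ID >= @LastSeenWorkflowID
OPTION (RECOMPILE);  -- compile this statement at run time using the variable's current value
```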

See my blog post for more information: http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/
