Why is this happening in my execution plan?

Time: 2022-09-10 20:03:59

I have the SQL query below that is running very slowly. I took a look at the execution plan and it is claiming that a sort on Files.OrderId is the highest cost operation (53%). Why would this be happening if I am not ordering by OrderId anywhere? Is my best bet to create an index on Files.OrderId?

Execution plan if anyone is interested.

with custOrders as
(
    SELECT c.firstName + ' ' + c.lastname as Customer, c.PartnerId , c.CustomerId,o.OrderId,o.CreateDate, c.IsPrimary
    FROM Customers c
    LEFT JOIN CustomerRelationships as cr
        ON c.CustomerId = cr.PrimaryCustomerId
    INNER JOIN Orders as o
       ON c.customerid = o.customerid 
           OR (cr.secondarycustomerid IS NOT NULL AND o.customerid = cr.secondarycustomerid)
    where c.createdate >= @FromDate + ' 00:00' 
       AND c.createdate <= @ToDate + ' 23:59' 
),
 temp as
(
SELECT Row_number() 
         OVER ( 
           ORDER BY c.createdate DESC)                    AS 'row_number', 
       c.customerid as customerId, 
       c.partnerid as partnerId, 
       c.Customer, 
       c.orderid as OrderId, 
       c.createdate as CreateDate, 
       Count(f.orderid)                                   AS FileCount, 
       dbo.Getparentcustomerid(c.isprimary, c.customerid) AS ParentCustomerId, 
       au.firstname + ' ' + au.lastname                   AS Admin, 
       '' as blank, 
       0  as zero
FROM   custOrders c 
       INNER JOIN files f 
               ON c.orderid = f.orderid 
       INNER JOIN admincustomers ac 
               ON c.customerid = ac.customerid 
       INNER JOIN adminusers au 
               ON ac.adminuserid = au.id 
       INNER JOIN filestatuses s 
               ON f.statusid = s.statusid 
WHERE  ac.adminuserid IS NOT NULL 
       AND f.statusid NOT IN ( 5, 6 ) 
GROUP  BY c.customerid, 
          c.partnerid, 
          c.Customer, 
          c.isprimary, 
          c.orderid, 
          c.createdate, 
          au.firstname, 
          au.lastname 
)

4 Answers

#1


11  

SQL Server has three algorithms to choose from when it needs to join two tables: the Nested Loops Join, the Hash Join and the Sort-Merge Join. Which one it selects is based on cost estimates. In this case it figured that, based on the information it had available, a Sort-Merge Join was the right choice.

In SQL Server execution plans a Sort-Merge is split into two operators, the Sort and the Merge Join, because the sort operation might not be necessary, for example if the data is already sorted.

For more information about joins check out my join series here: http://sqlity.net/en/1146/a-join-a-day-introduction/ The article about the Sort-Merge Join is here: http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/

To make your query faster, I would first look at indexes. You have a bunch of clustered index scans in the query. If you can replace a few of them with seeks you will most likely be better off. Also check whether the estimates that SQL Server produces match the actual row counts in an actual execution plan. If they are far off, SQL Server often makes bad choices, so providing better statistics can help your query performance too.

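As a rough illustration of the kind of index that can turn the clustered index scan on Customers into a seek, something like the one below could support the date-range filter in the CTE. The index name and the INCLUDE list are assumptions based on the columns the CTE selects; verify them against your actual schema and plan before creating anything.

-- Hypothetical index to support the c.createdate range filter in custOrders.
-- CustomerId comes along automatically if it is the clustering key.
CREATE NONCLUSTERED INDEX IX_Customers_CreateDate
    ON dbo.Customers (CreateDate)
    INCLUDE (FirstName, LastName, PartnerId, IsPrimary);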

#2


2  

SQL Server is performing the sort to enable the merge join between the dataset to the right of that sort operator and the records in the Orders table. A merge join itself is a very efficient way to join all the records in a dataset, but it requires that each dataset being joined is sorted on the join keys, in the same order.

Since the PK_Orders key is already ordered by OrderID, SQL Server decided to take advantage of that by sorting the other end of the join (the other stuff to the right of the sort) so that the two datasets can be merged together at that point in the plan. The common alternative to a merge join is a hash join, but that wouldn't help you here: you would simply have an expensive hash join operator in place of the sort and merge. The query optimizer has determined the sort and merge to be more efficient in this case.

The root cause of the expensive step in the plan is the need to combine all the records from the Orders table into the dataset. Is there a way to limit the records coming from the Files table? An index on files.statusid may be helpful if the rows with a statusid not in (5, 6) make up less than 10% of the total table size.

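If that is the case, a filtered index is one way to express it. This is only a sketch: the index name is arbitrary, and whether the optimizer actually uses it depends on your data distribution and on the status literals staying hard-coded in the query.

-- Hypothetical filtered index covering only the rows the query keeps
CREATE NONCLUSTERED INDEX IX_Files_OrderId_Filtered
    ON dbo.Files (OrderId, StatusId)
    WHERE StatusId <> 5 AND StatusId <> 6;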

The QO thinks that most of the records are going to be filtered out at the end. Try to push as many of those filter conditions back to the record sources as possible, so that fewer records have to be handled in the middle of the plan.

EDIT: I forgot to mention, it is very helpful to have an execution plan that we can look at. Is there any way we can get an actual execution plan result to see the real number of records going through those operators? Sometimes the estimated record counts can be a little off.

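For reference, one common way to capture an actual plan with real row counts is shown below (in SSMS, the "Include Actual Execution Plan" toolbar button does the same thing):

SET STATISTICS XML ON;
-- run the query here; each result set is followed by the actual plan XML
SET STATISTICS XML OFF;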

EDIT: Looking deeper into the 2nd to last filter operator's predicate field, summarized:

c.CustomerId=o.CustomerId
OR o.CustomerId=cr.SecondaryCustomerId AND cr.SecondaryCustomerId IS NOT NULL

It looks like SQL Server is producing a cross join between all possible matching records between Orders and Customers up to this point in the query (the plan to the right of the 2nd-to-last filter operator) and then looking at each record with that condition to see if it does indeed match. Notice how the line going into the filter is really fat and the line coming out is really thin? That's because the estimated row count goes from 21k to 4 at that operator. Forget what I said earlier; this is probably the main problem in the plan. Even if there are indexes on these columns, SQL Server can't use them because the join condition is too complex. It's causing the plan to merge all the records together instead of seeking to just the ones you need, because it can't use the full join predicate right away.

My first thought is to rephrase the CTE custOrders as a union of two datasets: one using CustomerId and one using SecondaryCustomerId to join. This will duplicate the work of the rest of the CTE but if it enables proper use of the indexes, it could be a big win.

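A sketch of what that rewrite could look like is below. The column list mirrors the original CTE; whether UNION or UNION ALL is appropriate depends on whether an order can qualify through both branches, so verify that the rewritten CTE returns exactly the same rows as the original OR join before relying on it.

with custOrders as
(
    -- Branch 1: orders placed directly by the customer
    SELECT c.firstName + ' ' + c.lastname as Customer, c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN Orders as o
        ON o.customerid = c.customerid
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'

    UNION   -- deduplicates rows that would match both branches

    -- Branch 2: orders placed by a related secondary customer
    SELECT c.firstName + ' ' + c.lastname as Customer, c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN CustomerRelationships as cr
        ON cr.PrimaryCustomerId = c.CustomerId
    INNER JOIN Orders as o
        ON o.customerid = cr.secondarycustomerid
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'
)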

#3


1  

I think the sort is occurring for this join:

FROM   custOrders c 
       INNER JOIN files f 
               ON c.orderid = f.orderid 

I would create an index on files that includes the columns orderid and statusid since the query also uses the statusid column.

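A sketch of such an index, assuming the table is dbo.Files (the index name itself is arbitrary):

CREATE NONCLUSTERED INDEX IX_Files_OrderId_StatusId
    ON dbo.Files (OrderId, StatusId);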

You might also want to consider the following changes:

  1. You don't need "ac.adminuserid IS NOT NULL", as it is already implied by the inner join between adminusers and admincustomers.
  2. Change the test "f.statusid NOT IN ( 5, 6 )" to a positive condition (e.g. IN), as negative conditions are more expensive to process (see the sketch after this list).
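
As an illustration of point 2, the positive form would look something like this; the status ids listed here are hypothetical, and the real list depends on what FileStatuses contains besides 5 and 6:

SELECT f.orderid, f.statusid
FROM   files f
WHERE  f.statusid IN ( 1, 2, 3, 4 )  -- instead of: f.statusid NOT IN ( 5, 6 )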

#4


0  

I know this question is quite old, however I had this same issue and realised there was a completely different reason my tables had suddenly slowed. The symptoms were the same: views that were previously lightning fast became slow to update, with "Sort" showing a cost of 40%. This solution may prove useful to someone, and it is simple: when joining tables, ensure you are joining on a like-for-like basis. I was joining two tables on ID, but in one table the ID was defined as an int and in the other as nvarchar. I corrected this so that both are defined as the same type, and the view is back to lightning speed.

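As a hypothetical illustration (table and column names here are made up): if TableA.ID is an int but TableB.ID is an nvarchar, the join forces an implicit conversion on every row and can prevent index seeks; aligning the types removes it.

-- Make TableB.ID the same type as TableA.ID so the join compares like for like
-- (assumes every existing value converts cleanly and no constraint blocks the change)
ALTER TABLE dbo.TableB
    ALTER COLUMN ID int NOT NULL;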

Hopefully this will help someone else avoid spending a week trying to figure out what's wrong with SQL when it's really a PEBKAC moment.

(Problem Exists Between Keyboard And Chair)
