Need to improve the performance of a SQL query that uses aggregate functions

I have a particular SQL query that seems to suffer from a mysterious performance issue. Here is the query:

SELECT COUNT(LengthOfTime) AS TotalTime, 
       SUM(LengthOfTime) AS TotalLength, 
       SUM(LengthOfTime) / COUNT(LengthOfTime) AS AverageTime, 
       SUM(Pops) / COUNT(LengthOfTime) AS AveragePop 
  FROM ((SELECT * 
           FROM (SELECT *, ID & YearRec AS ID2 
                   FROM MyFirstTable 
                 UNION ALL 
                 SELECT *, ID & YearRec AS ID2 
                   FROM Table2011) AS TEMP 
          WHERE STARTTIME >= '8/1/2011 00:00:00' 
            AND StartTime <= '8/5/2011 23:59:59' ) AS TEMP2 
  JOIN AppleTable ON TEMP2.Reason = AppleTable.Skills ) 
  JOIN PeopleTable ON TEMP2.Operator = PeopleTable.Operators 
 WHERE AppleTable.[ON] = 1 
   AND PeopleTable.[ON] = 1 
   AND Rec_Type = 'SECRET AGENT'

The issue here is that this query runs very quickly (0:00 to 0:02) when run for a 5 day span, but very slowly (1:20 to 1:45) for a 6 day span.

There are approximately 105,000 records per day in the Tables (MyFirstTable and Table2011).

My question: Is there an upper limit to the number of rows you can pass an aggregate function before you see a serious performance issue in SQL Server? (currently using 2008 R2)

2 Solutions

#1

No, there is no pre-defined upper limit for aggregate functions.

The difference in performance is likely caused by one or more of the following:

Old and/or unsuitable index structure
Cached execution plan
Cached data
Data size not being uniform (for example, the first five days hold 10 rows while the sixth holds 100 B rows)
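
One quick way to check the last point is to count rows per day across both tables. The following is a minimal sketch, assuming StartTime is a datetime column in both tables; the alias names RecDay and RowsPerDay are made up for illustration, and the half-open date range covers the six-day span from the question.

```sql
-- Sketch: rows per day across both source tables for 8/1/2011 through 8/6/2011.
-- Assumption: StartTime is a datetime column in both tables.
SELECT D.RecDay,
       COUNT(*) AS RowsPerDay
  FROM (SELECT CAST(StartTime AS date) AS RecDay
          FROM MyFirstTable
         WHERE StartTime >= '20110801' AND StartTime < '20110807'
        UNION ALL
        SELECT CAST(StartTime AS date) AS RecDay
          FROM Table2011
         WHERE StartTime >= '20110801' AND StartTime < '20110807') AS D
 GROUP BY D.RecDay
 ORDER BY D.RecDay;
```

If one day turns out to hold far more rows than the others, the slowdown may simply reflect the data volume rather than the aggregates.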

You can run the query in SSMS and view the actual execution plan. This will tell you the places where the cost of running the query is the highest, and that will help you determine the best course of action.
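
Besides the graphical plan, the same information can be captured from a query window. This is a rough sketch using the standard SET STATISTICS options; the small COUNT query in the middle is only a stand-in for the real query, and its date range is an assumption taken from the question.

```sql
-- Sketch: capture I/O, timing and the actual plan for one statement.
SET STATISTICS IO ON;    -- logical/physical reads per table, shown in the Messages tab
SET STATISTICS TIME ON;  -- parse, compile and execution times
SET STATISTICS XML ON;   -- returns the actual execution plan as XML with the results

-- Stand-in for the real query (assumption: replace with the query being tuned).
SELECT COUNT(*)
  FROM Table2011
 WHERE StartTime >= '20110801' AND StartTime < '20110807';

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the output for the 5 day and 6 day ranges should show whether the plan shape changes (for example, a seek turning into a scan) or only the row counts do.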

Edit based on comments:

If there isn't an index on Table2011 that contains [STARTTIME], then create one. If there is an index but it is being ignored, you have to figure out why. If the index is fragmented, rebuilding it should help. Here is how to rebuild it:

ALTER INDEX [YourIndexName] ON [dbo].[Table2011] REBUILD WITH (STATISTICS_NORECOMPUTE = ON);

Alternatively, you can do this in SSMS: browse to the specific index in Object Explorer, right-click it and choose Rebuild.
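
To make the index advice above concrete, here is a hedged sketch of creating an index on StartTime and then checking how fragmented the table's indexes are. The index name IX_Table2011_StartTime is hypothetical, and the dbo schema is assumed from the rebuild statement above.

```sql
-- Sketch: create an index on StartTime (skip this if one already exists).
-- Assumption: IX_Table2011_StartTime is a made-up name; the table lives in dbo.
CREATE NONCLUSTERED INDEX IX_Table2011_StartTime
    ON dbo.Table2011 (StartTime);

-- Check fragmentation for every index on the table.
SELECT i.name AS IndexName,
       ps.avg_fragmentation_in_percent,
       ps.page_count
  FROM sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID(N'dbo.Table2011'), NULL, NULL, 'LIMITED') AS ps
  JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;
```

A high avg_fragmentation_in_percent on an index with a large page_count is the usual sign that a rebuild is worth trying.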

#2

Short answer: No, there's not some magic number of records that will cause MSSQL to start performing poorly.

That said, a query can scale poorly, in which case it performs disproportionately (even exponentially) worse as the dataset grows.

A big problem here is that you are filtering on StartTime after the UNION ALL, which can force both tables to be read in full before the date range is applied. Instead, try applying that predicate in each of your SELECTs before the UNION ALL. That should make a big difference, especially if you index both tables on StartTime (allowing index seeks on those tables).

SELECT *
  FROM (SELECT *, ID & YearRec AS ID2
          FROM MyFirstTable
         WHERE STARTTIME >= '8/1/2011 00:00:00'
           AND STARTTIME <= '8/5/2011 23:59:59'
        UNION ALL
        SELECT *, ID & YearRec AS ID2
          FROM Table2011
         WHERE STARTTIME >= '8/1/2011 00:00:00'
           AND STARTTIME <= '8/5/2011 23:59:59') AS TEMP
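
Folding that filtered UNION ALL back into the original aggregate query could look like the sketch below. It rests on several assumptions not confirmed in the question: that LengthOfTime, Pops, Reason, Operator and Rec_Type all come from the two source tables, that ID2 is not needed by the outer query, and that a half-open date range (from 8/1 up to but not including 8/6) is an acceptable replacement for the 23:59:59 upper bound.

```sql
-- Sketch: filter each branch before the UNION ALL and select only the needed columns.
-- Assumption: all five selected columns exist in both MyFirstTable and Table2011.
SELECT COUNT(T.LengthOfTime) AS TotalTime,
       SUM(T.LengthOfTime) AS TotalLength,
       SUM(T.LengthOfTime) / COUNT(T.LengthOfTime) AS AverageTime,
       SUM(T.Pops) / COUNT(T.LengthOfTime) AS AveragePop
  FROM (SELECT LengthOfTime, Pops, Reason, Operator, Rec_Type
          FROM MyFirstTable
         WHERE StartTime >= '20110801' AND StartTime < '20110806'
        UNION ALL
        SELECT LengthOfTime, Pops, Reason, Operator, Rec_Type
          FROM Table2011
         WHERE StartTime >= '20110801' AND StartTime < '20110806') AS T
  JOIN AppleTable ON T.Reason = AppleTable.Skills
  JOIN PeopleTable ON T.Operator = PeopleTable.Operators
 WHERE AppleTable.[ON] = 1
   AND PeopleTable.[ON] = 1
   AND T.Rec_Type = 'SECRET AGENT';
```

Listing only the columns the outer query uses, instead of SELECT *, also gives the optimizer a chance to satisfy each branch from a narrower index.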

You may be able to do some additional refactoring of your code as well.
