I'm working on performance tuning some aggregate-heavy queries, and I'm wondering whether repeating the same aggregate function over and over has a significant performance penalty.
I'm assuming MS SQL Server is intelligent enough to calculate each repeated aggregate only once and then reuse the resulting value each time that exact aggregate appears within the same query. Is my assumption correct?
The alternative is to add more joins to this view (joining the same tables over and over again) with varying ON clauses, grouping rows in different ways to produce the various totals without using any aggregate function more than once. But the execution plans show that adding more joins definitely makes the query take longer, and we technically already have all the information we need with the current number of joins anyway (we just have to perform the addition to produce the compound totals).
Here's some example code, from one of the views in question:
    COUNT_BIG([UVCE].[ID]) AS [TotalU],
    COUNT_BIG([SVCE].[ID]) AS [TotalS],
    COUNT_BIG([TVCE].[ID]) AS [TotalT],
    COUNT_BIG([CVCE].[ID]) AS [TotalC],
    COUNT_BIG([WVCE].[ID]) AS [TotalW],
    /* More individual totals, etc. */
    COUNT_BIG([SCE].[ID]) +
    COUNT_BIG([TCE].[ID]) +
    COUNT_BIG([CCE].[ID]) +
    COUNT_BIG([WCE].[ID]) +
    COUNT_BIG([UVCE].[ID]) +
    COUNT_BIG([SVCE].[ID]) +
    COUNT_BIG([TVCE].[ID]) +
    COUNT_BIG([CVCE].[ID]) +
    COUNT_BIG([WVCE].[ID]) AS [OverallTotal],
    CASE WHEN COUNT_BIG([SCE].[ID]) +
              COUNT_BIG([TCE].[ID]) +
              COUNT_BIG([CCE].[ID]) +
              COUNT_BIG([WCE].[ID]) +
              COUNT_BIG([UVCE].[ID]) +
              COUNT_BIG([SVCE].[ID]) +
              COUNT_BIG([TVCE].[ID]) +
              COUNT_BIG([CVCE].[ID]) +
              COUNT_BIG([WVCE].[ID]) >= 64 THEN 4E0 ELSE (
              COUNT_BIG([SCE].[ID]) +
              COUNT_BIG([TCE].[ID]) +
              COUNT_BIG([CCE].[ID]) +
              COUNT_BIG([WCE].[ID]) +
              COUNT_BIG([UVCE].[ID]) +
              COUNT_BIG([SVCE].[ID]) +
              COUNT_BIG([TVCE].[ID]) +
              COUNT_BIG([CVCE].[ID]) +
              COUNT_BIG([WVCE].[ID]) )
              / ( 64 / 4E0 ) END AS [Score]
1 Answer
#1
The SQL Server optimizer is pretty good.
However, you are missing an important point when using aggregation functions. In general, the group by clause is way, way more expensive than the aggregation function calls. That is, moving the data around to define the groups is the expensive part of the query. (One exception to this is count(distinct).)
That said, doing dozens of function calls can have a noticeable performance impact. In SQL Server, it is easy enough to use common table expressions (CTEs) or subqueries to define the values at one level and use them at another. SQL Server may well do this for you anyway; I just think that other parts of the query are likely to be much more important in terms of performance.
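As a rough sketch of the CTE approach: the table variable @Events and its columns GroupID, SID and TID are made up for illustration and stand in for the joined tables in the view; only two of the counts are shown. Each aggregate is written once in the CTE, the sum is named once with CROSS APPLY, and the outer query reuses it for both OverallTotal and Score.

```sql
-- Hypothetical sample data; @Events, GroupID, SID and TID are illustrative
-- names, not taken from the original view.
DECLARE @Events TABLE (GroupID int NOT NULL, SID int NULL, TID int NULL);

INSERT INTO @Events (GroupID, SID, TID)
VALUES (1, 10, NULL),
       (1, 11, 20),
       (2, NULL, 21);

WITH Totals AS (
    -- Each aggregate appears exactly once, at the grouping level.
    SELECT GroupID,
           COUNT_BIG(SID) AS TotalS,
           COUNT_BIG(TID) AS TotalT
    FROM @Events
    GROUP BY GroupID
)
SELECT t.GroupID,
       t.TotalS,
       t.TotalT,
       o.OverallTotal,
       -- Same capping logic as the original Score expression,
       -- written against the named total instead of repeated aggregates.
       CASE WHEN o.OverallTotal >= 64 THEN 4E0
            ELSE o.OverallTotal / (64 / 4E0)
       END AS Score
FROM Totals AS t
-- Name the compound total once so later expressions can refer to it.
CROSS APPLY (VALUES (t.TotalS + t.TotalT)) AS o (OverallTotal);
```

Whether this runs any faster than the repeated form is something to check in the execution plan; the main gain I would expect is readability, since the long sum is no longer spelled out three times.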