Table-valued function is killing query performance

Time: 2021-03-04 04:15:25

I had a horrible time today trying to get a query to perform the way I expected. Yesterday I had to make a slight change to a table-valued function used in the query, and that change had a huge performance impact on the query. After evaluating the execution plan and looking at STATISTICS IO and TIME, I found that because I changed the function to return a table variable instead of just a result set, it was doing a full scan on one of the tables being queried.

My question is: why would having it return a table (table variable) instead of just a SELECT / result set cause such a big change to the plan?

Stumped....


5 answers

#1


48  

Returning a table variable makes it a multi-statement table-valued function, which can be bad for performance: it's treated like a table, except there are no statistics available for SQL Server to base a good execution plan on, so it will estimate the function as returning a very small number of rows. If it actually returns a larger number of rows, the generated plan could be far from optimal.

Returning just a SELECT, on the other hand, makes it an inline table-valued function; think of it more as a view. In this case the actual underlying tables get brought into the main query, and a better execution plan can be generated based on proper statistics. You'll notice that the execution plan will NOT mention the function at all, as it has basically just been merged into the main query.

There's a great reference on this on MSDN by the CSS SQL Server Engineers, including (quote):

But if you use multi-statement TVF, it’s treated as just like another table. Because there is no statistics available, SQL Server has to make some assumptions and in general provide low estimate. If your TVF returns only a few rows, it will be fine. But if you intend to populate the TVF with thousands of rows and if this TVF is joined with other tables, inefficient plan can result from low cardinality estimate.
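To make the two flavours concrete, here is a minimal sketch of the same lookup written both ways. The table `dbo.Orders` and both function names are made up for illustration; they are not from the question. `GO` is the batch separator used by SSMS/sqlcmd.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId    INT PRIMARY KEY,
    CustomerId INT NOT NULL,
    Amount     DECIMAL(10, 2) NOT NULL
);
GO

-- Inline TVF: a single RETURN (SELECT ...), expanded into the calling query like a view.
CREATE FUNCTION dbo.OrdersForCustomer_Inline (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderId, o.Amount
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId
);
GO

-- Multi-statement TVF: declares a return table variable, fills it, then returns it.
CREATE FUNCTION dbo.OrdersForCustomer_Multi (@CustomerId INT)
RETURNS @result TABLE (OrderId INT, Amount DECIMAL(10, 2))
AS
BEGIN
    INSERT INTO @result (OrderId, Amount)
    SELECT o.OrderId, o.Amount
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId;

    RETURN;
END;
GO
```

If you select from each function and compare the actual execution plans, the inline version should show operators on `dbo.Orders` directly, while the multi-statement version should show a Table Valued Function operator whose estimated row count is a fixed guess rather than something derived from statistics.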


#2


5  

This is because a multi-statement table-valued UDF cannot be processed inline with the rest of the SQL statement it is used in, and therefore cannot be part of the statement's cached plan. That means it must be compiled separately from the rest of the SQL it is used in, over and over, for every row in the final result set generated by the query.

An inline table-valued UDF, on the other hand, is processed and compiled along with the SQL it is used in; it therefore becomes part of the cached plan and only gets processed and compiled once, no matter how many rows you generate.

#3


3  

Really impossible to answer definitively without more information. However, since I like to take crazy stabs in the dark . . .


Table variables can't be optimized by the engine--the engine always "assumes" that the table variable only has one row in it when it generates an execution plan. That is one reason why you might be seeing strange performance.
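If you want to check this on your own server, here is a rough sketch. The table variable `@ids` and the use of `sys.all_columns` as a convenient row source are my own choices for illustration. Run it with the actual execution plan turned on and compare estimated against actual rows on the scan of `@ids` in the last two statements.

```sql
SET NOCOUNT ON;

-- Hypothetical table variable, for illustration only.
DECLARE @ids TABLE (id INT PRIMARY KEY);

-- Fill it with 10,000 sequential ids.
INSERT INTO @ids (id)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Plan is built without knowing how many rows @ids will hold.
SELECT COUNT(*)
FROM @ids AS i
JOIN sys.all_columns AS c ON c.column_id = i.id;

-- Same query, but recompiled at execution time, when the row count is known.
SELECT COUNT(*)
FROM @ids AS i
JOIN sys.all_columns AS c ON c.column_id = i.id
OPTION (RECOMPILE);
```

On older versions the first SELECT should show a tiny estimate for `@ids`, while the `OPTION (RECOMPILE)` one should reflect the real count. Newer versions (SQL Server 2019 under compatibility level 150) defer compiling statements that use table variables, so there the first estimate may already be realistic.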


#4


0  

When using a multi-statement table-valued UDF, that UDF is run to completion before its results can be used by the caller. With an inline table-valued UDF, SQL Server basically expands the UDF into the calling query, just like macro expansion. This has the following implications, among others:

  • The calling query's WHERE clause can be pushed directly into an inline table-valued UDF, but not into a multi-statement UDF. Thus, if your table-valued UDF generates a lot of rows that would be filtered out by the calling query's WHERE clause, the query optimizer can apply that filter inside an inline table-valued UDF, but a multi-statement UDF has to produce all of its rows first.
  • An inline table-valued UDF behaves like a parameterized VIEW would if SQL Server had such a concept, whereas a multi-statement table-valued UDF behaves as if you had populated a table variable and then used it in your query.

If your UDF returns many rows and is backed by a table, I imagine this could be where the table scan is coming from. Either add more parameters to your UDF so the caller can constrain its result size, or try to reformulate it as an inline table-valued UDF with the help of friends such as UNION et al. I would avoid multi-statement table-valued UDFs at all costs unless the result size is known to be only a few rows and it is hard to produce the required results with set-based logic.
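As a sketch of what such a reformulation might look like: suppose a multi-statement function filled its return table with two separate INSERTs from two tables. The tables `dbo.CurrentEvents` and `dbo.ArchivedEvents` and the function `dbo.EventsByCategory` below are hypothetical, not from the question. The same logic can often be written as one SELECT with UNION ALL, which keeps the function inline and lets the caller's filter reach the base tables.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.CurrentEvents  (EventId INT PRIMARY KEY, Category INT NOT NULL, Payload VARCHAR(100));
CREATE TABLE dbo.ArchivedEvents (EventId INT PRIMARY KEY, Category INT NOT NULL, Payload VARCHAR(100));
GO

-- Inline TVF: one SELECT, with UNION ALL standing in for what a multi-statement
-- version would do with two separate INSERTs into its return table.
CREATE FUNCTION dbo.EventsByCategory (@Category INT)
RETURNS TABLE
AS
RETURN
(
    SELECT c.EventId, c.Payload
    FROM dbo.CurrentEvents AS c
    WHERE c.Category = @Category
    UNION ALL
    SELECT a.EventId, a.Payload
    FROM dbo.ArchivedEvents AS a
    WHERE a.Category = @Category
);
GO

-- The caller adds its own filter on top of the function.
SELECT e.EventId, e.Payload
FROM dbo.EventsByCategory(3) AS e
WHERE e.EventId = 42;

-- Because the function is expanded, the optimizer is free to treat the query
-- above roughly like this one, with the EventId filter inside each branch.
SELECT c.EventId, c.Payload
FROM dbo.CurrentEvents AS c
WHERE c.Category = 3 AND c.EventId = 42
UNION ALL
SELECT a.EventId, a.Payload
FROM dbo.ArchivedEvents AS a
WHERE a.Category = 3 AND a.EventId = 42;
```

With a multi-statement version of the same function, the `EventId = 42` filter could only be applied after the function had already materialized every row for category 3.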

#5


0  

On SQL Server 2014 we were able to solve our issue by inserting the table-valued function's data into a temp table and then joining to that, instead of joining directly to the table-valued function.

This improved our execution time from 2 min to 4 secs.


Here is an example that worked for our team:


--SLOW QUERY (2 min):
DECLARE @id INT = 1;

SELECT *
FROM [data].[someTable] T
INNER JOIN [data].[tableValueFunction](@id) TVF ON TVF.id = T.id;

--FAST QUERY (4 sec):
DECLARE @id INT = 1;

SELECT *
INTO #tableValueFunction
FROM [data].[tableValueFunction](@id) TVF;

SELECT *
FROM [data].[someTable] T
INNER JOIN #tableValueFunction TVF ON TVF.id = T.id;
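A likely reason this helps is that a temp table, unlike the function's result, has a real row count and column statistics for the optimizer to work with. A possible variation, sketched here under the assumption that the function's result has an INT column named `id` (which the join above implies), is to create the temp table explicitly, index the join column, and drop the table when done. The names `#tvfRows` and `IX_tvfRows_id` are made up.

```sql
DECLARE @id INT = 1;

-- Assumption: only the id column of the function's result is needed for the join.
CREATE TABLE #tvfRows (id INT);

-- Materialize the function's output once.
INSERT INTO #tvfRows (id)
SELECT TVF.id
FROM [data].[tableValueFunction](@id) AS TVF;

-- Optional: index the join column so the join can seek instead of scan.
CREATE CLUSTERED INDEX IX_tvfRows_id ON #tvfRows (id);

SELECT T.*
FROM [data].[someTable] AS T
INNER JOIN #tvfRows AS R ON R.id = T.id;

-- Clean up rather than waiting for the session to end.
DROP TABLE #tvfRows;
```

Whether the index pays for itself depends on how many rows the function returns, so it is worth timing both with and without it.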
