SQL Server 2005表值函数的怪异性能

I have a huge difference of time execution between a 1-minute query and the same one in a table-valued function.

在一个1分钟的查询和一个表值函数中，我有一个巨大的时间执行差异。

But the most weired thing is that running the UDF with another (valid) company_id argument gives me a result in ~40 seconds and as soon as I change this company_id for 12 (valid again), it never stops. The execution plans of these two queries are absolutely not the same and of course, the long one is the most complicated. BUT the execution plan between the batch version and the UDF version are the same AND the batch version is fast...!

但是最奇怪的事情是，使用另一个(有效的)company_id参数运行UDF会在40秒内给我一个结果，当我将这个company_id更改为12(再次有效)时，它就不会停止。这两个查询的执行计划是完全不同的，当然，长查询是最复杂的。但是批处理版本和UDF版本之间的执行计划是相同的，批处理版本是快速的…!

If I do the following query "by hand", the execution time is 1min36s with 306 rows:

如果我“手动”执行以下查询，执行时间为1min36，包含306行:

 SELECT
  dbo.date_only(Call.date) AS date,
  count(DISTINCT customer_id) AS new_customers
 FROM
  Call
 LEFT OUTER JOIN
  dbo.company_new_customers(12, 2009, 2009) new_customers
   ON dbo.date_only(new_customers.date) = dbo.date_only(Call.date)
 WHERE
  company_id = 12
  AND year(Call.date) >= 2009
  AND year(Call.date) <= 2009
 GROUP BY
  dbo.date_only(Call.date)

I stored this exactly same query in a function and ran it like that :

我将相同的查询存储在一个函数中，然后像这样运行:

SELECT * FROM company_new_customers_count(12, 2009, 2009)

13 minutes for now that it is running... And I am sure that it will never give me any result.

13分钟后，它开始运行……我相信它永远不会给我任何结果。

Yesterday, I had the exact same infinite-loop-like behaviour during more than 4h (so I stopped it).

昨天，在超过4h的时间里，我有完全相同的微环行为(所以我停止了)。

Here is the definition of the function:

下面是函数的定义:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION company_new_customers_count 
( 
 @company_id int, 
 @start_year int,
 @end_year int
)
RETURNS TABLE 
AS
RETURN 
(
 SELECT
  dbo.date_only(Call.date) AS date,
  count(DISTINCT customer_id) AS new_customers
 FROM
  Call
 LEFT OUTER JOIN
  dbo.company_new_customers(@company_id, @start_year, @end_year) new_customers
   ON dbo.date_only(new_customers.date) = dbo.date_only(Call.date)
 WHERE
  company_id = @company_id
  AND year(Call.date) >= @start_year
  AND year(Call.date) <= @end_year
 GROUP BY
  dbo.date_only(Call.date)
)
GO

I would be very happy to understand what is going on.

我很高兴能理解发生了什么。

Thanks

谢谢

Additional:

附加:

Definition of company_new_customers:

company_new_customers的定义:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Description: Create the list of new customers of @company_id
--          in the given period.
-- =============================================
CREATE FUNCTION company_new_customers 
(   
    @company_id int, 
    @start_year int,
    @end_year   int
)
RETURNS TABLE 
AS
RETURN 
(
    SELECT
        customer_id,
        date
    FROM
    (   -- select apparition dates of cutomers before @end_year
        SELECT
            min(date)       AS date,
            customer_id
        FROM
            Call
        JOIN
            Call_Customer ON Call_Customer.call_id = Call.call_id
        WHERE
            company_id = @company_id
            AND year(date) <= @end_year
        GROUP BY
            customer_id
    ) new_customers
    WHERE
        year(date) >= @start_year -- select apparition dates of cutomers after @start_year
)
GO

Definition of date_only:

date_only的定义:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      Julio Guerra
-- Create date: 14/10/2010
-- Description: Return only the date part of a datetime value
-- Example:         date_only('2010-10-25 13:00:12') returns 2010-10-25
-- =============================================
CREATE FUNCTION date_only
(
    @datetime datetime
)
RETURNS datetime
AS
BEGIN
    RETURN dateadd(dd, 0, datediff(dd, 0, @datetime))
END
GO

Execution Plan of SELECT * FROM company_new_customers_count(8, 2009, 2009) SQL Server 2005表值函数的怪异性能

从company_new_customers_count中选择*的执行计划(8,2009,2009)

Execution Plan of SELECT * FROM company_new_customers_count(12, 2009, 2009) SQL Server 2005表值函数的怪异性能

从company_new_customers_count中选择*的执行计划(12,2009,2009)

4 个解决方案

#1

From these query plans it looks like you could benefit from an index like this (if I inferred your DB schema right):

从这些查询计划中，看起来您可以从这样的索引中获益(如果我正确地推断出您的DB模式):

CREATE INDEX IX_call_company_date ON call (company_id, date)

In general this seems to be a standard query optimization problem and the table-valued functions aren't making a difference here actually.

一般来说，这似乎是一个标准的查询优化问题，表值函数在这里并没有什么区别。

#2

The short plan uses HashJoin and clustered index scans on PK_CALL. The long plan uses NestedLoops and repeated seeks in UK_Pair_.... Most likely the cardinality estimates for '12, 2009, 2009' rule out HashJoin due to insufficient system memory, so you end up with a worse plan (despite the seeks instead of scans). Probably company_id 12 has way more customers than company_id 8.

短计划在PK_CALL上使用HashJoin和聚集索引扫描。长期计划使用NestedLoops和一再寻求UK_Pair_ ....最可能的情况是，基数估计值对“12,2009,2009”的估计排除了HashJoin，因为系统内存不足，所以您最终会得到一个更糟糕的计划(尽管查找而不是扫描)。可能company_id 12比company_id 8有更多的客户。

To give a solution is impossible w/o exact information on all factors involved (the exact schema used, including every index, and the exact statistics and cardinality of every table involved). A simple avenue to pursue would be to use a plan guide, see Designing and Implementing Plan Guides.

要给出一个解决方案，不可能有关于所有相关因素的准确信息(使用的确切模式，包括每个索引，以及所涉及的每个表的准确统计数据和基数)。一个简单的方法是使用计划指南，参见设计和实施计划指南。

#3

There are a couple of parts of the answer here. For the first part, I'm going to attempt answer the question (you didn't ask) about why none of the queries are particularly fast. This has relevance to the question you actually did ask, so bear with me.

这里有几个部分的答案。在第一部分中，我将尝试回答关于为什么所有查询都不是特别快的问题(您没有问)。这和你实际上问过的问题有关，请见谅。

Your date criteria are generally not SARGable as you've specified them -- for example in your company_new_customers function. That means that the server can't use its statistics to figure out how selective your criteria are. That means that your query plans are going to be very, very sensitive to how selective your customer_id criterion is, no matter how many date rows you have per customer.

您的日期标准通常不像您所指定的那样可SARGable——例如在您的company_new_customers函数中。这意味着服务器不能使用它的统计数据来确定您的标准有多大的选择性。这意味着您的查询计划将非常、非常敏感地关注您的customer_id标准的选择性，无论您每个客户有多少日期行。

Using a range query with an index on date and call_id in your call table should vastly improve performance in all cases and reduce the sensitivity of your query to the selectivity of the customer_id. Assuming that the date is on your call table, I'd rewrite your inner UDF something like this and tweak the input parameters to use dates instead. Doing so will make your UDF more versatile, too:

在调用表中使用带有日期索引和call_id的范围查询将极大地提高所有情况下的性能，并降低查询对customer_id的选择性的敏感性。假设日期在您的调用表上，我将重写您的内部UDF，类似这样，并调整输入参数以使用日期。这样做将使你的UDF更多功能，也:

CREATE FUNCTION company_new_customers 
(   
    @company_id INT, 
    @start_date DATETIME,
    @end_date   DATETIME
)
RETURNS TABLE 
AS
RETURN 
(    
        SELECT
            MIN(c.[date]) AS [date],
            c.customer_id
        FROM dbo.[Call] c
        JOIN dbo.[Call_Customer] cc
            ON cc.call_id = c.call_id
        WHERE c.company_id = @company_id
        AND   c.[date]    <= @end_date
        AND   NOT EXISTS (
                SELECT *
                FROM  dbo.[Call] c1
                WHERE c1.customer_id = c.customer_id
                AND   c1.[date] <= @start_date              
        )    
        GROUP BY
            c.customer_id          
)
GO

Same goes for your other view. By using year() and your date_only() functions, you make any statistics or indexes you have on your dates all but useless (though the optimizer can use them to limit the amount of data scanned, but that's a bigger discussion).

你的另一个观点也是如此。通过使用year()和date_only()函数，您可以对您的日期进行任何统计或索引，但这些统计或索引几乎是无用的(尽管优化器可以使用它们来限制扫描的数据量，但这是一个更大的讨论)。

Now then -- why does your UDF take forever? Because it calls another UDF and you use date_only() as a join argument, it pretty much can't "know" anything about what to expect in the UDF subquery, so it has chosen to loop join. It's likely choosing that plan because it's the appropriate one for some values of customer_id. It's likely that you have run a query against one of these selective customer_ids shortly after creating the UDF and the plan for that query has been cached -- even though it's not appropriate for other values of customer_id.

那么，为什么你的UDF需要永远?因为它调用了另一个UDF，并且您只使用date_only()作为一个连接参数，它几乎不能“知道”在UDF子查询中期望什么，所以它选择了循环连接。很可能选择该计划，因为它适合customer_id的某些值。很可能在创建UDF之后不久就对这些选择的customer_id运行了查询，并且缓存了该查询的计划——尽管它不适用于customer_id的其他值。

Why does a stored proc not take forever? Because the first time you run it, the stored proc generates a plan based on the first criteria you give it. Perhaps the first time you ran the SP, you used the non-selective customer ID and the stored proc has chosen to hash join. Same thing with the ad-hoc query. The optimizer is "noticing" that you have passed it a non-selective customer_id and is choosing to create a hash join for you.

为什么存储的proc不会永远保存?因为您第一次运行它时，存储的proc基于您给出的第一个标准生成计划。可能在第一次运行SP时，您使用了非选择性客户ID，而存储的proc选择了散列连接。ad-hoc查询也是如此。优化器正在“注意”您已经为它传递了一个非选择性的customer_id，并正在选择为您创建一个散列连接。

Either way, unless you get the date-SARGability issue under control, you're going to find that all of your queries like this are going to be very sensitive to your customer_id input, and depending on your usage patterns, they could blow up in your face in terms of performance -- UDFs or not.

无论哪种方式,除非你控制date-SARGability问题,你会发现,你所有的这样的查询将会对你非常敏感customer_id输入,根据您的使用模式,他们可以炸毁你的脸在性能方面——udf。

Hope this helps!

希望这可以帮助!

#4

I've seen this with SQL Server 2005. When we used a table value function for our particular query we reliably got awful performance. Took the exact same text of the query, parameters and all, put them into a stored proc and reliably got a marvelous query plan. Calling the function with the same parameters as the stored proc produced different behaviour (we started both from a cold cache). Very disappointing!

我在SQL Server 2005中看到过这个。当我们为特定的查询使用一个表值函数时，我们的性能会很糟糕。将查询、参数和所有内容的完全相同的文本输入到存储的proc中，并可靠地得到了一个奇妙的查询计划。使用与存储的proc相同的参数调用函数会产生不同的行为(我们都是从冷缓存开始的)。很失望!

Sadly we didn't have the time to diagnose this strange behaviour any more deeply and moved the project on avoiding table value functions on 2005.

遗憾的是，我们没有时间对这种奇怪的行为进行更深入的诊断，并在2005年转移了避免表值函数的项目。

This may indicate a bug in SQL Server 2005.

这可能表示SQL Server 2005中的一个错误。

#1