Slow T-SQL query with datediff function

Time: 2021-11-30 21:22:51

I have a query that runs fast when the date clause "and datediff(day, con2.DT_DateIncluded, '2017-01-01') <= 0" in the code below is left out, but runs slowly when it is included. However, the part "select top 2 ID_Contact..." runs fast on its own, even with the date clause. The query lives in a classic ASP application and can't be converted into a stored procedure (project scope reasons). Can you help me find a way to improve the performance of the full query just by changing the query code?

select distinct top 10 
    ID_Contact, NO_CodCompany 
from 
    tblContacts con1 
where 
    ID_Contact in (select top 2 ID_Contact
                   from tblContacts con2 
                   inner join tblCompanies cp on con2.NO_CodCompany = cp.ID_Company
                   where con2.NO_CodCompany = con1.NO_CodCompany
                     and datediff(day, con2.DT_DateIncluded, '2017-01-01') <= 0)

3 solutions

#1


Instead of `DATEDIFF() <= 0`, try using:

and con2.DT_DateIncluded <= '2017-01-01' 

Also, ensure that there is an index on the `DT_DateIncluded` column.

The reason DATEDIFF() runs slowly is threefold: the calculation itself takes a bit of time, the query optimizer is (probably) ending up running it for every row in the table because a function wrapped around a column cannot use an index seek on that column, and there is (probably) no index to help it select the required rows.

When you remove that clause, the query runs faster, but that is probably helped along by the fact that you're only selecting the first two rows in the inner query and ten rows in the outer query, which lets a table scan be performant enough.

#2


This is essentially your query:

select distinct top 10 ID_Contact, NO_CodCompany
from tblContacts con1
where ID_Contact in (select top 2 ID_Contact
                     from tblContacts con2 inner join
                          tblCompanies cp
                          on con2.NO_CodCompany = cp.ID_Company
                     where con2.NO_CodCompany = con1.NO_CodCompany and
                           datediff(day, con2.DT_DateIncluded, '2017-01-01') <= 0
                    );

My first suggestion is to change the datediff() to a simple date comparison:

select distinct top 10 ID_Contact, NO_CodCompany
from tblContacts con1
where ID_Contact in (select top 2 ID_Contact
                     from tblContacts con2 inner join
                          tblCompanies cp
                          on con2.NO_CodCompany = cp.ID_Company
                     where con2.NO_CodCompany = con1.NO_CodCompany and
                           con2.DT_DateIncluded < '2017-01-02'
                    );

Then, I would remove the JOIN in the subquery. I'm not 100% sure this is exactly equivalent, because that might depend on nuances in the data (for example, contacts whose NO_CodCompany has no matching row in tblCompanies):

select distinct top 10 ID_Contact, NO_CodCompany
from tblContacts con1
where con1.ID_Contact in (select top 2 con2.ID_Contact
                          from tblContacts con2
                          where con2.NO_CodCompany = con1.NO_CodCompany and
                                con2.DT_DateIncluded < '2017-01-02'
                         );

Then, if you can remove the select distinct in the outermost query, you should do that.

#3


Try this instead:

con2.DT_DateIncluded < '20170102'

It's better because it still allows the server to make use of any indexes on the DT_DateIncluded column. Currently, this is not possible. Even worse, the query is probably having to run that DATEDIFF() function on every record in the table.

Note that this is equivalent to what you posted, even if it might not match what you intended. I suspect con2.DT_DateIncluded < '20170101' is closer to what you really meant.

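Which range is equivalent depends on the argument order, since DATEDIFF(day, startdate, enddate) counts the midnights crossed going from startdate to enddate and is negative when enddate is the earlier one. A quick sanity check with literal values (made up for illustration, not taken from the real table) before swapping one form for the other:

```sql
-- DATEDIFF(day, startdate, enddate): number of day boundaries crossed from start to end.
SELECT
    DATEDIFF(day, '2016-12-31T23:59:00', '20170101') AS day_before,  -- 1
    DATEDIFF(day, '2017-01-01T23:59:00', '20170101') AS same_day,    -- 0
    DATEDIFF(day, '2017-01-02T00:00:00', '20170101') AS day_after;   -- -1

-- For a datetime column c this gives two different index-friendly rewrites:
--   DATEDIFF(day, c, '20170101') <= 0   selects the same rows as   c >= '20170101'
--   DATEDIFF(day, '20170101', c) <= 0   selects the same rows as   c <  '20170102'
```

With the column in the startdate position the range form is the `>=` one; with the column in the enddate position it is the `<` one. It is worth confirming which side of 2017-01-01 is actually wanted before rewriting.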

I also suspect you could do this either without the 2nd instance of tblContacts or with a windowing function to get much better results, or at least by using JOIN instead of IN to filter the results.

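A windowing version of "n contacts per company" might look like the sketch below. It assumes that "top 2" means the two most recently included contacts (the question does not say which two) and reuses the table and column names from the question; `ranked` and `rn` are names made up here.

```sql
-- Sketch: number each company's contacts newest-first, then keep the first two.
SELECT TOP 10 ranked.ID_Contact, ranked.NO_CodCompany
FROM (
    SELECT con.ID_Contact,
           con.NO_CodCompany,
           ROW_NUMBER() OVER (PARTITION BY con.NO_CodCompany
                              ORDER BY con.DT_DateIncluded DESC) AS rn
    FROM tblContacts con
    WHERE con.DT_DateIncluded < '20170102'   -- bare column, so an index can be used
) ranked
WHERE ranked.rn <= 2                         -- the "n" in "n contacts per company"
ORDER BY ranked.NO_CodCompany, ranked.rn;
```

This reads tblContacts once instead of correlating a second instance of it, and no DISTINCT is needed if ID_Contact is unique.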

Finally, for historical reasons, when entering a date-only value, you should use the unseparated date format as described here:

The ultimate guide to the datetime datatypes

For date/time values, you can still use the separated yyyy-mm-dd hh:mm:ss you're used to, but if you only have the date part, yyyymmdd is better.

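The reason behind that advice can be seen with a small experiment. This sketch assumes the value ends up as a datetime (the newer date and datetime2 types read yyyy-mm-dd the same way regardless of language) and that the session is allowed to switch its language:

```sql
-- British implies DATEFORMAT dmy for the session.
SET LANGUAGE British;

SELECT CAST('2017-01-02' AS datetime) AS separated,    -- read as year-day-month: 1 February 2017
       CAST('20170102'   AS datetime) AS unseparated;  -- always 2 January 2017

-- Restore the usual default.
SET LANGUAGE us_english;
```

A classic ASP application may connect under a login whose default language is not us_english, which is when the separated form silently changes meaning.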


Based on this comment:

My goal with this query is to obtain contacts from companies but limited to "n" contacts per company

You should look into the APPLY operator. Unfortunately, it's still not clear to me how everything fits together, but I will at least provide a demonstration using the APPLY operator to show two contacts per company that you can use as a starting point:

SELECT TOP 10 ct.ID_Contact, ct.NO_CodCompany
FROM tblCompanies cp
CROSS APPLY (
    SELECT TOP 2 ID_Contact, NO_CodCompany
    FROM tblContacts
    WHERE NO_CodCompany = cp.ID_Company
        AND DT_DateIncluded < '20170102'
    ORDER BY DT_DateIncluded DESC
) ct

APPLY works kind of like a JOIN on a nested SELECT query, where there is no ON clause; the join condition is instead included as part of the WHERE clause in the nested SELECT statement.

Note the use of CROSS. This will exclude companies that have no contacts at all. If you want to include those companies, change it to OUTER.

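For comparison, a sketch of the OUTER variant, using the same table and column names as the question. It also selects cp.ID_Company, which is an addition here so that companies without contacts remain identifiable in the result:

```sql
SELECT cp.ID_Company, ct.ID_Contact
FROM tblCompanies cp
OUTER APPLY (
    SELECT TOP 2 ID_Contact
    FROM tblContacts
    WHERE NO_CodCompany = cp.ID_Company
      AND DT_DateIncluded < '20170102'
    ORDER BY DT_DateIncluded DESC
) ct;
-- A company with no qualifying contact still appears once, with ct.ID_Contact = NULL.
```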

You should also look at what indexes you have defined. A single index on the tblContacts table that looks at NO_CodCompany and DT_DateIncluded (in that order!) might work wonders for this query, especially if it also has ID_Contact in the INCLUDE clause. Then you could complete the tblContacts portion of the query entirely from the index.

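Such an index could be declared roughly as follows. The index name is hypothetical, and this assumes no equivalent index already exists on the table:

```sql
-- Key columns in this order: equality on the company first, then the date range.
CREATE NONCLUSTERED INDEX IX_tblContacts_Company_DateIncluded   -- hypothetical name
    ON tblContacts (NO_CodCompany, DT_DateIncluded)
    INCLUDE (ID_Contact);   -- carried in the leaf level so the query never touches the base table
```

If ID_Contact is already the clustered key of tblContacts, it is stored in every nonclustered index anyway, so the INCLUDE is then harmless but redundant.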
