SQL Server 2008上的间歇性慢查询

I am developing a system which periodically (4-5 times daily) runs a select statement, that normally takes less than 10 seconds but periodically has taken up to 40 minutes.

我正在开发一个系统,它定期(每天4-5次)运行一个选择语句,通常需要不到10秒,但定期需要40分钟。

The database is on Windows Server 2008 + SQL Server 2008 R2; both 64bit.

该数据库位于Windows Server 2008 + SQL Server 2008 R2上;都是64位。

There is a service on the machine running the database which polls the database and generates values for records which require it. These records are then periodically queried using a multi table join select from a service on a second machine written in C++ (VS 2010) using the MFC CRecordset class to extract the data. An example of the the query causing the problem is shown below.

运行数据库的计算机上有一个服务,它轮询数据库并为需要它的记录生成值。然后,使用MFC CRecordset类从使用CFC(VS 2010)编写的第二台计算机上的服务中使用多表连接选择来定期查询这些记录,以提取数据。导致问题的查询示例如下所示。

SELECT DISTINCT "JobKeysFrom"."Key" AS "KeyFrom","KeysFrom"."ID" AS "IDFrom",
"KeysFrom"."X" AS "XFrom","KeysFrom"."Y" AS "YFrom","JobKeysTo"."Key" AS "KeyTo",
"KeysTo"."ID" AS "IDTo","KeysTo"."X" AS "XTo","KeysTo"."Y" AS "YTo",
"Matrix"."TimeInSeconds","Matrix"."DistanceInMetres","Matrix"."Calculated"
FROM "JobKeys" AS "JobKeysFrom"
INNER JOIN "JobKeys" AS "JobKeysTo" ON 
("JobKeysFrom"."Key"<>"JobKeysTo"."Key") AND 
("JobKeysFrom"."JobID"=531) AND 
("JobKeysTo"."JobID"=531)
INNER JOIN "Keys" AS "KeysFrom" ON 
("JobKeysFrom"."Key"="KeysFrom"."Key") AND ("JobKeysFrom"."Status"=4)
INNER JOIN "Keys" AS "KeysTo" ON 
("JobKeysTo"."Key"="KeysTo"."Key") AND ("JobKeysTo"."Status"=4)
INNER JOIN "Matrix" AS "Matrix" ON 
("Matrix"."IDFrom"="KeysFrom"."ID") AND ("Matrix"."IDTo"="KeysTo"."ID")
ORDER BY "JobKeysFrom"."Key","JobKeysTo"."Key"

I have tried the following

我尝试了以下内容

checked the indexes and all seem correct and they are active and are being used according to the query

检查索引,所有看起来都是正确的,它们是活动的,并根据查询使用

the design advisor comes back with no suggestions

设计顾问回来时没有任何建议

I have tried defragging the indexes and data

我试过去整理索引和数据

rebuilt the database from scratch by exporting the data and reimporting it in a new database.

通过导出数据并在新数据库中重新导入数据,从头开始重建数据库。

ran the profiler on it and found that when it goes wrong it seems to do many millions (up to 100 million) of reads rather than a few hundred thousand.

在它上面运行了分析器,发现当它出错时,似乎可以进行数百万(最多1亿)读取而不是数十万次。

ran the database on a different server

在另一台服务器上运行数据库

During the time it is running the query, I can run exactly the same query in the management studio window and it will be back to running in 10 seconds. The problem does not seem to be lock, deadlock, CPU, disk or memory related as it has done it when the machine running the database was only running this one query. The server has 4 processors and 16 gb of memory to run it in. I have also tried upgrading the disks to much faster ones and this had no effect.

在运行查询期间,我可以在管理工作室窗口中运行完全相同的查询,它将在10秒内恢复运行。当运行数据库的机器只运行这一个查询时,问题似乎不是锁,死锁,CPU,磁盘或内存相关。服务器有4个处理器和16 GB的内存来运行它。我也尝试将磁盘升级到更快的磁盘,这没有任何效果。

It seems to me that it is almost as though the database receives the query, starts to process it and then goes to sleep for 40 minutes or runs the query without using the indexes.

在我看来,它几乎就像数据库接收查询,开始处理它然后进入休眠状态40分钟或运行查询而不使用索引。

When it takes a long time it will eventually finish and send the query results (normally about 70-100000 records) back to the calling application.

当它需要很长时间时,它最终会完成并将查询结果(通常约70-100000条记录)发送回调用应用程序。

Any help or suggestions would be gratefully received, many thanks

非常感谢任何帮助或建议

2 个解决方案

#1

This sounds very much like parameter sniffing.

这听起来非常像参数嗅探。

When a stored procedure is invoked and there is no existing execution plan in the cache matching the set options for the connection a new execution plan will be compiled using the parameter values passed in on that invocation.

当调用存储过程并且高速缓存中没有与连接的设置选项匹配的现有执行计划时,将使用在该调用上传递的参数值来编译新的执行计划。

Sometimes this will happen when the parameters passed are atypical (e.g. have unusually high selectivity) so the generated plan will not be suitable for most other invocations with different parameters. For example it may choose a plan with index seeks and bookmark lookups which is fine for a highly selective case but poor if it needs to be done hundreds of thousands of times.

有时,当传递的参数不典型时(例如,具有异常高的选择性),这将发生,因此生成的计划将不适用于具有不同参数的大多数其他调用。例如,它可以选择具有索引搜索和书签查找的计划,这对于高度选择性的情况是好的,但是如果需要完成数十万次则很差。

This would explain why the number of reads goes through the roof.

这可以解释为什么读数会通过屋顶。

Your SSMS connection will likely have different SET ... options so will not get handed the same problematic plan from the cache when you execute the stored procedure inside SSMS

您的SSMS连接可能会有不同的SET ...选项,因此当您在SSMS中执行存储过程时,不会从缓存中获取相同的有问题的计划

You can use the following to get the plan for the slow session

您可以使用以下内容来获取慢速会话的计划

select p.query_plan, *
from sys.dm_exec_requests r
cross apply sys.dm_exec_query_plan(r.plan_handle) p
where r.session_id = <session_id>

Then compare with the plan for the good session.

然后与良好会话的计划进行比较。

If you do determine that parameter sniffing is at fault you can use OPTIMIZE FOR hints to avoid it choosing the bad plan.

如果您确定参数嗅探有问题,则可以使用OPTIMIZE FOR提示来避免选择错误的计划。

#2

Check that you don't have a maintenance task running that is rebuilding indexes, or that your database statistics are somehow invalid when the query is executed.

检查您是否正在运行正在重建索引的维护任务,或者在执行查询时数据库统计信息在某种程度上无效。

This is exactly the sort of thing one would expect to see if the query is not using your indexes, which is usually because either the indexes are not accessible to the query at the point it runs or because the statistics are invalid and make the optimiser believe that your large table(s) only have a few rows in them and the query would run faster with a full table scan than using indexed access.

如果查询没有使用索引,这就是人们期望看到的东西,这通常是因为查询在运行时无法访问索引,或者因为统计信息无效并使优化器相信您的大型表只有几行,并且使用全表扫描比使用索引访问查询运行得更快。

#1