I have a table that I will populate with values from an expensive calculation (an XQuery over an immutable XML column). To speed up deployment to production, I precalculated the values on a test server and saved them to a file with BCP.
My script is as follows:
-- Lots of other work, including modifying OtherTable
CREATE TABLE FOO (...)
GO
BULK INSERT FOO
FROM 'C:\foo.dat';
GO
-- rerun from here after the break
INSERT INTO FOO
    (ID, TotalQuantity)
SELECT
    e.ID,
    SUM(e.TotalQuantity) as TotalQuantity
FROM (SELECT
        o.ID,
        h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
      FROM dbo.OtherTable o
      CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE o.ID NOT IN (SELECT DISTINCT ID FROM FOO)
     ) as e
GROUP BY e.ID
When I run the script in Management Studio, the first two statements complete within seconds, but the last statement takes 4 hours to complete. Since no rows have been added to OtherTable since my foo.dat was computed, Management Studio reports (0 row(s) affected).
If I cancel the query execution after a couple of minutes, then select just the last query and run it separately, it completes within 5 seconds.
Notable facts:
- OtherTable contains 200k rows, and the data in XmlColumn is pretty large; total table size is ~3GB
- The FOO table gets 1.3M rows
What could possibly make the difference?
Management Studio has implicit transactions turned off. As far as I can understand, each statement will then run in its own transaction.
Update:
If I first select and run the script up until -- rerun from here after the break, then select and run just the last query, it is still slow until I cancel execution and try again. This at least rules out any effect of running "together" with the previous code in the script, and boils it down to the same query being slow on first execution and fast on the second (with all other conditions the same).
3 Answers
#1
2
Probably different execution plans. See Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
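One quick way to test the plan-reuse theory (a sketch on my part, not from the original answer) is to force a fresh compile of just the problem statement, so SQL Server cannot reuse a plan cached before FOO was populated:

```sql
-- Hypothetical test: OPTION (RECOMPILE) forces a fresh plan for this
-- statement instead of reusing one compiled with stale estimates.
INSERT INTO FOO (ID, TotalQuantity)
SELECT e.ID, SUM(e.TotalQuantity) as TotalQuantity
FROM ( /* same derived table as in the question */ ) as e
GROUP BY e.ID
OPTION (RECOMPILE);
```

If the statement is fast with the hint on first execution, a stale cached plan is the likely culprit.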
#2
1
Could it possibly be related to the statistics being completely wrong on the newly created Foo table? If SQL Server automatically updates the statistics when it first runs the query, the second run would have its execution plan created from up-to-date statistics.
What if you check the statistics right after the bulk insert (with the STATS_DATE function) and then check them again after having cancelled the long-running query? Did the stats get updated, even though the query was cancelled?
In that case, an UPDATE STATISTICS on Foo right after the bulk insert could help.
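That check might look like the following sketch (the sys.stats lookup is my addition, not part of the original answer):

```sql
-- When were the statistics on FOO last updated? Run this right after
-- the BULK INSERT, and again after cancelling the long-running query.
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FOO');

-- If they are missing or stale, refresh them before the big INSERT:
UPDATE STATISTICS dbo.FOO WITH FULLSCAN;
```

A NULL or pre-load last_updated date after the bulk insert would support the stale-statistics explanation.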
#3
0
Not sure exactly why it helped, but I rewrote the last query to use a left outer join instead, and suddenly the execution time dropped to 15 milliseconds.
INSERT INTO FOO
    (ID, TotalQuantity)
SELECT
    e.ID,
    SUM(e.TotalQuantity) as TotalQuantity
FROM (SELECT
        o.ID,
        h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
      FROM dbo.OtherTable o
      LEFT OUTER JOIN FOO f ON o.ID = f.ID
      CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE f.ID IS NULL
     ) as e
GROUP BY e.ID
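As for why it helped: NOT IN over a large, freshly loaded table often gets a poor plan, while the outer-join/IS NULL form is an anti-semi-join the optimizer handles well. An equivalent rewrite (my sketch, not part of the original answer) uses NOT EXISTS, which expresses the same anti-semi-join without the NULL-sensitivity of NOT IN:

```sql
-- Equivalent anti-semi-join: keep only OtherTable rows that have no
-- matching row in FOO yet, then aggregate and insert as before.
INSERT INTO FOO
    (ID, TotalQuantity)
SELECT
    e.ID,
    SUM(e.TotalQuantity) as TotalQuantity
FROM (SELECT
        o.ID,
        h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
      FROM dbo.OtherTable o
      CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE NOT EXISTS (SELECT 1 FROM FOO f WHERE f.ID = o.ID)
     ) as e
GROUP BY e.ID;
```

Unlike NOT IN, NOT EXISTS behaves predictably even if FOO.ID were nullable, and it typically compiles to the same anti-semi-join plan as the left-join rewrite.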