I am currently running into some performance issues when running a query which joins multiple tables. The main table has 170 million records, so it is pretty big.
在运行连接多个表的查询时,我目前遇到了一些性能问题。主表有1.7亿条记录,所以非常大。
What I encounter is that when I run the query with a top 1000 clause, the results are instantaneous. However, when I increase that to top 8000 the query easily runs for 15 minutes (and then I kill it). Through trial and error I found that the tipping point is with Top 7934 (works like a charm) and Top 7935 (Runs for ever)
我遇到的情况是,当我使用top 1000子句运行查询时,结果是即时的。然而,当我将其增加到8000上面时,查询很容易运行15分钟(然后我将其删除)。经过反复试验,我发现临界点在7934(像咒语一样工作)和7935(永远运行)之间
Does someone recognise this behaviour and sees what I am doing wrong? Maybe my Query is faulty in some respects.
是否有人意识到这种行为并发现我做错了什么?也许我的问题在某些方面是错误的。
Thanks in advance
谢谢提前
SELECT top 7934 h.DocIDBeg
,h.[Updated By]
,h.Action
,h.Type
,h.Details
,h.[Update Date]
,h.[Updated Field Name]
,i.Name AS 'Value Set To'
,COALESCE(i.Name,'') + COALESCE(h.NewValue, '') As 'Value Set To'
,h.OldValue
FROM
(SELECT g.DocIDBeg
,g.[Updated By]
,g.Action
,g.Type
,g.Details
,g.[Update Date]
,CAST(g.details as XML).value('auditElement[1]/field[1]/@name','nvarchar(max)') as 'Updated Field Name'
,CAST(g.details as XML).value('(/auditElement//field/setChoice/node())[1]','nvarchar(max)') as 'value'
,CAST(g.details as XML).value('(/auditElement//field/newValue/node())[1]','nvarchar(max)') as 'NewValue'
,CAST(g.details as XML).value('(/auditElement//field/oldValue/node())[1]','nvarchar(max)') as 'OldValue'
FROM(
SELECT a.ArtifactID
,f.DocIDBeg
,b.FullName AS 'Updated By'
,c.Action
,e.ArtifactType AS 'Type'
,a.Details
,a.TimeStamp AS 'Update Date'
FROM [EDDS1015272].[EDDSDBO].[AuditRecord] a
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditUser b
ON a.UserID = b.UserID
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditAction c
ON a.Action = c.AuditActionID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Artifact] d
ON a.ArtifactID = d.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[ArtifactType] e
ON d.ArtifactTypeID = e.ArtifactTypeID
INNER JOIN [EDDS1015272].[EDDSDBO].[Document] f
ON a.ArtifactID = f.ArtifactID
) g
) h
LEFT JOIN [EDDS1015272].[EDDSDBO].[Code] i
ON h.value = i.ArtifactID
3 个解决方案
#1
2
"Through trial and error I found that the tipping point is with Top 7934 (works like a charm) and Top 7935 (Runs for ever)"
“经过反复试验,我发现最具引爆点是Top 7934(像魔法一样管用)和Top 7935(永远跑)。”
This sounds very much like a spill. Adam Mechanic does a nice demo of the internals of this in the video below. Basically the top forces a sort which requires memory. If the memory grant is not big enough to complete the operation, some of it gets done on disk.
这听起来很像泄漏。Adam Mechanic请在下面的视频中演示一下它的内部原理。基本上,顶部的力是需要记忆的。如果内存授予不够大,无法完成操作,那么可以在磁盘上完成一些操作。
https://www.youtube.com/watch?v=j5YGdIk3DXw
https://www.youtube.com/watch?v=j5YGdIk3DXw
Go to 1:03:50 to see Adam demo a spill. In his query, 668,935 rows do not spill but 668,936 rows do and the query time more than doubles.
请转到03:50观看亚当演示漏油。在他的查询中,668,935行不会溢出,但668,936行会溢出,查询时间也会加倍。
Watch the whole session if you have time. Very good for performance tuning!
如果你有时间的话,看一下整个过程。非常适合性能调优!
Could also be the tipping point, as @Remus suggested, but it's all guessing without knowing the actual plan.
也可能是引爆点,正如@Remus所建议的,但这都是猜测而不知道真正的计划。
#2
2
I used to work with data warehouses a lot and encountered similar problems quite often. The root cause is obviously in memory usage like it was already mentioned here. I don't think that rewriting your query will help a lot if you really need to query all 170 million records and I don't think that it is OK for you to wait for more memory resources. So here is just a simple workaround from me:
我以前经常使用数据仓库,经常遇到类似的问题。根源很明显是在内存使用中,就像这里提到的那样。如果你真的需要查询全部1.7亿条记录,我认为重写你的查询不会有多大帮助,我也不认为你可以等待更多的内存资源。这是我的一个简单的解决办法:
Try to split your query. For example, first query all data you need from AuditRecord record table joined to AuditUser table and store the result in another(temporary table for example) table. Then join this new table with Artifact table and so on. In this case this steps will require less memory one by one then running the whole query and have it hung out. So in the long run you will have not a query but a scrip which will be easy to track as you can print out some statuses in the console and which will do his job unlike the query which never ends
试着分割您的查询。例如,首先从AuditRecord记录表中查询所需的所有数据,并将结果存储在另一个表(例如临时表)中。然后将这个新表与工件表等连接起来。在这种情况下,这些步骤将需要更少的内存,然后运行整个查询并挂起它。因此,从长远来看,您将不会有一个查询,而是一个很容易跟踪的scrip,因为您可以在控制台中打印出一些状态,这将完成它的工作,而不像查询是永远不会结束的
Also make sure that you really need to query all this data at once, because I can think of no use cases why you need it, but still if it is an application then you should implement paging, if it is some export functionality then maybe there is a timeline you can use to batch data. For example to export data on a daily basis and query only the data from yersterday. In this case you will come up with an incremental export.
也确保你真的需要查询所有的数据,因为我能想到的任何用例你为什么需要它,但是如果它是一个应用程序那么你应该实现分页,如果是一些出口的功能也许有一个时间表可以使用批处理数据。例如,每天导出数据,只查询来自yersterday的数据。在这种情况下,您将提出增量导出。
#3
0
i think the subselects are forcing the server to fetch all before the filter can be applied this will couse more memory usage (xlm fields) and make it hard to use a decent qry plan
我认为子选择迫使服务器在应用过滤器之前获取所有内容,这将带来更多的内存使用(xlm字段),并使使用像样的qry计划变得困难
as to the strange top behavior: top has a big influence on qry plan generation. it is possible that the 7935 is a cutoff point for 1 optimal plan and that sql server will choose a different path when it needs to fetch more. or it could go back to the memory and run out of mem on 7935
关于奇怪的顶部行为:顶部对qry计划的生成有很大的影响。7935可能是一个最优计划的截止点,当sql服务器需要获取更多时,它将选择不同的路径。或者它可以回到内存,在7935上用完mem
update:
更新:
i reworked your qry to eliminate the nested selects, i'm not saying its now going to be that mutch faster but it eliminates some fields that werent used and it should be easyer to understand and optimize based on the qry plan. since we don't now the exact size of each table and we can hardly run the qry to test its impossible to give you the best answer. but i could try some tips:
我重新设计了qry以消除嵌套的选择,我并不是说现在的mutch会更快一些,但是它消除了一些未使用的字段,并且基于qry计划应该更容易理解和优化。由于我们现在没有每个表的确切大小,我们很难运行qry来测试它不可能给出最好的答案。但是我可以尝试一些建议:
1 step would be to check if you need all the left joins and turn them into inner if it is not needed ex: AuditUser, an AuditRecord could always have a user?
第一步是检查是否需要所有左连接,如果不需要,将它们转换为内部连接。例如:AuditUser,一个AuditRecord可能总是有一个用户?
an other thing you could try is to put the data of preferably the smaller tables in a tmp table and join the bigger tables to that tmp table, possible eliminating a lot of records to join
另一种方法是,将最好的数据放在tmp表中,并将更大的表加入到tmp表中,这可能会消除大量记录。
if possible you could denormalize a bit and for example put the username in the auditrecord 2 so you would eliminate the join on AuditUser alltogether
如果可能的话,您可以去规范化一些,例如将用户名放在auditrecord 2中,这样就可以消除AuditUser上的连接
but it is up to wat you need wat you can/are allowed to and the data/server
但这取决于您需要的wat,您可以/允许的wat和数据/服务器
SELECT top 7934 f.DocIDBeg
,b.FullName AS 'Updated By'
,c.Action
,e.ArtifactType AS 'Type'
,a.Details
,a.TimeStamp AS 'Update Date'
,CAST(a.Details as XML).value('auditElement[1]/field[1]/@name','nvarchar(max)') as 'Updated Field Name'
,i.Name AS 'Value Set To'
,COALESCE(i.Name,'') + COALESCE(CAST(a.Details as XML).value('(/auditElement//field/newValue/node())[1]','nvarchar(max)') as 'NewValue', '') As 'Value Set To'
,CAST(a.Details as XML).value('(/auditElement//field/oldValue/node())[1]','nvarchar(max)') as 'OldValue'
FROM [EDDS1015272].[EDDSDBO].[AuditRecord] a
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditUser b
ON a.UserID = b.UserID
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditAction c
ON a.Action = c.AuditActionID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Artifact] d
ON a.ArtifactID = d.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[ArtifactType] e
ON d.ArtifactTypeID = e.ArtifactTypeID
INNER JOIN [EDDS1015272].[EDDSDBO].[Document] f
ON a.ArtifactID = f.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Code] i
ON CAST(a.details as XML).value('(/auditElement//field/setChoice/node())[1]','nvarchar(max)') = i.ArtifactID
#1
2
"Through trial and error I found that the tipping point is with Top 7934 (works like a charm) and Top 7935 (Runs for ever)"
“经过反复试验,我发现最具引爆点是Top 7934(像魔法一样管用)和Top 7935(永远跑)。”
This sounds very much like a spill. Adam Mechanic does a nice demo of the internals of this in the video below. Basically the top forces a sort which requires memory. If the memory grant is not big enough to complete the operation, some of it gets done on disk.
这听起来很像泄漏。Adam Mechanic请在下面的视频中演示一下它的内部原理。基本上,顶部的力是需要记忆的。如果内存授予不够大,无法完成操作,那么可以在磁盘上完成一些操作。
https://www.youtube.com/watch?v=j5YGdIk3DXw
https://www.youtube.com/watch?v=j5YGdIk3DXw
Go to 1:03:50 to see Adam demo a spill. In his query, 668,935 rows do not spill but 668,936 rows do and the query time more than doubles.
请转到03:50观看亚当演示漏油。在他的查询中,668,935行不会溢出,但668,936行会溢出,查询时间也会加倍。
Watch the whole session if you have time. Very good for performance tuning!
如果你有时间的话,看一下整个过程。非常适合性能调优!
Could also be the tipping point, as @Remus suggested, but it's all guessing without knowing the actual plan.
也可能是引爆点,正如@Remus所建议的,但这都是猜测而不知道真正的计划。
#2
2
I used to work with data warehouses a lot and encountered similar problems quite often. The root cause is obviously in memory usage like it was already mentioned here. I don't think that rewriting your query will help a lot if you really need to query all 170 million records and I don't think that it is OK for you to wait for more memory resources. So here is just a simple workaround from me:
我以前经常使用数据仓库,经常遇到类似的问题。根源很明显是在内存使用中,就像这里提到的那样。如果你真的需要查询全部1.7亿条记录,我认为重写你的查询不会有多大帮助,我也不认为你可以等待更多的内存资源。这是我的一个简单的解决办法:
Try to split your query. For example, first query all data you need from AuditRecord record table joined to AuditUser table and store the result in another(temporary table for example) table. Then join this new table with Artifact table and so on. In this case this steps will require less memory one by one then running the whole query and have it hung out. So in the long run you will have not a query but a scrip which will be easy to track as you can print out some statuses in the console and which will do his job unlike the query which never ends
试着分割您的查询。例如,首先从AuditRecord记录表中查询所需的所有数据,并将结果存储在另一个表(例如临时表)中。然后将这个新表与工件表等连接起来。在这种情况下,这些步骤将需要更少的内存,然后运行整个查询并挂起它。因此,从长远来看,您将不会有一个查询,而是一个很容易跟踪的scrip,因为您可以在控制台中打印出一些状态,这将完成它的工作,而不像查询是永远不会结束的
Also make sure that you really need to query all this data at once, because I can think of no use cases why you need it, but still if it is an application then you should implement paging, if it is some export functionality then maybe there is a timeline you can use to batch data. For example to export data on a daily basis and query only the data from yersterday. In this case you will come up with an incremental export.
也确保你真的需要查询所有的数据,因为我能想到的任何用例你为什么需要它,但是如果它是一个应用程序那么你应该实现分页,如果是一些出口的功能也许有一个时间表可以使用批处理数据。例如,每天导出数据,只查询来自yersterday的数据。在这种情况下,您将提出增量导出。
#3
0
i think the subselects are forcing the server to fetch all before the filter can be applied this will couse more memory usage (xlm fields) and make it hard to use a decent qry plan
我认为子选择迫使服务器在应用过滤器之前获取所有内容,这将带来更多的内存使用(xlm字段),并使使用像样的qry计划变得困难
as to the strange top behavior: top has a big influence on qry plan generation. it is possible that the 7935 is a cutoff point for 1 optimal plan and that sql server will choose a different path when it needs to fetch more. or it could go back to the memory and run out of mem on 7935
关于奇怪的顶部行为:顶部对qry计划的生成有很大的影响。7935可能是一个最优计划的截止点,当sql服务器需要获取更多时,它将选择不同的路径。或者它可以回到内存,在7935上用完mem
update:
更新:
i reworked your qry to eliminate the nested selects, i'm not saying its now going to be that mutch faster but it eliminates some fields that werent used and it should be easyer to understand and optimize based on the qry plan. since we don't now the exact size of each table and we can hardly run the qry to test its impossible to give you the best answer. but i could try some tips:
我重新设计了qry以消除嵌套的选择,我并不是说现在的mutch会更快一些,但是它消除了一些未使用的字段,并且基于qry计划应该更容易理解和优化。由于我们现在没有每个表的确切大小,我们很难运行qry来测试它不可能给出最好的答案。但是我可以尝试一些建议:
1 step would be to check if you need all the left joins and turn them into inner if it is not needed ex: AuditUser, an AuditRecord could always have a user?
第一步是检查是否需要所有左连接,如果不需要,将它们转换为内部连接。例如:AuditUser,一个AuditRecord可能总是有一个用户?
an other thing you could try is to put the data of preferably the smaller tables in a tmp table and join the bigger tables to that tmp table, possible eliminating a lot of records to join
另一种方法是,将最好的数据放在tmp表中,并将更大的表加入到tmp表中,这可能会消除大量记录。
if possible you could denormalize a bit and for example put the username in the auditrecord 2 so you would eliminate the join on AuditUser alltogether
如果可能的话,您可以去规范化一些,例如将用户名放在auditrecord 2中,这样就可以消除AuditUser上的连接
but it is up to wat you need wat you can/are allowed to and the data/server
但这取决于您需要的wat,您可以/允许的wat和数据/服务器
SELECT top 7934 f.DocIDBeg
,b.FullName AS 'Updated By'
,c.Action
,e.ArtifactType AS 'Type'
,a.Details
,a.TimeStamp AS 'Update Date'
,CAST(a.Details as XML).value('auditElement[1]/field[1]/@name','nvarchar(max)') as 'Updated Field Name'
,i.Name AS 'Value Set To'
,COALESCE(i.Name,'') + COALESCE(CAST(a.Details as XML).value('(/auditElement//field/newValue/node())[1]','nvarchar(max)') as 'NewValue', '') As 'Value Set To'
,CAST(a.Details as XML).value('(/auditElement//field/oldValue/node())[1]','nvarchar(max)') as 'OldValue'
FROM [EDDS1015272].[EDDSDBO].[AuditRecord] a
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditUser b
ON a.UserID = b.UserID
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditAction c
ON a.Action = c.AuditActionID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Artifact] d
ON a.ArtifactID = d.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[ArtifactType] e
ON d.ArtifactTypeID = e.ArtifactTypeID
INNER JOIN [EDDS1015272].[EDDSDBO].[Document] f
ON a.ArtifactID = f.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Code] i
ON CAST(a.details as XML).value('(/auditElement//field/setChoice/node())[1]','nvarchar(max)') = i.ArtifactID