I have a large SQL Server database with a table of about 45 million records. I am archiving this table and need to remove all entries older than two years. The insert into my archive table works fine, but I'm having efficiency problems with the delete.
My problem lies with the indexes currently on the table. I would like to delete (and archive-insert) in chunks of 1000 records. To do this, I need to determine the "top" 1000 records that meet the requirement (older than two years). The DateTime stamp on the row has a clustered index, so grabbing the rows is fast. However, SQL Server 2000 does not allow DELETE TOP 1000..., so I need to do something like:
DELETE FROM <table> WHERE [UniqueID] IN
(SELECT TOP 1000 [UniqueID] FROM <table> WHERE [DateTime] < @TwoYearsAgo)
This would work well if UniqueID were indexed. Since it is not, this takes a very long time (it scans the table for each of the 1000 records to be deleted). There are no other indexes on the table that uniquely identify the records. I am told it would be too costly to build an index on UniqueID, since this is a live DB. Can anyone point out a way to optimize this query?
7 solutions
#1
17
How about rewriting the query?
SET ROWCOUNT 1000
DELETE FROM <table> WHERE [DateTime] < @TwoYearsAgo
See the documentation on SET ROWCOUNT (Transact-SQL).
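To clear the whole backlog, the statement above has to be repeated until nothing older than the cutoff is left. A minimal sketch of that loop for SQL Server 2000, assuming a table named dbo.Events stands in for `<table>` (a hypothetical name) and that the cutoff is two years before the current date. SET ROWCOUNT stays in effect for the rest of the session, so the sketch resets it to 0 at the end.

```sql
-- Hypothetical names: dbo.Events stands in for <table>.
-- SQL Server 2000 has no DECLARE ... = value, so variables are SET separately.
DECLARE @TwoYearsAgo datetime, @Deleted int;
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE());
SET @Deleted = 1;

SET ROWCOUNT 1000;              -- limit each following statement to 1000 rows
WHILE @Deleted > 0
BEGIN
    DELETE FROM dbo.Events WHERE [DateTime] < @TwoYearsAgo;
    SET @Deleted = @@ROWCOUNT;  -- capture immediately; later statements overwrite it
END
SET ROWCOUNT 0;                 -- restore "no limit" for the rest of the session
```

Each DELETE runs in its own autocommit transaction, so locks are released between batches. Note that the SQL Server 2005+ documentation warns that SET ROWCOUNT will stop affecting DELETE, INSERT and UPDATE in a future release, so on newer versions TOP is the safer choice.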
Also note that, per the documentation for DELETE, it supports the TOP clause, but that is apparently new in SQL Server 2005 and later. I'm mentioning it because it sounds like it isn't supported on your database server, but have you actually tried it? I don't have access to the SQL Server 2000 documentation, so I'm not sure whether that version supports it. It very well might not.
DELETE TOP (1000) FROM <table> WHERE [DateTime] < @TwoYearsAgo
Note that this differs from SELECT, where TOP can be written without parentheses. For UPDATE, DELETE and INSERT, the expression must be parenthesized, even if it's only a constant number as above.
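On SQL Server 2005 or later, the batching loop can be written with DELETE TOP instead of SET ROWCOUNT. A minimal sketch, again assuming the hypothetical table dbo.Events; because TOP takes an expression there, the batch size can live in a variable (@BatchSize, also a hypothetical name).

```sql
-- SQL Server 2005 or later; dbo.Events and @BatchSize are hypothetical names.
DECLARE @TwoYearsAgo datetime, @BatchSize int, @Deleted int;
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE());
SET @BatchSize = 1000;          -- TOP (expression) accepts a variable
SET @Deleted = @BatchSize;

-- A batch smaller than @BatchSize means every qualifying row was deleted.
WHILE @Deleted = @BatchSize
BEGIN
    DELETE TOP (@BatchSize) FROM dbo.Events
    WHERE [DateTime] < @TwoYearsAgo;
    SET @Deleted = @@ROWCOUNT;
END
```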
#2
8
You can delete through a subquery (derived table):
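DELETE <table> FROM (
SELECT TOP 1000 *
FROM <table>
WHERE [DateTime] < @TwoYearsAgo) AS t1
-- The derived table needs an alias and a join back to the target;
-- without this condition every row in <table> would be deleted.
WHERE <table>.[UniqueID] = t1.[UniqueID];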
See example E in the SQL Server 2000 DELETE syntax documentation. This is recommended over the SET ROWCOUNT approach. In SQL Server 2005 and later, you can specify TOP directly in DELETE.
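One limitation of DELETE TOP (n) on its own is that it cannot take an ORDER BY, so it removes arbitrary qualifying rows. If each batch should be the oldest rows, SQL Server 2005 or later also lets you delete through a common table expression that wraps a TOP ... ORDER BY query, which is the same "delete a subquery" idea. A sketch assuming the hypothetical table dbo.Events:

```sql
-- SQL Server 2005 or later; dbo.Events is a hypothetical stand-in for <table>.
DECLARE @TwoYearsAgo datetime;
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE());  -- semicolon required before WITH

-- TOP inside the CTE can carry an ORDER BY, so this batch is the oldest 1000 rows.
WITH OldestBatch AS (
    SELECT TOP (1000) *
    FROM dbo.Events
    WHERE [DateTime] < @TwoYearsAgo
    ORDER BY [DateTime]
)
DELETE FROM OldestBatch;
```

The ORDER BY matches the clustered index on [DateTime], so finding the batch should be a range read from the start of the index rather than a sort.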
#3
3
You can also do:
DELETE TOP(1000) FROM <table> WHERE [DateTime] < @TwoYearsAgo
God only knows why they use TOP(x) for DELETE and TOP x for SELECT; most people don't even seem to know about this feature!
Edit: Apparently it's 2005+, so you should probably ignore this.
#4
2
You could use SET ROWCOUNT:
SET ROWCOUNT 1000
DELETE FROM <table> WHERE [DateTime] < @TwoYearsAgo
#5
1
I had to do something similar a while back: a lightweight insert and delete to move old records to an archive table. Although counterintuitive, the fastest and least impactful solution I found was:
- Make a small #temp table with the ID values for the top (x) rows. If ID really can't be indexed in your scenario, you might store date AND ID instead, so that the combination of the two can use an index.
- begin tran
- Insert into the archive table the rows whose ID and DATE are in #temp
- Delete from the main table the rows whose ID and DATE are in #temp
- commit
- Truncate #temp
- Repeat
Staging the row identifiers in a temp table is more total work than a straight delete, but it makes the process very lightweight when you want to chip away a little at a time without blocking.
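The steps above might look roughly like this in T-SQL. This is a sketch, not the poster's actual code: dbo.Events and dbo.EventsArchive are hypothetical names, the archive table is assumed to have the same columns in the same order (and no identity column), and UniqueID is assumed to be an int. SET XACT_ABORT ON is added so that a failed archive insert rolls back the paired delete instead of committing it. Nothing here needs SQL Server 2005 syntax.

```sql
-- Hypothetical names: dbo.Events (live table), dbo.EventsArchive (same columns).
SET XACT_ABORT ON;              -- any runtime error rolls back the open transaction
DECLARE @TwoYearsAgo datetime, @Staged int;
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE());

CREATE TABLE #Batch (UniqueID int NOT NULL, [DateTime] datetime NOT NULL);
SET @Staged = 1;

WHILE @Staged > 0
BEGIN
    -- 1. Stage the keys of the next (oldest) 1000 rows.
    INSERT INTO #Batch (UniqueID, [DateTime])
    SELECT TOP 1000 UniqueID, [DateTime]
    FROM dbo.Events
    WHERE [DateTime] < @TwoYearsAgo
    ORDER BY [DateTime];
    SET @Staged = @@ROWCOUNT;

    IF @Staged > 0
    BEGIN
        BEGIN TRAN;
        -- 2. Copy the staged rows. Joining on [DateTime] too lets the
        --    clustered index locate them instead of scanning for UniqueID.
        INSERT INTO dbo.EventsArchive
        SELECT e.*
        FROM dbo.Events AS e
        JOIN #Batch AS b
          ON b.[DateTime] = e.[DateTime] AND b.UniqueID = e.UniqueID;

        -- 3. Remove exactly the same rows from the live table.
        DELETE e
        FROM dbo.Events AS e
        JOIN #Batch AS b
          ON b.[DateTime] = e.[DateTime] AND b.UniqueID = e.UniqueID;
        COMMIT TRAN;

        -- 4. Empty the staging table before the next pass.
        TRUNCATE TABLE #Batch;
    END
END

DROP TABLE #Batch;
```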
Also, I agree with Lasse: I can't see the point of a unique ID that has no index, and therefore no constraint, to enforce it.
#6
0
I wonder whether you really must stick with the 1000-record chunk requirement. If it exists to limit server load and the exact number is somewhat arbitrary, you may want to try the following, since you already have a clustered index on [DateTime]:
DELETE FROM <table>
WHERE [DateTime] < @TwoYearsAgo
and [DateTime] < (select dateadd(day, 1, min([DateTime])) from <table>)
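The statement above removes at most one day of the oldest rows per execution, so it has to be repeated to clear the backlog. A sketch of that loop, assuming the hypothetical table dbo.Events in place of `<table>`; each pass deletes everything within one day of the current oldest row, so the batch size follows your daily row volume rather than a fixed count.

```sql
-- dbo.Events is a hypothetical stand-in for <table>.
DECLARE @TwoYearsAgo datetime, @Oldest datetime;
SET @TwoYearsAgo = DATEADD(year, -2, GETDATE());

-- MIN on the leading clustered-index column only reads the first row.
SELECT @Oldest = MIN([DateTime]) FROM dbo.Events;

-- An empty table leaves @Oldest NULL, and NULL < x is not true, so the loop ends.
WHILE @Oldest < @TwoYearsAgo
BEGIN
    -- At most one day's worth of rows, never crossing the cutoff.
    DELETE FROM dbo.Events
    WHERE [DateTime] < @TwoYearsAgo
      AND [DateTime] < DATEADD(day, 1, @Oldest);

    SELECT @Oldest = MIN([DateTime]) FROM dbo.Events;
END
```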
#7
0
For backward compatibility, the parentheses are optional in SELECT statements. We recommend that you always use parentheses for TOP in SELECT statements for consistency with its required use in INSERT, UPDATE, MERGE, and DELETE statements, in which the parentheses are required.
USE AdventureWorks;
GO
DELETE TOP (20)
FROM Purchasing.PurchaseOrderDetail
WHERE DueDate < '20120701';
GO