I have a SQL Server 2005 database, and I tried putting indexes on the appropriate fields in order to speed up the DELETE of records from a table with millions of rows (big_table has only 3 columns), but now the DELETE execution time is even longer! (1 hour versus 13 minutes, for example.)
I have a relationship between two tables, and the column that I filter my DELETE by is in the other table. For example:
DELETE FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
Btw, I've also tried:
DELETE FROM big_table
WHERE EXISTS
(SELECT 1 FROM small_table
WHERE small_table.id_product = big_table.id_product
AND small_table.id_category = 1)
and while it seems to run slightly faster than the first, it's still a lot slower with the indexes than without.
I created indexes on these fields:
big_table.id_product
small_table.id_product
small_table.id_category
My .ldf file grows a lot during the DELETE.
Why are my DELETE queries slower when I have indexes on my tables? I thought they were supposed to run faster.
UPDATE
Okay, the consensus seems to be that indexes will slow down a huge DELETE because the index has to be updated. Still, I don't understand why it can't DELETE all the rows at once and just update the index once at the end.
I was under the impression from some of my reading that indexes would speed up DELETE by making searches for fields in the WHERE clause faster.
"Indexes work just as well when searching for a record in DELETE and UPDATE commands as they do for SELECT statements."
But later in the article, it says that too many indexes can hurt performance.
Answers to Bob's questions:
- 55 million rows in the table
- 42 million rows being deleted
- A similar SELECT statement would not run (Exception of type 'System.OutOfMemoryException' was thrown)
I tried the following 2 queries:
SELECT * FROM big_table
WHERE big_table.id_product IN (
SELECT small_table.id_product FROM small_table
WHERE small_table.id_category = 1)
SELECT * FROM big_table
INNER JOIN small_table
ON small_table.id_product = big_table.id_product
WHERE small_table.id_category = 1
Both failed after running for 25 min with this error message from SQL Server 2005:
An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.
The database server is an older dual-core Xeon machine with 7.5 GB of RAM. It's my toy test database :) so it's not running anything else.
Do I need to do something special with my indexes after I CREATE them to make them work properly?
5 Answers
#1
27
Indexes make lookups faster - like the index at the back of a book.
Operations that change the data (like a DELETE) are slower, as they involve manipulating the indexes. Consider the same index at the back of the book. You have more work to do if you add, remove or change pages because you have to also update the index.
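As for updating the index "once at the end": one way to approximate that is to disable a nonclustered index before the big DELETE and rebuild it afterwards. This is only a sketch, assuming a nonclustered index named ix_big_table_id_product on big_table(id_product); the index name is hypothetical, and whether this ends up faster depends on how many rows remain to be re-indexed.

```sql
-- Assumption: ix_big_table_id_product is a hypothetical NONCLUSTERED index
-- on big_table(id_product). Do not disable a clustered index this way:
-- that makes the table inaccessible until the index is rebuilt.
ALTER INDEX ix_big_table_id_product ON big_table DISABLE;

-- The DELETE no longer has to maintain the disabled index row by row.
DELETE FROM big_table
WHERE EXISTS (
    SELECT 1
    FROM small_table
    WHERE small_table.id_product = big_table.id_product
      AND small_table.id_category = 1
);

-- Rebuilding re-enables the index and builds it once from the remaining rows.
ALTER INDEX ix_big_table_id_product ON big_table REBUILD;
```

With no usable index on big_table.id_product the DELETE will scan the table, which is probably what the optimizer would choose anyway when most of the rows are being removed.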
#2
2
I agree with Bob's comment above: if you are deleting large volumes of data from large tables, removing the corresponding index entries can take a while on top of deleting the data itself. It's the cost of doing business, though. As the data is deleted, you are causing index maintenance to happen for every affected row.
With regard to the log file growth: if you aren't doing anything with your log files (no log backups), you could switch the database to the Simple recovery model, but I urge you to read up on the impact that might have on your IT department before you change it.
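For reference, checking and switching the recovery model could look like the following. This is a sketch only, assuming a database named toy_db (a hypothetical name). Keep in mind that Simple recovery gives up point-in-time restore, and that a single huge DELETE is still fully logged until it commits, so the log can still grow while the statement runs.

```sql
-- Assumption: toy_db is a hypothetical database name.
-- Check the current recovery model (FULL, BULK_LOGGED or SIMPLE).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'toy_db';

-- Switch the database to the SIMPLE recovery model.
ALTER DATABASE toy_db SET RECOVERY SIMPLE;
```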
If you need to do the delete in real time, it's often a good workaround to flag the data as inactive, either directly on the table or in another table, and exclude that data from queries; then come back later and delete the data when the users aren't staring at an hourglass. There is a second reason for doing it this way: if you are deleting lots of data out of the table (which is what I am supposing based on your log file issue), then you will likely want to defragment the indexes afterwards (DBCC INDEXDEFRAG, or ALTER INDEX ... REORGANIZE in SQL Server 2005). Doing that out of hours is the way to go if you don't like users on the phone!
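To illustrate the defragmentation step, here is a minimal sketch that checks fragmentation after the delete and then reorganizes the indexes. It assumes it is run in the database that contains big_table; any threshold for when to reorganize rather than rebuild is a judgment call and is not shown.

```sql
-- Assumption: run in the database that contains big_table.
DECLARE @db_id int, @object_id int;
SET @db_id = DB_ID();
SET @object_id = OBJECT_ID(N'big_table');

-- Report fragmentation for every index on the table.
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED');

-- Reorganize all indexes on the table (this is an online operation).
ALTER INDEX ALL ON big_table REORGANIZE;
```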
#3
1
JohnB is deleting about 75% of the data. I think the following would have been a possible solution and probably one of the faster ones. Instead of deleting the data, create a new table and insert the data that you need to keep. Create the indexes on that new table after inserting the data. Now drop the old table and rename the new one to the same name as the old one.
The above of course assumes that sufficient disk space is available to temporarily store the duplicated data.
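The copy-and-swap approach could be sketched roughly as follows. The names big_table_keep and ix_big_table_id_product are hypothetical, and the sketch assumes nothing else (foreign keys, views, triggers, permissions) depends on the original big_table, since none of those are carried over by SELECT ... INTO.

```sql
-- Copy only the rows to keep into a new table (hypothetical name).
-- SELECT ... INTO can be minimally logged under SIMPLE or BULK_LOGGED recovery.
SELECT b.*
INTO big_table_keep
FROM big_table AS b
WHERE NOT EXISTS (
    SELECT 1
    FROM small_table AS s
    WHERE s.id_product = b.id_product
      AND s.id_category = 1
);

-- Build the index once, after the data is loaded (hypothetical index name).
CREATE INDEX ix_big_table_id_product ON big_table_keep (id_product);

-- Swap the new table in under the old name.
DROP TABLE big_table;
EXEC sp_rename 'big_table_keep', 'big_table';
```

Verify the row count in big_table_keep before dropping the original table; the DROP cannot be undone without a backup.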
#4
0
You can also try the T-SQL extension to the DELETE syntax (a second FROM clause with a join) and check whether it improves performance:
-- Use the alias as the delete target so it is unambiguous which table instance is affected
DELETE b
FROM big_table AS b
INNER JOIN small_table AS s ON s.id_product = b.id_product
WHERE s.id_category = 1
#5
0
Try something like this to avoid one huge delete (and thereby limit log file growth, provided the log can be truncated between batches):
-- SQL Server 2005 does not allow initializing a variable in DECLARE
declare @continue bit
set @continue = 1
-- delete in batches of 10000 rows until no matching rows remain
while @continue = 1
begin
    set @continue = 0
    delete top (10000) u
    from <tablename> u WITH (READPAST)
    where <condition>
    if @@ROWCOUNT > 0
        set @continue = 1
end