SQL DELETE - maximum number of rows

Date: 2022-11-25 22:15:52

What limit should be placed on the number of rows to delete in a SQL statement?

We need to delete anywhere from one to several hundred thousand rows, and we need some kind of best-practice limit so that emptying a waste-basket doesn't hammer the SQL server or fill up the logs every time.

This question is not specific to any type of database.

5 Answers

#1


12  

That's a very broad question that basically boils down to "it depends". The factors that influence it include:

  • What is your level of concurrency? A delete statement places an exclusive lock on the affected rows. Depending on the database engine, the distribution of the deleted data, etc., that lock could escalate to a page or the entire table. Can your readers afford to be blocked for the duration of the delete?

  • How complex is the delete statement? How many other tables are you joining to, and are there complex WHERE clauses? Sometimes identifying the rows to delete can be more "expensive" than the delete itself, so one big delete may be "cheaper".

  • Are you worried about deadlocks? As you decrease the size of each delete, your deadlock "footprint" shrinks. Ideally, single-row deletes will always succeed.

  • Do you care about throughput? As with any SQL statement, there is a roughly constant per-statement overhead (connection handling, query parsing, returning results, etc.). From a single-connection point of view, one 1000-row delete will be faster than 1000 single-row deletes.

  • Don't forget about index-maintenance overhead, fragmentation cleanup, and any triggers. They also affect your system.

In general, though, I start benchmarking at 1000 rows per statement. Most systems I've worked with (sub-"enterprise") end up with a sweet spot between 500 and 5000 records per delete. I like to do something like this:

SET ROWCOUNT 500          -- cap every subsequent DELETE at 500 rows

SELECT 1                  -- just to force @@ROWCOUNT > 0
WHILE @@ROWCOUNT > 0
    DELETE FROM [table]
        [WHERE ...]

#2


4  

Limiting the number of rows affected by your delete with the SET ROWCOUNT option and then looping works well (and I've used it many times before), but be aware that from SQL 2012 onwards this will no longer be an option (see BOL).

Therefore, another option is to limit the number of rows being deleted using the TOP clause, i.e.:

SELECT 1                  -- just to force @@ROWCOUNT > 0

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (#)        -- replace # with the batch size
    FROM mytable
    [WHERE ...]
END

#3


1  

Unless you have a lot of triggers or integrity constraints to verify, deletion shouldn't be that expensive an operation.

But if you're that concerned about performance, my initial hunch would be to mark the appropriate rows as deleted and then physically remove them later during a periodic cleanup. I'm not a big fan of this approach, though, because you'll have to change every query on that table to exclude logically (but not yet physically) deleted rows.
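
As a minimal sketch of the soft-delete scheme described above, in T-SQL to match the other answers. The object names (`waste_basket`, `is_deleted`, `id`) and the batch size are illustrative assumptions, not from the original answer:

```sql
-- Sketch of a soft-delete scheme; object names are illustrative assumptions.

-- Add a deletion flag to the table:
ALTER TABLE waste_basket ADD is_deleted bit NOT NULL DEFAULT 0;

-- "Deleting" a row becomes a cheap single-row update:
UPDATE waste_basket SET is_deleted = 1 WHERE id = @id;

-- The drawback: every existing query must now exclude flagged rows:
SELECT * FROM waste_basket WHERE is_deleted = 0;

-- A periodic cleanup job physically removes flagged rows in batches:
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM waste_basket WHERE is_deleted = 1;
    IF @@ROWCOUNT = 0 BREAK;
END
```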

#4


1  

Whenever I see a database that routinely deletes large numbers of rows in bulk, it makes me think the data model or processing design is not optimal. Why load a million rows only to delete them later? If you need to do something like purge historical data, consider table partitioning instead.
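
In SQL Server, purging by partition might look roughly like this. The object names and the monthly boundary values are assumptions, and `TRUNCATE ... WITH (PARTITIONS ...)` requires SQL Server 2016 or later:

```sql
-- Sketch of partition-based purging; names and boundaries are assumptions.
CREATE PARTITION FUNCTION pf_by_month (date)
    AS RANGE RIGHT FOR VALUES ('2022-01-01', '2022-02-01', '2022-03-01');

CREATE PARTITION SCHEME ps_by_month
    AS PARTITION pf_by_month ALL TO ([PRIMARY]);

CREATE TABLE history (
    event_date date NOT NULL,
    payload    varchar(100) NULL
) ON ps_by_month (event_date);

-- Removing an old month is then a metadata operation rather than a
-- row-by-row delete (SQL Server 2016+):
TRUNCATE TABLE history WITH (PARTITIONS (1));
```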

#5


0  

A general answer is to drop the table and re-create it; that performs well, but it only applies when you're emptying the full table.
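
When the whole table really is being emptied, `TRUNCATE TABLE` gives much the same effect as drop-and-recreate without discarding the table definition, indexes, or permissions. The table name below is an assumption; note that TRUNCATE is minimally logged and fails if other tables reference this one via foreign keys:

```sql
-- Empty the whole table in one minimally logged operation
-- (table name is illustrative).
TRUNCATE TABLE waste_basket;
```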
