Say I have a simple table that has the following fields:
假设我有一个简单的表,它有以下字段:
- ID: int, autoincremental (identity), primary key
- ID: int,自动增量(标识),主键
- Name: varchar(50), unique, has unique index
- 名称:varchar(50),唯一,有唯一索引
- Tag: int
- 标签:int
I never use the ID field for lookup, because my application is always based on working with the Name field.
我从不使用ID字段进行查找,因为我的应用程序总是基于使用Name字段。
I need to change the Tag value from time to time. I'm using the following trivial SQL code:
我需要不时地更改标记值。我正在使用以下简单的SQL代码:
UPDATE Table SET Tag = XX WHERE Name = YY;
I wondered if anyone knows whether the above is always faster than:
我想知道,是否有人知道上述情况是否总是比:
DELETE FROM Table WHERE Name = YY;
INSERT INTO Table (Name, Tag) VALUES (YY, XX);
Again - I know that in the second example the ID is changed, but it does not matter for my application.
再说一遍——我知道在第二个示例中ID被更改了,但这对我的应用程序来说并不重要。
14 个解决方案
#1
46
A bit too late with this answer, but since I faced a similar question, I made a test with JMeter and a MySQL server on same machine, where I have used:
这个答案有点太迟了,但是由于我遇到了类似的问题,我在同一台机器上对JMeter和MySQL服务器进行了测试,我使用的是:
- A transaction Controller (generating parent sample) that contained two JDBC Requests: a Delete and an Insert statement
- 一个事务控制器(生成父样本)包含两个JDBC请求:一个Delete和一个Insert语句。
- A sepparate JDBC Request containing the Update statement.
- 包含Update语句的sepparate JDBC请求。
After running the test for 500 loops, I have obtained the following results:
在运行了500个循环的测试后,我得到了以下结果:
DEL + INSERT - Average: 62ms
DEL + INSERT - Average: 62ms
Update - Average: 30ms
更新——平均:30 ms
结果:
#2
28
The bigger the table (number of and size of columns) the more expensive it becomes to delete and insert rather than update. Because you have to pay the price of UNDO and REDO. DELETEs consume more UNDO space than UPDATEs, and your REDO contains twice as many statements as are necessary.
表越大(列的数量和大小),删除和插入而不是更新的成本就越高。因为你必须付出撤销和重做的代价。删除使用的撤销空间比更新要多,而REDO包含的语句是必要的两倍。
Besides, it is plain wrong from a business point of view. Consider how much harder it would be to understand a notional audit trail on that table.
此外,从商业角度来看,这显然是错误的。考虑一下,要理解表格上的名义审计记录有多难。
There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table using CTAS from the old table (applying the update in the the projection of the SELECT clause), dropping the old table and renaming the new table. The side-effects are creating indexes, managing constraints and renewing privileges, but it is worth considering.
有一些场景涉及对表中的所有行进行批量更新,在这些场景中,使用旧表中的CTAS创建新表会更快(在SELECT子句的投影中应用update),删除旧表并重命名新表。副作用是创建索引、管理约束和更新特权,但值得考虑。
#3
12
One command on the same row should always be faster than two on that same row. So the UPDATE only would be better.
同一行上的一个命令应该总是比同一行上的两个命令快。所以更新只会更好。
EDIT set up the table:
编辑设置表格:
create table YourTable
(YourName varchar(50) primary key
,Tag int
)
insert into YourTable values ('first value',1)
run this, which takes 1 second on my system (sql server 2005):
运行这个,在我的系统上花费1秒(sql server 2005):
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
UPDATE YourTable set YourName='new name'
while @x<10000
begin
Set @x=@x+1
update YourTable set YourName='new name' where YourName='new name'
SET @y=@y+@@ROWCOUNT
end
print @y
run this, which took 2 seconds on my system:
运行这个,在我的系统上花了2秒:
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
while @x<10000
begin
Set @x=@x+1
DELETE YourTable WHERE YourName='new name'
insert into YourTable values ('new name',1)
SET @y=@y+@@ROWCOUNT
end
print @y
#4
8
I am afraid the body of your question is unrelated to title question.
恐怕你问题的主体与题目题无关。
If to answer the title:
如果要回答这个题目:
In SQL, is UPDATE always faster than DELETE+INSERT?
在SQL中,更新总是比删除+插入快吗?
then answer is NO!
答案是否定的!
Just google for
就谷歌
- "Expensive direct update"* "sql server"
- “昂贵的直接更新”*“sql server”
- "deferred update"* "sql server"
- *“延迟更新sql服务器”
Such update(s) result in more costly (more processing) realization of update through insert+update than direct insert+update. These are the cases when
这样的更新导致通过插入+更新实现更新比直接插入+更新更昂贵(处理更多)。这些都是当时的情况
- one updates the field with unique (or primary) key or
- 使用唯一(或主)键或
- when the new data does not fit (is bigger) in the pre-update row space allocated (or even maximum row size),resulting in fragmentation,
- 当新数据不适合(更大)在预先更新的行空间中分配(甚至最大的行大小)时,导致碎片化,
- etc.
- 等。
My fast (non-exhaustive) search, not pretending to be covering one, gave me [1], [2]
我的快速(非彻底的)搜索,没有假装覆盖一个,给了我[1],[2]
[1]
Update Operations
(Sybase® SQL Server Performance and Tuning Guide
Chapter 7: The SQL Server Query Optimizer)
http://www.lcard.ru/~nail/sybase/perf/11500.htm
[2]
UPDATE Statements May be Replicated as DELETE/INSERT Pairs
http://support.microsoft.com/kb/238254
[1]更新操作(Sybase®SQL服务器性能和调优指南第7章:SQL服务器查询优化器)http://www.lcard.ru/钉/ Sybase /性能/ 11500. htm[2]Update语句可以复制删除/插入对http://support.microsoft.com/kb/238254
#5
5
Just tried updating 43 fields on a table with 44 fields, the remaining field was the primary clustered key.
刚刚在一个包含44个字段的表中更新了43个字段,剩下的字段是主聚集键。
The update took 8 seconds.
更新时间为8秒。
A Delete + Insert is faster than the minimum time interval that the "Client Statistics" reports via SQL Management Studio.
删除+插入比“客户端统计信息”通过SQL Management Studio报告的最小时间间隔要快。
Peter
彼得
MS SQL 2008
MS SQL 2008
#6
4
Keep in mind the actual fragmentation that occurs when DELETE+INSERT is issued opposed to a correctly implemented UPDATE will make great difference by time.
请记住,在发布DELETE+INSERT与正确实现的更新相反时发生的实际碎片化会随着时间的推移而产生巨大的差异。
Thats why, for instance, REPLACE INTO that MySQL implements is discouraged as opposed to using the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.
这就是为什么,例如,替换MySQL实现的方法是不鼓励的,而不是使用INSERT INTO…对于重复的密钥更新…语法。
#7
3
In your case, I believe the update will be faster.
就你而言,我相信更新会更快。
Remember indexes!
记得索引!
You have defined a primary key, it will likely automatically become a clustered index (at least SQL Server does so). A cluster index means the records are physically laid on the disk according to the index. DELETE operation itself won't cause much trouble, even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine will have to put this record in the correct location which under circumstances will cause some "reshuffling" of the old records to "make place" for a new one. There where it will slow down the operation.
您已经定义了一个主键,它很可能会自动成为一个聚集索引(至少SQL Server这样做)。集群索引意味着根据索引将记录物理地放在磁盘上。删除操作本身不会造成太大的麻烦,即使一条记录消失了,索引仍然保持正确。但是,当您插入一个新记录时,DB引擎将不得不将该记录放在正确的位置,在这种情况下,旧记录将会“重新洗牌”,以便为新记录“让位”。在那里它会减慢操作。
An index (especially clustered) works best if the values are ever increasing, so the new records just get appended to the tail. Maybe you can add an extra INT IDENTITY column to become a clustered index, this will simplify insert operations.
如果值不断增加,则索引(特别是集群)的工作效果最好,因此新记录只会被附加到尾部。也许您可以添加一个额外的INT标识列来作为集群索引,这将简化插入操作。
#8
3
What if you have a few million rows. Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the collection of client data is distributed across numerous other machines from which it is later collected and put into the database. If each client has unique information, then you would not be able to perform a bulk update; i.e., there is no where-clause criteria for you to use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So, the question might be better posed as follows: Is it better to perform millions of single updates, or is it better to compile them into large bulk deletes and inserts. In other words, instead of "update [table] set field=data where clientid=123" a milltion times, you do 'delete from [table] where clientid in ([all clients to be updated]);insert into [table] values (data for client1), (data for client2), etc'
如果你有几百万行。每一行从一段数据开始,可能是客户端名称。当您为客户端收集数据时,必须更新它们的条目。现在,让我们假设客户端数据的收集分布在许多其他机器上,然后从这些机器中收集数据并将其放入数据库。如果每个客户端都有唯一的信息,那么您将无法执行批量更新;即。,您不需要使用where子句标准一次性更新多个客户端。另一方面,您可以执行批量插入。因此,这个问题可以更好地提出如下:是执行数百万个单个更新更好,还是将它们编译为大型批量删除和插入更好。换句话说,代替“更新[表]set字段=数据,clientid=123”的计算时间,您可以从[表]中删除clientid([所有客户端更新]);插入[表]值(client1的数据),(client2的数据)等等。
Is either choice better than the other, or are you screwed both ways?
你的选择是好是坏?
#9
3
Delete + Insert is almost always faster because an Update has way more steps involved.
删除+插入几乎总是更快,因为更新涉及的步骤更多。
Update:
更新:
- Look for the row using PK.
- 使用PK查找行。
- Read the row from disk.
- 从磁盘读取行。
- Check for which values have changed
- 检查哪些值已经更改。
- Raise the onUpdate Trigger with populated :NEW and :OLD variables
- 使用填充的:NEW和:OLD变量引发onUpdate触发器
-
Write New variables to disk (The entire row)
向磁盘(整行)写入新变量
(This repeats for every row you're updating)
(对更新的每一行重复此操作)
Delete + Insert:
删除+插入:
- Mark rows as deleted (Only in the PK).
- 将行标记为已删除(仅在PK中)。
- Insert new rows at the end of the table.
- 在表的末尾插入新的行。
-
Update PK Index with locations of new records.
用新记录的位置更新PK索引。
(This doesn't repeat, all can be perfomed in a single block of operation).
(这不会重复,所有操作都可以在一个操作块中完成)。
Using Insert + Delete will fragment your File System, but not that fast. Doing a lazy optimization on the background will allways free unused blocks and pack the table altogether.
使用Insert + Delete将对文件系统进行分段,但不会很快。在后台做一个懒惰的优化,将会避免使用未使用的块,并将整个表打包。
#10
2
Obviously, the answer varies based on what database you are using, but UPDATE can always be implemented faster than DELETE+INSERT. Since in-memory ops are mostly trivial anyways, given a hard-drive based database, an UPDATE can change a database field in-place on the hdd, while a delete would remove a row (leaving an empty space), and insert a new row, perhaps to the end of the table (again, it's all in the implementation).
显然,根据您使用的数据库,答案是不同的,但是更新的实现速度总是比DELETE+INSERT要快。从内存中运维主要是微不足道的无论如何,给定一个硬盘的基础数据库,更新可以改变一个数据库字段就地硬盘,而删除就删除一行(留下一个空的空间),并插入一个新行,也许到最后的表(再一次,这是所有的实现)。
The other, minor, issue is that when you UPDATE a single variable in a single row, the other columns in that row remain the same. If you DELETE and then do an INSERT, you run the risk of forgetting about other columns and consequently leaving them behind (in which case you would have to do a SELECT before your DELETE to temporarily store your other columns before writing them back with INSERT).
另一个小问题是,当您更新单个行中的单个变量时,该行中的其他列保持不变。如果您删除并执行插入操作,那么您就有可能忘记其他列,因此将它们抛在后面(在这种情况下,您必须在删除之前进行选择,以暂时存储其他列,然后再将它们以INSERT的形式返回)。
#11
1
It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a (transactionally wrapped) DELETE and INSERT. Provided the results are consistent with the UPDATE semantics.
这取决于产品。可以实现将所有更新转换为(事务包装的)删除和插入的产品。提供的结果与更新语义一致。
I'm not saying I'm aware of any product that does this, but it's perfectly legal.
我不是说我知道有任何产品可以做到这一点,但这是完全合法的。
#12
1
Every write to the database has lots of potential side effects.
每个写入数据库都有很多潜在的副作用。
Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc. Insert: a row must be allocated - this might be in place of a deleted row, might not be; indexes must be updated, foreign keys checked, etc. Update: one or more values must be updated; perhaps the row's data no longer fits into that block of the database so more space must be allocated, which may cascade into multiple blocks being re-written, or lead to fragmented blocks; if the value has foreign key constraints they must be checked, etc.
删除:一行必须删除,索引更新,外键检查,可能被删除,等等。插入:必须分配一行——这可能是删除行,可能不是;必须更新索引、检查外键等。更新:必须更新一个或多个值;也许行数据不再适合这个数据库块,因此必须分配更多的空间,这可能导致多个块被重新写入,或者导致碎片块;如果值有外键约束,则必须检查它们。
For a very small number of columns or if the whole row is updated Delete+insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger it's easier to write code that handles updates if the update operation is truly an update.
对于非常少的列,或者如果整个行被更新,删除+插入可能会更快,但是FK约束问题是一个很大的问题。当然,也许你现在没有FK约束,但这是真的吗?如果您有一个触发器,那么如果更新操作是真正的更新,那么编写处理更新的代码会更容易。
Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.
另一个要考虑的问题是,有时插入和删除持有不同于更新的锁。DB可能在插入或删除时锁定整个表,而不是在更新记录时锁定单个记录。
In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.
最后,我建议您更新一个记录,如果您想更新它的话。然后检查DB的性能统计信息和该表的统计信息,看看是否有性能改进。别的还为时过早。
An example from the ecommerce system I work on: We were storing credit-card transaction data in the database in a two-step approach: first, write a partial transaction to indicate that we've started the process. Then, when the authorization data is returned from the bank update the record. We COULD have deleted then re-inserted the record but instead we just used update. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block-chaining since it added a lot of data. However, rather than switch to DELETE+INSERT we just tuned the database to always allocate the whole row, this means the update could use the pre-allocated empty space with no problems. No code change required, and the code remains simple and easy to understand.
我所研究的电子商务系统中的一个例子是:我们以两步的方式在数据库中存储信用卡事务数据:首先,编写一个部分事务来表明我们已经启动了这个过程。然后,当从银行返回授权数据时,更新记录。我们可以删除然后重新插入记录,但是我们只是使用update。我们的DBA告诉我们,表是分段的,因为DB只分配了每一行的一小部分空间,而更新导致了块链接,因为它添加了大量的数据。然而,我们并没有切换到DELETE+INSERT,而是对数据库进行了调优,以始终分配整个行,这意味着更新可以使用预先分配的空空间,没有问题。不需要任何代码更改,并且代码仍然简单且易于理解。
#13
1
The question of speed is irrelevant without a specific speed problem.
在没有特定速度问题的情况下,速度问题是无关紧要的。
If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.
如果您正在编写SQL代码以对现有行进行更改,则需要更新它。什么是不正确的。
If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.
如果你想要打破代码应该如何工作的规则,那么你最好有一个好的,量化的原因,而不是一个模糊的概念“这样更快”,当你不知道什么是“更快”的时候。
#14
0
In specific cases, Delete+Insert would save you time. I have a table that has 30000 odd rows and there is a daily update/insert of these records using a data file. The upload process generates 95% of update statements as the records are already there and 5% of inserts for ones that do not exist. Alternatively, uploading the data file records into a temp table, deletion of the destination table for records in the temp table followed by insertion of the same from the temp table has shown 50% gain in time.
在特定的情况下,删除+插入可以节省时间。我有一个表,它有30000多行,并且每天使用数据文件更新/插入这些记录。上载过程生成95%的更新语句,因为记录已经存在,而不存在的记录生成5%的插入。或者,将数据文件记录上载到临时表中,删除temp表中记录的目标表,然后在temp表中插入相同的记录,显示时间增加了50%。
#1
46
A bit too late with this answer, but since I faced a similar question, I made a test with JMeter and a MySQL server on same machine, where I have used:
这个答案有点太迟了,但是由于我遇到了类似的问题,我在同一台机器上对JMeter和MySQL服务器进行了测试,我使用的是:
- A transaction Controller (generating parent sample) that contained two JDBC Requests: a Delete and an Insert statement
- 一个事务控制器(生成父样本)包含两个JDBC请求:一个Delete和一个Insert语句。
- A sepparate JDBC Request containing the Update statement.
- 包含Update语句的sepparate JDBC请求。
After running the test for 500 loops, I have obtained the following results:
在运行了500个循环的测试后,我得到了以下结果:
DEL + INSERT - Average: 62ms
DEL + INSERT - Average: 62ms
Update - Average: 30ms
更新——平均:30 ms
结果:
#2
28
The bigger the table (number of and size of columns) the more expensive it becomes to delete and insert rather than update. Because you have to pay the price of UNDO and REDO. DELETEs consume more UNDO space than UPDATEs, and your REDO contains twice as many statements as are necessary.
表越大(列的数量和大小),删除和插入而不是更新的成本就越高。因为你必须付出撤销和重做的代价。删除使用的撤销空间比更新要多,而REDO包含的语句是必要的两倍。
Besides, it is plain wrong from a business point of view. Consider how much harder it would be to understand a notional audit trail on that table.
此外,从商业角度来看,这显然是错误的。考虑一下,要理解表格上的名义审计记录有多难。
There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table using CTAS from the old table (applying the update in the the projection of the SELECT clause), dropping the old table and renaming the new table. The side-effects are creating indexes, managing constraints and renewing privileges, but it is worth considering.
有一些场景涉及对表中的所有行进行批量更新,在这些场景中,使用旧表中的CTAS创建新表会更快(在SELECT子句的投影中应用update),删除旧表并重命名新表。副作用是创建索引、管理约束和更新特权,但值得考虑。
#3
12
One command on the same row should always be faster than two on that same row. So the UPDATE only would be better.
同一行上的一个命令应该总是比同一行上的两个命令快。所以更新只会更好。
EDIT set up the table:
编辑设置表格:
create table YourTable
(YourName varchar(50) primary key
,Tag int
)
insert into YourTable values ('first value',1)
run this, which takes 1 second on my system (sql server 2005):
运行这个,在我的系统上花费1秒(sql server 2005):
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
UPDATE YourTable set YourName='new name'
while @x<10000
begin
Set @x=@x+1
update YourTable set YourName='new name' where YourName='new name'
SET @y=@y+@@ROWCOUNT
end
print @y
run this, which took 2 seconds on my system:
运行这个,在我的系统上花了2秒:
SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
while @x<10000
begin
Set @x=@x+1
DELETE YourTable WHERE YourName='new name'
insert into YourTable values ('new name',1)
SET @y=@y+@@ROWCOUNT
end
print @y
#4
8
I am afraid the body of your question is unrelated to title question.
恐怕你问题的主体与题目题无关。
If to answer the title:
如果要回答这个题目:
In SQL, is UPDATE always faster than DELETE+INSERT?
在SQL中,更新总是比删除+插入快吗?
then answer is NO!
答案是否定的!
Just google for
就谷歌
- "Expensive direct update"* "sql server"
- “昂贵的直接更新”*“sql server”
- "deferred update"* "sql server"
- *“延迟更新sql服务器”
Such update(s) result in more costly (more processing) realization of update through insert+update than direct insert+update. These are the cases when
这样的更新导致通过插入+更新实现更新比直接插入+更新更昂贵(处理更多)。这些都是当时的情况
- one updates the field with unique (or primary) key or
- 使用唯一(或主)键或
- when the new data does not fit (is bigger) in the pre-update row space allocated (or even maximum row size),resulting in fragmentation,
- 当新数据不适合(更大)在预先更新的行空间中分配(甚至最大的行大小)时,导致碎片化,
- etc.
- 等。
My fast (non-exhaustive) search, not pretending to be covering one, gave me [1], [2]
我的快速(非彻底的)搜索,没有假装覆盖一个,给了我[1],[2]
[1]
Update Operations
(Sybase® SQL Server Performance and Tuning Guide
Chapter 7: The SQL Server Query Optimizer)
http://www.lcard.ru/~nail/sybase/perf/11500.htm
[2]
UPDATE Statements May be Replicated as DELETE/INSERT Pairs
http://support.microsoft.com/kb/238254
[1]更新操作(Sybase®SQL服务器性能和调优指南第7章:SQL服务器查询优化器)http://www.lcard.ru/钉/ Sybase /性能/ 11500. htm[2]Update语句可以复制删除/插入对http://support.microsoft.com/kb/238254
#5
5
Just tried updating 43 fields on a table with 44 fields, the remaining field was the primary clustered key.
刚刚在一个包含44个字段的表中更新了43个字段,剩下的字段是主聚集键。
The update took 8 seconds.
更新时间为8秒。
A Delete + Insert is faster than the minimum time interval that the "Client Statistics" reports via SQL Management Studio.
删除+插入比“客户端统计信息”通过SQL Management Studio报告的最小时间间隔要快。
Peter
彼得
MS SQL 2008
MS SQL 2008
#6
4
Keep in mind the actual fragmentation that occurs when DELETE+INSERT is issued opposed to a correctly implemented UPDATE will make great difference by time.
请记住,在发布DELETE+INSERT与正确实现的更新相反时发生的实际碎片化会随着时间的推移而产生巨大的差异。
Thats why, for instance, REPLACE INTO that MySQL implements is discouraged as opposed to using the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.
这就是为什么,例如,替换MySQL实现的方法是不鼓励的,而不是使用INSERT INTO…对于重复的密钥更新…语法。
#7
3
In your case, I believe the update will be faster.
就你而言,我相信更新会更快。
Remember indexes!
记得索引!
You have defined a primary key, it will likely automatically become a clustered index (at least SQL Server does so). A cluster index means the records are physically laid on the disk according to the index. DELETE operation itself won't cause much trouble, even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine will have to put this record in the correct location which under circumstances will cause some "reshuffling" of the old records to "make place" for a new one. There where it will slow down the operation.
您已经定义了一个主键,它很可能会自动成为一个聚集索引(至少SQL Server这样做)。集群索引意味着根据索引将记录物理地放在磁盘上。删除操作本身不会造成太大的麻烦,即使一条记录消失了,索引仍然保持正确。但是,当您插入一个新记录时,DB引擎将不得不将该记录放在正确的位置,在这种情况下,旧记录将会“重新洗牌”,以便为新记录“让位”。在那里它会减慢操作。
An index (especially clustered) works best if the values are ever increasing, so the new records just get appended to the tail. Maybe you can add an extra INT IDENTITY column to become a clustered index, this will simplify insert operations.
如果值不断增加,则索引(特别是集群)的工作效果最好,因此新记录只会被附加到尾部。也许您可以添加一个额外的INT标识列来作为集群索引,这将简化插入操作。
#8
3
What if you have a few million rows. Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the collection of client data is distributed across numerous other machines from which it is later collected and put into the database. If each client has unique information, then you would not be able to perform a bulk update; i.e., there is no where-clause criteria for you to use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So, the question might be better posed as follows: Is it better to perform millions of single updates, or is it better to compile them into large bulk deletes and inserts. In other words, instead of "update [table] set field=data where clientid=123" a milltion times, you do 'delete from [table] where clientid in ([all clients to be updated]);insert into [table] values (data for client1), (data for client2), etc'
如果你有几百万行。每一行从一段数据开始,可能是客户端名称。当您为客户端收集数据时,必须更新它们的条目。现在,让我们假设客户端数据的收集分布在许多其他机器上,然后从这些机器中收集数据并将其放入数据库。如果每个客户端都有唯一的信息,那么您将无法执行批量更新;即。,您不需要使用where子句标准一次性更新多个客户端。另一方面,您可以执行批量插入。因此,这个问题可以更好地提出如下:是执行数百万个单个更新更好,还是将它们编译为大型批量删除和插入更好。换句话说,代替“更新[表]set字段=数据,clientid=123”的计算时间,您可以从[表]中删除clientid([所有客户端更新]);插入[表]值(client1的数据),(client2的数据)等等。
Is either choice better than the other, or are you screwed both ways?
你的选择是好是坏?
#9
3
Delete + Insert is almost always faster because an Update has way more steps involved.
删除+插入几乎总是更快,因为更新涉及的步骤更多。
Update:
更新:
- Look for the row using PK.
- 使用PK查找行。
- Read the row from disk.
- 从磁盘读取行。
- Check for which values have changed
- 检查哪些值已经更改。
- Raise the onUpdate Trigger with populated :NEW and :OLD variables
- 使用填充的:NEW和:OLD变量引发onUpdate触发器
-
Write New variables to disk (The entire row)
向磁盘(整行)写入新变量
(This repeats for every row you're updating)
(对更新的每一行重复此操作)
Delete + Insert:
删除+插入:
- Mark rows as deleted (Only in the PK).
- 将行标记为已删除(仅在PK中)。
- Insert new rows at the end of the table.
- 在表的末尾插入新的行。
-
Update PK Index with locations of new records.
用新记录的位置更新PK索引。
(This doesn't repeat, all can be perfomed in a single block of operation).
(这不会重复,所有操作都可以在一个操作块中完成)。
Using Insert + Delete will fragment your File System, but not that fast. Doing a lazy optimization on the background will allways free unused blocks and pack the table altogether.
使用Insert + Delete将对文件系统进行分段,但不会很快。在后台做一个懒惰的优化,将会避免使用未使用的块,并将整个表打包。
#10
2
Obviously, the answer varies based on what database you are using, but UPDATE can always be implemented faster than DELETE+INSERT. Since in-memory ops are mostly trivial anyways, given a hard-drive based database, an UPDATE can change a database field in-place on the hdd, while a delete would remove a row (leaving an empty space), and insert a new row, perhaps to the end of the table (again, it's all in the implementation).
显然,根据您使用的数据库,答案是不同的,但是更新的实现速度总是比DELETE+INSERT要快。从内存中运维主要是微不足道的无论如何,给定一个硬盘的基础数据库,更新可以改变一个数据库字段就地硬盘,而删除就删除一行(留下一个空的空间),并插入一个新行,也许到最后的表(再一次,这是所有的实现)。
The other, minor, issue is that when you UPDATE a single variable in a single row, the other columns in that row remain the same. If you DELETE and then do an INSERT, you run the risk of forgetting about other columns and consequently leaving them behind (in which case you would have to do a SELECT before your DELETE to temporarily store your other columns before writing them back with INSERT).
另一个小问题是,当您更新单个行中的单个变量时,该行中的其他列保持不变。如果您删除并执行插入操作,那么您就有可能忘记其他列,因此将它们抛在后面(在这种情况下,您必须在删除之前进行选择,以暂时存储其他列,然后再将它们以INSERT的形式返回)。
#11
1
It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a (transactionally wrapped) DELETE and INSERT. Provided the results are consistent with the UPDATE semantics.
这取决于产品。可以实现将所有更新转换为(事务包装的)删除和插入的产品。提供的结果与更新语义一致。
I'm not saying I'm aware of any product that does this, but it's perfectly legal.
我不是说我知道有任何产品可以做到这一点,但这是完全合法的。
#12
1
Every write to the database has lots of potential side effects.
每个写入数据库都有很多潜在的副作用。
Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc. Insert: a row must be allocated - this might be in place of a deleted row, might not be; indexes must be updated, foreign keys checked, etc. Update: one or more values must be updated; perhaps the row's data no longer fits into that block of the database so more space must be allocated, which may cascade into multiple blocks being re-written, or lead to fragmented blocks; if the value has foreign key constraints they must be checked, etc.
删除:一行必须删除,索引更新,外键检查,可能被删除,等等。插入:必须分配一行——这可能是删除行,可能不是;必须更新索引、检查外键等。更新:必须更新一个或多个值;也许行数据不再适合这个数据库块,因此必须分配更多的空间,这可能导致多个块被重新写入,或者导致碎片块;如果值有外键约束,则必须检查它们。
For a very small number of columns or if the whole row is updated Delete+insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger it's easier to write code that handles updates if the update operation is truly an update.
对于非常少的列,或者如果整个行被更新,删除+插入可能会更快,但是FK约束问题是一个很大的问题。当然,也许你现在没有FK约束,但这是真的吗?如果您有一个触发器,那么如果更新操作是真正的更新,那么编写处理更新的代码会更容易。
Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.
另一个要考虑的问题是,有时插入和删除持有不同于更新的锁。DB可能在插入或删除时锁定整个表,而不是在更新记录时锁定单个记录。
In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.
最后,我建议您更新一个记录,如果您想更新它的话。然后检查DB的性能统计信息和该表的统计信息,看看是否有性能改进。别的还为时过早。
An example from the ecommerce system I work on: We were storing credit-card transaction data in the database in a two-step approach: first, write a partial transaction to indicate that we've started the process. Then, when the authorization data is returned from the bank update the record. We COULD have deleted then re-inserted the record but instead we just used update. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block-chaining since it added a lot of data. However, rather than switch to DELETE+INSERT we just tuned the database to always allocate the whole row, this means the update could use the pre-allocated empty space with no problems. No code change required, and the code remains simple and easy to understand.
我所研究的电子商务系统中的一个例子是:我们以两步的方式在数据库中存储信用卡事务数据:首先,编写一个部分事务来表明我们已经启动了这个过程。然后,当从银行返回授权数据时,更新记录。我们可以删除然后重新插入记录,但是我们只是使用update。我们的DBA告诉我们,表是分段的,因为DB只分配了每一行的一小部分空间,而更新导致了块链接,因为它添加了大量的数据。然而,我们并没有切换到DELETE+INSERT,而是对数据库进行了调优,以始终分配整个行,这意味着更新可以使用预先分配的空空间,没有问题。不需要任何代码更改,并且代码仍然简单且易于理解。
#13
1
The question of speed is irrelevant without a specific speed problem.
在没有特定速度问题的情况下,速度问题是无关紧要的。
If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.
如果您正在编写SQL代码以对现有行进行更改,则需要更新它。什么是不正确的。
If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.
如果你想要打破代码应该如何工作的规则,那么你最好有一个好的,量化的原因,而不是一个模糊的概念“这样更快”,当你不知道什么是“更快”的时候。
#14
0
In specific cases, Delete+Insert would save you time. I have a table that has 30000 odd rows and there is a daily update/insert of these records using a data file. The upload process generates 95% of update statements as the records are already there and 5% of inserts for ones that do not exist. Alternatively, uploading the data file records into a temp table, deletion of the destination table for records in the temp table followed by insertion of the same from the temp table has shown 50% gain in time.
在特定的情况下,删除+插入可以节省时间。我有一个表,它有30000多行,并且每天使用数据文件更新/插入这些记录。上载过程生成95%的更新语句,因为记录已经存在,而不存在的记录生成5%的插入。或者,将数据文件记录上载到临时表中,删除temp表中记录的目标表,然后在temp表中插入相同的记录,显示时间增加了50%。