Updating rows in a table with many columns

Time: 2021-11-06 04:30:05

I have a table (several, actually) that contains a lot of columns (maybe 100+). What's best performance-wise when updating rows in the table if only a few columns have changed?

  1. To build the UPDATE statement dynamically, updating only the changed columns.
  2. To build a parameterized UPDATE statement containing all columns, including those that have not changed.
  3. To create a stored procedure that takes ALL values as parameters and updates the row.

I'm using SQL Server. There are no BLOBs in the table.

Thanks / M

3 answers

#1

Options 2 and 3 require more data to be transmitted to the server on each update, and so carry a bigger communication overhead in data volume alone.

Does each row have a different set of updated columns, or is the set of updated columns the same for every row in a given run (though the list might vary from run to run)?

In the latter case (the same set of columns updated throughout a given run), option 1 is likely to perform better: the statement can be prepared once and executed many times, with a minimum of data transferred to the server for each update.
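
A minimal Python sketch of the latter case. It assumes a DB-API driver that uses `?` placeholders (pyodbc does), and the table `Customer`, key `CustomerID` and column names are hypothetical. The SQL text is built once from the run's column set. Only the per-row parameter tuples differ, so the same prepared statement can be reused for every row.

```python
from typing import Any, Dict, List, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, doubling any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def build_run_update(
    table: str,
    key: str,
    columns: Sequence[str],
    rows: Sequence[Dict[str, Any]],
) -> Tuple[str, List[Tuple[Any, ...]]]:
    """Build ONE parameterized UPDATE for a run in which every row changes
    the same columns, plus one parameter tuple per row.

    Each row dict must contain `key` and every name in `columns`.
    """
    if not columns:
        raise ValueError("no columns to update")
    set_clause = ", ".join(f"{quote_ident(c)} = ?" for c in columns)
    sql = f"UPDATE {quote_ident(table)} SET {set_clause} WHERE {quote_ident(key)} = ?"
    # Parameter order must match the placeholders: SET values first, key last.
    params = [tuple(row[c] for c in columns) + (row[key],) for row in rows]
    return sql, params


if __name__ == "__main__":
    sql, params = build_run_update(
        "Customer", "CustomerID", ["Phone", "Email"],
        [
            {"CustomerID": 1, "Phone": "555-0100", "Email": "a@example.com"},
            {"CustomerID": 2, "Phone": "555-0101", "Email": "b@example.com"},
        ],
    )
    print(sql)
    # UPDATE [Customer] SET [Phone] = ?, [Email] = ? WHERE [CustomerID] = ?
    # With a live connection this would be: cursor.executemany(sql, params)
```

Only the columns the run actually touches are sent for each row, and the driver receives a single statement text.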

In the former case, I would check whether only a relatively small subset of the columns ever changes (say, 10 columns that change in different rows, even if any one row changes at most 3 of those 10). In that case, I'd probably parameterize for those 10 columns, accepting the relatively small overhead of transmitting 7-9 unchanged column values in exchange for the convenience of a single prepared statement. If the set of updated columns is all over the map (say, more than 50 of the 100 columns are updated over the entire operation), then it is probably simpler just to deal with the whole lot.
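
A sketch of the "parameterize for the small subset" idea, assuming `?` placeholders and a hypothetical `Customer` table. Each input row is a pair: its original values and its changes. The SET list is the union of all columns changed anywhere in the run. Any column that a particular row did not change is re-sent with its current value.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, doubling any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def build_superset_update(
    table: str,
    key: str,
    rows: Sequence[Tuple[Row, Row]],
) -> Tuple[str, List[Tuple[Any, ...]]]:
    """rows: (original, changes) pairs. `original` holds the key and the
    current value of every column; `changes` holds only the new values."""
    # Union of changed columns across the whole run, in first-seen order.
    columns = list(dict.fromkeys(c for _, changes in rows for c in changes))
    if not columns:
        raise ValueError("no columns changed in this run")
    set_clause = ", ".join(f"{quote_ident(c)} = ?" for c in columns)
    sql = f"UPDATE {quote_ident(table)} SET {set_clause} WHERE {quote_ident(key)} = ?"
    params: List[Tuple[Any, ...]] = []
    for original, changes in rows:
        # Columns this row did not change are sent with their current value:
        # the "small overhead" traded for a single prepared statement.
        values = tuple(changes.get(c, original[c]) for c in columns)
        params.append(values + (original[key],))
    return sql, params


if __name__ == "__main__":
    sql, params = build_superset_update(
        "Customer", "CustomerID",
        [
            ({"CustomerID": 1, "Phone": "555-0100", "Email": "a@example.com"}, {"Phone": "555-0199"}),
            ({"CustomerID": 2, "Phone": "555-0101", "Email": "b@example.com"}, {"Email": "b2@example.com"}),
        ],
    )
    print(sql)     # UPDATE [Customer] SET [Phone] = ?, [Email] = ? WHERE [CustomerID] = ?
    print(params)  # [('555-0199', 'a@example.com', 1), ('555-0101', 'b2@example.com', 2)]
```

If the union grows large (the answer's example threshold is more than 50 of the 100 columns), it is probably simpler to switch to a statement that lists every column.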

To some extent, it depends on how easy your host language (client API) makes it to handle the various possible ways of parameterizing the updates.

#2

I would say options 2 and 3 are equivalent from a performance perspective. If you are using a PK to find the row to update and it is a clustered key, then I wouldn't worry about setting a column to its own value. The problem with option 1 is that you are going to cause "procedure cache bloat": you end up with many similar plans taking up your plan cache, because each one is a slightly different variant of the same update.
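
The following hypothetical Python sketch shows where the bloat comes from by counting the distinct statement texts that a per-row dynamic approach produces. SQL Server generally caches a separate plan for each distinct statement text, so every distinct column combination can add a cache entry. With n columns there are 2^n - 1 possible non-empty SET lists. The table and column names are illustrative.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, doubling any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def update_text(table: str, key: str, changed: Iterable[str]) -> str:
    """Statement text for a row that changed the given columns."""
    # Sorting makes the same column set always produce identical text;
    # without it, ["Email", "Phone"] and ["Phone", "Email"] would be two entries.
    cols = sorted(changed)
    if not cols:
        raise ValueError("no columns to update")
    set_clause = ", ".join(f"{quote_ident(c)} = ?" for c in cols)
    return f"UPDATE {quote_ident(table)} SET {set_clause} WHERE {quote_ident(key)} = ?"


def count_distinct_statements(
    table: str, key: str, changed_sets: Iterable[Iterable[str]]
) -> int:
    """How many different statement texts (roughly, cached plans) a batch needs."""
    return len({update_text(table, key, s) for s in changed_sets})


def max_variants(n_columns: int) -> int:
    """Number of non-empty column subsets, i.e. possible distinct SET lists."""
    return 2 ** n_columns - 1


if __name__ == "__main__":
    batch = [["Phone"], ["Email"], ["Email", "Phone"], ["Phone", "Email"], ["Phone"]]
    print(count_distinct_statements("Customer", "CustomerID", batch))  # 3
    print(max_variants(100) > 10 ** 30)  # True
```

Five rows already need three statement texts here. Options 2 and 3 always use exactly one text.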

If you plan on doing massive updates, I might hesitate to recommend updating all columns, since doing so may cause foreign-key (FK) lookups, etc.

Thanks, Eric

#3

I'd vote for option 1 combined with option 2, i.e. dynamically build a parameterized UPDATE statement that updates only the changed columns. This works when your read/write ratio leans toward reads and you're not updating too frequently, so you can safely trade query plan caching for (physical) update performance.
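
A sketch of that combined approach, assuming `?` placeholders and a hypothetical `Customer` table. It compares the original and edited versions of a row, then builds a parameterized UPDATE that covers only the columns whose values actually differ. Sorting the column names keeps the text identical for identical column sets. That limits the plan-cache cost being traded away, but it does not remove it.

```python
from typing import Any, Dict, Optional, Tuple


def quote_ident(name: str) -> str:
    """Quote a SQL Server identifier with brackets, doubling any ']'."""
    return "[" + name.replace("]", "]]") + "]"


def build_changed_only_update(
    table: str,
    key: str,
    original: Dict[str, Any],
    current: Dict[str, Any],
) -> Optional[Tuple[str, Tuple[Any, ...]]]:
    """Return (sql, params) touching only changed columns, or None if the
    row is unchanged. Both dicts are assumed to hold the same columns."""
    changed = sorted(c for c in current if c != key and current[c] != original[c])
    if not changed:
        return None  # nothing to send to the server
    set_clause = ", ".join(f"{quote_ident(c)} = ?" for c in changed)
    sql = f"UPDATE {quote_ident(table)} SET {set_clause} WHERE {quote_ident(key)} = ?"
    params = tuple(current[c] for c in changed) + (original[key],)
    return sql, params


if __name__ == "__main__":
    before = {"CustomerID": 7, "Phone": "555-0100", "Email": "a@example.com", "City": "Oslo"}
    after = dict(before, Phone="555-0199", City="Bergen")
    print(build_changed_only_update("Customer", "CustomerID", before, after))
    # ('UPDATE [Customer] SET [City] = ?, [Phone] = ? WHERE [CustomerID] = ?', ('Bergen', '555-0199', 7))
```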
