This question is about the best practice for handling many inserts or updates with Microsoft Entity Framework. We wrote a long-running program that pulls thousands of records from the database and then updates a single field on each of those records, one by one. Much to our dismay, we realized that each updated record stayed locked for as long as the ObjectContext was not disposed. Below is some pseudocode (it doesn't actually run) to illustrate:
using (ObjectContext context = new ObjectContext())
{
    var myRecords = context.CreateObjectSet<MyType>().AsQueryable();
    foreach (var record in myRecords)
    {
        record.MyField = "updated!";
        context.SaveChanges();
        // do something really slow, like calling an external web service
    }
}
The problem is that we need to do many updates without any regard for transactions. We were surprised to find that calling context.SaveChanges() actually creates the lock on the records and does not release it until the ObjectContext is disposed. We especially do NOT want to lock the records in the database, as this is a high-traffic system and the program could potentially run for hours.
So the question is: what is the optimal way to do many updates in Microsoft Entity Framework 4 WITHOUT doing them all in one long transaction that locks the DB? We are hoping that the answer is not to create a new ObjectContext for every single update...
4 Answers
#1
7
Entity Framework on top of SQL Server uses the read committed transaction isolation level by default, and the transaction is committed at the end of SaveChanges. If you suspect other behavior, it must be caused by the rest of your code (are you using TransactionScope? You didn't show it in your code) or it must be some bug.
Also, your approach is wrong. If you want to save each record separately, you should also load each record separately. EF is definitely a bad choice for this type of application. Even if you use only a single SaveChanges call to update all your records, it will still make a separate roundtrip to the database for each update.
#2
4
Those locks are not created by Entity Framework. EF only supports optimistic concurrency; pessimistic locking is not supported by EF.
I think the locking you experience is a result of your SQL Server configuration. Perhaps if the transaction isolation level on the server is set to REPEATABLE READ, this might cause the locks after each query. But I am not sure exactly which configuration setting could be the problem. More details are here.
Edit:
Another helpful article about transactions and transaction isolation in EF is here. It strongly recommends always setting the isolation level explicitly. Quote from the article:
If you don't take control of [the isolation level], you have no idea in which transaction isolation level your queries will be running. After all, you don't know where the connection that you got from the pool has been [...] You simply inherit the last used isolation level on the connection, so you have no idea which type of locks are taken (or worse: ignored) by your queries and for how long these locks will be held. On a busy database, this will definitely lead to random errors, time-outs and deadlocks.
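As a minimal sketch of what "taking control" of the isolation level could look like, the snippet below wraps a unit of work in a System.Transactions.TransactionScope with the level stated explicitly. RunInTransaction is a hypothetical helper name introduced only for illustration, and the snippet assumes C# 9 or later (top-level statements). Note that a TransactionScope created without options defaults to Serializable, which is one more reason to spell the level out.

```csharp
using System;
using System.Transactions;

// Runs `work` inside a TransactionScope whose isolation level is stated explicitly.
// RunInTransaction is a hypothetical helper name, not an EF or System.Transactions API.
void RunInTransaction(IsolationLevel level, Action work)
{
    var options = new TransactionOptions { IsolationLevel = level };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    {
        work();           // with EF, the query and context.SaveChanges() would go here
        scope.Complete(); // marks the scope for commit; disposing without it rolls back
    }
}

// Usage: print the isolation level of the ambient transaction inside the scope.
RunInTransaction(IsolationLevel.ReadCommitted, () =>
    Console.WriteLine(Transaction.Current?.IsolationLevel));
```

Keeping the scope short, around one SaveChanges call or one small batch, limits how long any locks taken inside it are held.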
#3
2
I may be wrong, but I believe you should not be calling SaveChanges() every single time since that applies the changes to the database at that point. Instead, apply SaveChanges() at the end of your object changes, or use a counter to do it less frequently.
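The counter idea can be sketched roughly as follows. UpdateInBatches is a hypothetical helper written for illustration; the update and save steps are passed in as delegates so the snippet runs without a database, and with EF the save delegate would simply call context.SaveChanges(). It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: applies `update` to every item and calls `save` once per
// `batchSize` items, plus once more for any remainder. Returns the number of saves.
int UpdateInBatches<T>(IEnumerable<T> items, Action<T> update, Action save, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    int pending = 0, saves = 0;
    foreach (var item in items)
    {
        update(item);
        if (++pending == batchSize)
        {
            save();      // with EF this would be context.SaveChanges()
            saves++;
            pending = 0;
        }
    }
    if (pending > 0) { save(); saves++; } // flush the last partial batch
    return saves;
}

// Usage with in-memory stand-ins: 250 "records", saved 100 at a time.
var fields = new string[250];
int saveCount = UpdateInBatches(
    Enumerable.Range(0, fields.Length),
    i => fields[i] = "updated!",
    () => Console.WriteLine("SaveChanges would run here"),
    100);
Console.WriteLine($"{saveCount} saves"); // prints "3 saves"
```

The batch size is a trade-off: fewer, larger saves mean fewer transactions, but each transaction then touches more rows at once.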
#4
1
In our application we had a similar scenario: we wanted to avoid locking as much as possible while running a massive select and then creating a lot of inserts after some in-memory processing.
- If you want to read everything upfront
Solution A) Use a transaction scope that includes both the read and the update. PRO: Data safely updated. CONS: Locks caused by the read (repeatable reads) and the update.
Solution B) Do not use a transaction and update all the data together. PRO: Data safely updated, but the data you read may have been changed in the meantime. CONS: Locks caused by the update for its entire duration (EF creates a transaction by default).
Solution C) Update in batches instead of updating all the data together (usable only if the select is not locking the tables, otherwise you get the same behaviour as B). PRO: Shorter and smaller locks on the updated tables. CONS: You increase the chance of being affected by stale data.
- If you want (and can) read in batches
Solution D) Breaking down the problem and splitting the reads can help you reduce locking, so you can use a transaction scope to wrap both the read and the write (as in solution A). PRO: Data safely updated. CONS: Locks caused by the read (repeatable reads) and the update; the impact varies with the batch size and the nature of the query itself.
Solution E) Do not use transactions, so only the updates will produce small locks (as in solution B). PRO: Data safely updated, but the data you read may have been changed in the meantime. CONS: Locks caused by the updates.
As @Ladislav correctly pointed out, multiple inserts are really inefficient, and a quick profiling session on the database shows you how the ORM magic fails in this case. If you want to use EF to perform batch operations such as inserts, updates and deletes, I recommend you have a look at this: EF Utilities
I tend to test locks using this query; I hope it helps to better understand what is going on.
SELECT
    OBJECT_NAME(p.object_id) AS TableName,
    l.resource_type,
    l.resource_description
FROM sys.dm_tran_locks AS l
JOIN sys.partitions AS p
    ON l.resource_associated_entity_id = p.hobt_id