Maybe there's no simple answer to this question, but I ask in case someone has, if not a simple answer, at least an insight.
On a number of occasions I've written a loop that goes through many records in a database table, performing some update, where I could legitimately either do one big commit at the end or commit each record as I process it. That is, committing one at a time would not create any data integrity issues.
Is there a clear case for which is better?
What brings it to mind is that I recently switched one such program from a single big commit to a bunch of little commits, because it was a fairly long-running program -- about 80 minutes -- and it failed halfway through on bad data. I fixed the problem and re-ran it, but it had to start over from the beginning, when it could have processed just the previously unprocessed records.
I noticed when I made this change that the run time was about the same either way.
3 Solutions
#1
3
Assuming that the ability to roll back the entire set of changes is not needed (if it is needed, there is only one answer: commit outside the loop), committing inside the loop keeps the transaction log smaller but requires more round trips to the DB. Committing outside the loop is the exact opposite. Which is faster depends on the average operation count and the amount of data to be committed overall. For a routine that persists about 10-20 records, commit outside the loop. For 1-2 million records, I'd commit in batches.
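To make "commit in batches" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `records` table, its `amount` column, and the batch size of 1000 are all made up for illustration; the right batch size depends on your database and workload.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Double every amount, committing once per batch_size rows."""
    ids = [row[0] for row in conn.execute("SELECT id FROM records ORDER BY id")]
    commits = 0
    for start in range(0, len(ids), batch_size):
        for record_id in ids[start:start + batch_size]:
            conn.execute(
                "UPDATE records SET amount = amount * 2 WHERE id = ?",
                (record_id,),
            )
        conn.commit()  # one commit per batch, not one per row
        commits += 1
    return commits


def demo():
    # Hypothetical in-memory table with 2500 rows (amounts 0..2499).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO records (amount) VALUES (?)", [(i,) for i in range(2500)]
    )
    conn.commit()
    commits = update_in_batches(conn, batch_size=1000)
    total = conn.execute("SELECT SUM(amount) FROM records").fetchone()[0]
    conn.close()
    return commits, total
```

With 2500 rows and a batch size of 1000, this issues three commits instead of 2500 (per row) or one (at the end), so a failure loses at most one batch of work.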
#2
1
I think the answer is: do you need to roll back everything if one update fails? If yes, put the transaction outside the loop; otherwise put it inside. Of course, I would almost never write a loop to do an update anyway, except to process fairly large batches of records. If you are doing row-by-row updates, there are better, more performant methods.
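The answer doesn't say which methods it has in mind; one common candidate is a single set-based UPDATE statement in place of the loop. A rough sketch with Python's sqlite3 module, where the `records` table and the "add 1 to every amount" update are invented for the example:

```python
import sqlite3


def make_table(n):
    # Hypothetical table with n rows, amounts 0..n-1.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO records (amount) VALUES (?)", [(i,) for i in range(n)]
    )
    conn.commit()
    return conn


def update_row_by_row(conn):
    """One UPDATE statement per row, then a single commit."""
    ids = [row[0] for row in conn.execute("SELECT id FROM records")]
    for record_id in ids:
        conn.execute(
            "UPDATE records SET amount = amount + 1 WHERE id = ?", (record_id,)
        )
    conn.commit()


def update_set_based(conn):
    """One UPDATE statement covering every row; returns the rows changed."""
    cur = conn.execute("UPDATE records SET amount = amount + 1")
    conn.commit()
    return cur.rowcount


def amounts(conn):
    return [row[0] for row in conn.execute("SELECT amount FROM records ORDER BY id")]
```

Both functions leave the table in the same state, but the set-based version sends one statement to the database instead of one per row. It only applies when the per-row logic can be expressed in SQL.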
#3
0
In terms of performance, it is generally better to do one big commit at the end (less network traffic, normally less work for the DB).
This of course depends on many factors, such as the indexing on the table, the amount of data, etc.
What should be driving your decision is how important each update is - should it be a transaction in and of itself? Does an update of many items make sense? What happens if the loop fails halfway?
Answering those questions will give you the right way to do this in your application for that process - you may arrive at different ways to handle the commit depending on the application context.
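As an illustration of the "what happens if the loop fails halfway?" question, here is a small sketch with Python's sqlite3 module. The `records` table, the `processed` flag column, and the deliberately bad row are assumptions made up for the example, not anything from the original program.

```python
import sqlite3


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, value TEXT, processed INTEGER DEFAULT 0)"
    )
    # Row 3 holds bad data that cannot be parsed as an integer.
    conn.executemany(
        "INSERT INTO records (value) VALUES (?)",
        [("1",), ("2",), ("oops",), ("4",)],
    )
    conn.commit()
    return conn


def process(conn, commit_each_row):
    """Double each unprocessed value; return True if the run completed."""
    rows = conn.execute(
        "SELECT id, value FROM records WHERE processed = 0 ORDER BY id"
    ).fetchall()
    try:
        for record_id, value in rows:
            doubled = str(int(value) * 2)  # raises ValueError on bad data
            conn.execute(
                "UPDATE records SET value = ?, processed = 1 WHERE id = ?",
                (doubled, record_id),
            )
            if commit_each_row:
                conn.commit()
        conn.commit()
        return True
    except ValueError:
        conn.rollback()  # discards only work that has not been committed yet
        return False


def processed_count(conn):
    return conn.execute(
        "SELECT COUNT(*) FROM records WHERE processed = 1"
    ).fetchone()[0]
```

With `commit_each_row=False`, the failure on row 3 rolls everything back and no rows are marked as processed. With `commit_each_row=True`, rows 1 and 2 stay committed, and once the bad row is fixed a re-run picks up only the rows still flagged `processed = 0`.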