Scripts running concurrently against a table in SQL Server

Time: 2022-06-26 08:52:15

I have identical Python scripts I need to run on multiple servers all targeting the same table on a DB server. The script takes 5-20 seconds to run, and must run every 5 minutes.

Server1 --->  -------------
              |  DB Table |
Server2 --->  -------------

The script looks at a single table that looks like this:

Type | many other fields | DirtyBit  |  Owner
 --------------------------------------------
  X  | ...               | UnUsed    |   NULL
  X  | ...               | UnUsed    |   NULL
  X  | ...               | UnUsed    |   NULL
  Y  | ...               | UnUsed    |   NULL
  Y  | ...               | UnUsed    |   NULL

The script does the following:

  1. Grab all records of type X (in a transaction) where DirtyBit is UnUsed and Owner is NULL.

  2. Update all the records, set DirtyBit to InUse, and Owner to Server1.

  3. Perform some operations on the data in Python.

  4. Update all the records according to the operations in step 3. Set DirtyBit back to UnUsed, and Owner back to NULL.

Because the script is running on multiple servers, the DirtyBit/Owner combination works to ensure the scripts aren't stepping on each other. Also, note that each row in the table is independent of all the others.

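A minimal T-SQL sketch of these claim and release steps, using the example table name test from the first answer below (the table name and anything beyond the fields shown above are assumptions, not from the question):

begin transaction;

-- Step 1: read the unclaimed rows of type X
select * from test where Type = 'X' and DirtyBit = 'UnUsed' and Owner is null;

-- Step 2: mark those rows as claimed by this server
update test set DirtyBit = 'InUse', Owner = 'Server1'
where Type = 'X' and DirtyBit = 'UnUsed' and Owner is null;

commit;

-- Step 3 (the Python work) happens outside the database; step 4 then writes the
-- processed values back (not shown) and releases the claim:
update test set DirtyBit = 'UnUsed', Owner = null where Owner = 'Server1';

As the first answer below points out, two servers running the read step at the same time can still pick up the same rows, so this scheme is not safe on its own.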

Question: is this a sensible approach to getting the scripts to run concurrently? Is there any way the database can handle this for me (maybe by changing the Transaction Isolation Level)? Ideally, if the scripts happen to run at the same time, I want this:

  1. Script on Server 1 starts running.

  2. Script on Server 2 starts running, notices that the script on Server 1 is already running, and thus decides it doesn't need to run (see the sketch after this list).

  3. Script on Server 1 finishes, updates all the data.

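For illustration only, a check along these lines could approximate that "notice and skip" behaviour using the question's own columns and the same assumed test table as above; this sketch is not from the original post, and, as the answers below discuss, the check itself is racy unless it is protected by a lock:

-- Hypothetical guard run at the start of each scheduled invocation.
if exists (select 1 from test where Type = 'X' and Owner is not null)
    print 'Another server is already processing type X rows; skipping this run.';
else
begin
    -- No outstanding claim: proceed with the claim/process/release steps described above.
    print 'No other run detected; proceeding.';
end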

2 Solutions

#1

Developing solutions that are based on concurrent access and modification of the same data is always a very sensitive matter. They're also prone to errors that happen very rarely and are hard to find.

In your case, what you in fact want to do is serialize access to your table, not only the updates. That is, allow only one thread (transaction) at a time to fetch the data it needs (rows where DirtyBit is UnUsed and Owner is NULL) and mark those rows as "used". I'm quite sure that your current solution doesn't work properly. Why? Consider the following scenario:

  1. transaction 1 begins

  2. transaction 2 begins

  3. transaction 1 reads the data from the table

  4. transaction 2 reads the data from the table - it is allowed to, since readers only take shared locks, and it reads the same data as transaction 1 did

  5. transaction 1 updates the table

  6. transaction 2 wants to update the table, but is blocked by transaction 1, which holds exclusive locks on the rows it updated

  7. transaction 1 commits

  8. now transaction 2 can update the data and commit as well

As a result, both transactions 1 and 2 read the same rows, and your script on both servers will operate on them. You can easily reproduce such a scenario by operating on the database manually.

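For example, a sketch of that manual reproduction in two query windows against the example test table used below (the ids 1, 2, 3 stand in for whichever rows the selects actually return):

-- Window 1 (Server1) - scenario steps 1 and 3:
begin transaction;
select * from test where DirtyBit = 'UnUsed' and Owner is null;

-- Window 2 (Server2) - steps 2 and 4: this returns the same rows as Window 1
begin transaction;
select * from test where DirtyBit = 'UnUsed' and Owner is null;

-- Window 1 - step 5: claim the rows just read, but do not commit yet
update test set DirtyBit = 'Used', Owner = 'Server1' where id in (1, 2, 3);

-- Window 2 - step 6: tries to claim the same rows and blocks on Window 1's row locks
update test set DirtyBit = 'Used', Owner = 'Server2' where id in (1, 2, 3);

-- Window 1 - step 7: commit; Window 2's update now completes and overwrites Owner
commit;

-- Window 2 - step 8:
commit;

-- Result: both sessions have read (and will process) the same rows.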

You can avoid this by explicitly acquiring an exclusive table lock. That would look like this:

begin transaction;

select * from test with (TABLOCKX) where DirtyBit = 'UnUsed' and Owner is null;

update test set DirtyBit = 'Used', Owner = 'Server1' where id in (...);

commit;

Here, the TABLOCKX hint will cause the other transactions to wait until this transaction commits or rolls back - they will not be able to read the data in the meantime. Does this solve your problem?

But... if you can avoid concurrency altogether in this specific case, I'd recommend you do so (because of the first paragraph of my response).

#2

I wouldn't take the approach you've used here. Home-grown solutions like this tend to be brittle.

This looks like a good problem for a scheduled job, with concurrency controlled via sp_getapplock.

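A minimal sketch of that pattern, runnable as a job step (the lock resource name, the return-code handling, and the job body are assumptions for illustration):

begin transaction;

declare @lockResult int;

-- Try to take an application lock for this job; @LockTimeout = 0 means a second
-- caller gives up immediately instead of waiting, which matches the "notice a run
-- is in progress and skip" behaviour asked for in the question.
exec @lockResult = sp_getapplock
    @Resource    = 'ProcessTypeXRows',   -- assumed lock name
    @LockMode    = 'Exclusive',
    @LockOwner   = 'Transaction',
    @LockTimeout = 0;

if @lockResult >= 0
begin
    -- Lock acquired: do the work here, e.g. the claim step from the question.
    update test set DirtyBit = 'InUse', Owner = 'Server1'
    where Type = 'X' and DirtyBit = 'UnUsed' and Owner is null;
end
-- A negative return value means another session holds the lock, so do nothing.

commit;  -- releases the applock, since it was taken with @LockOwner = 'Transaction'

Scheduled every 5 minutes (for example as a SQL Server Agent job), a second run that starts while one is in progress simply finds the lock taken and exits, which is the behaviour described in the question.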
