Avoiding a background job being run by two workers at the same time

Time: 2022-04-01 07:08:50

I have a daemon that runs background jobs requested by our webservice. We have 4 workers running simultaneously.

Sometimes a job is executed twice at the same time, because two workers decided to run that job. To avoid this situation we tried several things:

  1. Since our jobs come from our database, we added a flag called executed that prevents other workers from picking up a job that has already started executing. This does not solve the problem: sometimes the database delay is enough to allow simultaneous executions.
  2. We added memcached to the system (all workers run on the same machine), but somehow we still had simultaneous jobs running today. Memcached would not solve the problem across multiple servers either.

Here is the logic we are currently using:

// We create our memcached server
$memcached = new Memcached();
$memcached->addServer("127.0.0.1", 11211);

// Checkup every 5 seconds for operations
while (true) {
    // Gather all operations TODO
    // In this query, we do not accept operations that are set
    // as executed already.
    $result = findDaemonOperationsPendingQuery();

    // We have some results!
    if (mysqli_num_rows($result) > 0) {
        $op = mysqli_fetch_assoc($result);
        echo "Found an operation todo #" . $op['id'] . "\n";

        // Set operation as executed
        setDaemonOperationAsDone($op['id'], 'executed');

        // Verifies if operation is happening on memcached
        if (get_memcached_operation($memcached, $op['id'])) {
            echo "\tOperation id already executing...\n";
            continue;

        } else {
            // Set operation on memcached
            set_memcached_operation($memcached, $op['id']);
        }

        // ... do our stuff
    }
}

How is this kind of problem usually solved? I searched online and found a library called Gearman, but I'm not convinced it will solve my problem once we have multiple servers.

Another idea I had was to assign each operation to a specific daemon at insertion time, and to create a dedicated failsafe daemon that runs operations assigned to daemons that are out of service.

Any ideas?

Thanks.

2 solutions

#1

An alternative to using locks and transactions, assuming each worker has an ID.

In your loop, run:

UPDATE operations SET worker_id = :wid WHERE worker_id IS NULL LIMIT 1;

SELECT * FROM operations WHERE executed = 0 AND worker_id = :wid;

The UPDATE is a single atomic statement, and it only sets worker_id where it is not yet set, so there is no race condition to worry about. Setting worker_id makes it clear which worker owns the operation. Because of the LIMIT 1, the UPDATE assigns only one operation.

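As a rough sketch, the two statements above could be wrapped in a helper like the one below. It assumes a PDO connection to MySQL in exception mode and an operations table with the worker_id and executed columns from the queries above. The function name claimOperation, the id column, the extra executed = 0 filter, and the ORDER BY id clauses are assumptions added for illustration; they are not part of the original answer.

```php
<?php
/**
 * Atomically claim one unowned operation for this worker, then read it back.
 * Returns the claimed row, or null when this worker owns no pending operation.
 * (Hypothetical helper; table and column names follow the queries above.)
 */
function claimOperation(PDO $db, string $workerId): ?array
{
    // Single-statement UPDATE: the server applies it atomically, so two
    // workers cannot both set worker_id on the same row.
    $claim = $db->prepare(
        'UPDATE operations SET worker_id = :wid
          WHERE worker_id IS NULL AND executed = 0
          ORDER BY id LIMIT 1'
    );
    $claim->execute([':wid' => $workerId]);

    // Read back the oldest operation this worker owns and has not finished.
    $fetch = $db->prepare(
        'SELECT * FROM operations
          WHERE executed = 0 AND worker_id = :wid
          ORDER BY id LIMIT 1'
    );
    $fetch->execute([':wid' => $workerId]);
    $row = $fetch->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```

A worker would call claimOperation($db, $workerId) on each pass of its loop, mark the row as executed once the work is done, and sleep when the function returns null.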

#2

You have a typical concurrency problem.

  1. Worker 1 reads the table and selects a job.
  2. Worker 1 updates the table to mark the job as 'assigned' or whatever.
  3. Oh, but wait: between steps 1 and 2, worker 2 read the table as well, and since the job wasn't yet marked as 'assigned', worker 2 selected the same job.
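
The interleaving above can be reproduced in a few lines without a database. This is only a toy simulation: the $jobs array stands in for the table, and selectJob is a hypothetical helper that plays the part of the SELECT.

```php
<?php
// In-memory stand-in for the jobs table (hypothetical data).
$jobs = [1 => ['assigned' => false]];

/** Step 1: read the "table" and pick the first unassigned job id, or null. */
function selectJob(array $jobs): ?int
{
    foreach ($jobs as $id => $job) {
        if (!$job['assigned']) {
            return $id;
        }
    }
    return null;
}

// Worker 1 and worker 2 both perform step 1 before either performs step 2.
$picked1 = selectJob($jobs);
$picked2 = selectJob($jobs);

// Step 2 for each worker: mark the job as assigned.
// By now it is too late, because both workers already hold the same id.
$jobs[$picked1]['assigned'] = true;
$jobs[$picked2]['assigned'] = true;

$ranTwice = ($picked1 === $picked2);
var_dump($ranTwice); // bool(true)
```

The check and the update are two separate steps, so nothing stops a second reader from slipping in between them.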

The way to solve this is to use transactions and locks, in particular SELECT ... FOR UPDATE. It goes like this:

  1. Worker 1 starts a transaction (START TRANSACTION) and tries to acquire an exclusive lock with SELECT * FROM jobs [...] FOR UPDATE.
  2. Worker 2 does the same, except that it has to wait because worker 1 already holds the lock.
  3. Worker 1 updates the table to say it is now working on the job and commits the transaction immediately. This releases the lock so other workers can select jobs. Worker 1 can now safely start working on this job.
  4. Worker 2 can now read the table and acquire the lock. Since the table has been updated, worker 2 will select a different job.
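
The steps above might look roughly like this in PHP. The sketch assumes a PDO connection to MySQL in exception mode and a jobs table on a transactional engine such as InnoDB. The assigned and worker_id columns, the ordering by id, and the function name claimJobWithLock are hypothetical and chosen only for illustration.

```php
<?php
/**
 * Claim one job inside a short transaction using SELECT ... FOR UPDATE.
 * Returns the job row, or null when no unassigned job exists.
 * (Hypothetical helper; column names are assumptions.)
 */
function claimJobWithLock(PDO $db, string $workerId): ?array
{
    $db->beginTransaction();
    try {
        // Locks the selected row; a second worker running the same
        // statement waits here until this transaction ends.
        $stmt = $db->query(
            'SELECT * FROM jobs WHERE assigned = 0 ORDER BY id LIMIT 1 FOR UPDATE'
        );
        $job = $stmt->fetch(PDO::FETCH_ASSOC);

        if ($job === false) {
            $db->commit();
            return null;
        }

        $mark = $db->prepare(
            'UPDATE jobs SET assigned = 1, worker_id = :wid WHERE id = :id'
        );
        $mark->execute([':wid' => $workerId, ':id' => $job['id']]);

        // Commit right away so the lock is held only for the claim,
        // not for the duration of the job itself.
        $db->commit();
        return $job;
    } catch (Throwable $e) {
        // Undo the partial claim and let the caller see the error.
        $db->rollBack();
        throw $e;
    }
}
```

Keeping the transaction this short matters: if the job itself ran inside it, every other worker would be blocked for as long as the job takes.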

EDIT: Specific comments about your PHP code:

  • Your comment says you are fetching all the jobs that need to be done at once in each worker. You should select only one, do it, select the next one, do it, and so on.
  • You are setting the 'executed' flag when in fact the job has not (yet) been executed. You need an 'assigned' flag and a separate 'executed' flag.
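
Both points can be combined into a loop along these lines. To keep the sketch runnable without a server, it uses an in-memory SQLite database (which needs the pdo_sqlite extension) as a stand-in for MySQL, and it claims jobs with a conditional UPDATE rather than SELECT ... FOR UPDATE, since SQLite has no FOR UPDATE. The table layout, the payload column, and the function names are all hypothetical.

```php
<?php
/**
 * Try to claim one pending job. Returns its row, or null when nothing is
 * pending or another worker claimed the candidate first.
 */
function claimOne(PDO $db): ?array
{
    $row = $db->query(
        'SELECT id, payload FROM jobs WHERE assigned = 0 ORDER BY id LIMIT 1'
    )->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    // Conditional UPDATE: only one worker can flip assigned from 0 to 1.
    $claim = $db->prepare('UPDATE jobs SET assigned = 1 WHERE id = :id AND assigned = 0');
    $claim->execute([':id' => $row['id']]);
    return $claim->rowCount() === 1 ? $row : null;
}

/** Select one job, do it, mark it executed, repeat. Returns jobs completed. */
function processPending(PDO $db, callable $handler): int
{
    $done = 0;
    while (($job = claimOne($db)) !== null) {
        $handler($job); // the actual work happens here
        // Only now is the job really executed.
        $db->prepare('UPDATE jobs SET executed = 1 WHERE id = :id')
           ->execute([':id' => $job['id']]);
        $done++;
    }
    return $done;
}

// Demo against an in-memory SQLite database (stand-in for MySQL).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    assigned INTEGER NOT NULL DEFAULT 0,
    executed INTEGER NOT NULL DEFAULT 0
)');
$db->exec("INSERT INTO jobs (payload) VALUES ('a'), ('b'), ('c')");

$handled = [];
$count = processPending($db, function (array $job) use (&$handled): void {
    $handled[] = $job['payload'];
});
```

With two flags, a job that is assigned but never executed is easy to spot, which helps when a worker dies halfway through.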
