Signal-safe use of sem_wait()/sem_post()

I am trying to create a wrapper on Linux which controls how many concurrent executions of something are allowed at once. To do so, I am using a system wide counting semaphore. I create the semaphore, do a sem_wait(), launch the child process and then do a sem_post() when the child terminates. That is fine.

The problem is how to safely handle signals sent to this wrapper. If it doesn't catch signals, the command might terminate without doing a sem_post(), causing the semaphore count to permanently decrease by one. So, I created a signal handler which does the sem_post(). But still, there is a problem.

If the handler is attached before the sem_wait() is performed, a signal could arrive before the sem_wait() completes, causing a sem_post() to occur without a sem_wait(). The reverse is possible if I do the sem_wait() before setting up the signal handler.

The obvious next step was to block signals during the setup of the handler and the sem_wait(). This is pseudocode of what I have now:

void handler(int sig)
{
  sem_post(sem);
  _exit(1);   /* exit() is not async-signal-safe; _exit() is */
}

...
sigprocmask(...);   /* Block signals */
sigaction(...);     /* Set signal handler */
sem_wait(sem);
sigprocmask(...);   /* Unblock signals */
RunChild();
sem_post(sem);
exit(0);

The problem now is that the sem_wait() can block and during that time, signals are blocked. A user attempting to kill the process may end up resorting to "kill -9" which is behaviour I don't want to encourage since I cannot handle that case no matter what. I could use sem_trywait() for a small time and test sigpending() but that impacts fairness because there is no longer a guarantee that the process waiting on the semaphore the longest will get to run next.

Is there a truly safe solution here which allows me to handle signals during semaphore acquisition? I am considering resorting to a "Do I have the semaphore" global and removing the signal blocking. That is not 100% safe, since acquiring the semaphore and setting the global isn't atomic, but it might be better than blocking signals while waiting.

3 solutions

#1

Are you sure sem_wait() causes signals to be blocked? I don't think this is the case. The man page for sem_wait() says that the EINTR error code is returned from sem_wait() if it is interrupted by a signal.

You should be able to handle this error code and then your signals will be received. Have you run into a case where signals have not been received?

I would make sure you handle every error code that sem_wait() can return. Such errors may be rare, but if you want to be 100% sure, you need to cover all of your bases.

#2

Are you sure you are approaching the problem correctly? If you want to wait for a child terminating, you may want to use the waitpid() system call. As you observed, it is not reliable to expect the child to do the sem_post() if it may receive signals.
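
A minimal sketch of that approach, assuming the wrapper itself forks the command: the parent blocks in waitpid() and would do the sem_post() itself once waitpid() returns, so the child never touches the semaphore. run_child is a hypothetical helper name, and the 127 and 128 + signal return values are just the usual shell convention, chosen here for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with arguments argv and waits for it.
 * Returns the child's exit status, 128 + signal number if a signal
 * killed it, 127 if exec failed, or -1 if fork/waitpid failed. */
int run_child(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    int status;
    /* A signal handler in the parent may interrupt waitpid(); retry. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);
    }
    return -1;
}
```

The wrapper would call run_child() between its sem_wait() and sem_post(). This takes care of the child dying unexpectedly; it does not by itself cover the wrapper being killed.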

#3

I know this is old, but for the benefit of those still reading this courtesy of Google...

The simplest (and only?) robust solution to this problem is to use a System V semaphore. By passing the SEM_UNDO flag to semop(), the client acquires the semaphore resource in a way which is automatically returned by the kernel NO MATTER HOW THE PROCESS EXITS.
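
For illustration, a sketch of how this might look, on the assumption that the mechanism meant here is the SEM_UNDO flag of semop(). slots_open, slot_acquire, slot_release and union semun_arg are hypothetical names; the 0600 permissions and the single-semaphore set are arbitrary choices for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl() takes a caller-defined union as its fourth argument on
 * Linux/glibc; the name is chosen to avoid clashing with any system one. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Creates the set (one semaphore, initial value `slots`) if it does not
 * exist yet, otherwise opens the existing one. Returns the id or -1. */
int slots_open(key_t key, int slots)
{
    int id = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (id >= 0) {
        union semun_arg arg;
        arg.val = slots;
        if (semctl(id, 0, SETVAL, arg) < 0) {
            semctl(id, 0, IPC_RMID);
            return -1;
        }
        return id;
    }
    if (errno == EEXIST) {
        return semget(key, 1, 0600);   /* someone else created it */
    }
    return -1;
}

/* Take one slot. SEM_UNDO asks the kernel to give it back automatically
 * when this process terminates, however that happens. */
int slot_acquire(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}

/* Return the slot explicitly; SEM_UNDO here cancels the pending undo. */
int slot_release(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };
    return semop(id, &op, 1);
}
```

Two things this sketch leaves open: creating the set and the SETVAL are separate steps, so two wrappers starting at the same moment can race on initialisation, and semop() can itself fail with EINTR, which the caller would still need to handle.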
