Environment: a RedHat-like distro, 2.6.39 kernel, glibc 2.12.
I fully expected that if a signal were delivered while accept() was in progress, accept() would fail with errno == EINTR. However, mine doesn't do that, and I'm wondering why. Below are the sample program and its strace output.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <signal.h>
#include <errno.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>   /* close(), write() */

static void sigh(int);

int main(int argc, char ** argv) {
    int s;
    struct sockaddr_in sin;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        return 1;
    }
    /* Bind to INADDR_ANY, port 0 (the kernel picks a free port). */
    memset(&sin, 0, sizeof(struct sockaddr_in));
    sin.sin_family = AF_INET;
    if (bind(s, (struct sockaddr*)&sin, sizeof(struct sockaddr_in))) {
        perror("bind");
        return 1;
    }
    if (listen(s, 5)) {
        perror("listen");
        return 1;
    }
    /* Handler installed with signal(); this is the call in question. */
    signal(SIGQUIT, sigh);
    while (1) {
        socklen_t sl = sizeof(struct sockaddr_in);
        int rc = accept(s, (struct sockaddr*)&sin, &sl);
        if (rc < 0) {
            if (errno == EINTR) {
                printf("accept restarted\n");
                continue;
            }
            perror("accept");
            return 1;
        }
        printf("accepted fd %d\n", rc);
        close(rc);
    }
}

/* Print "sig <number>" using only write(), which is async-signal-safe. */
static void sigh(int s) {
    signal(s, sigh);
    unsigned char p[100];
    int i = 0;
    /* Collect the decimal digits of s in reverse order. */
    while (s) {
        p[i++] = '0' + (s % 10);
        s /= 10;
    }
    write(1, "sig ", 4);
    for (i--; i >= 0; i--) {
        write(1, &p[i], 1);
    }
    write(1, "\n", 1);
}
strace output:
execve("./accept", ["./accept"], [/* 57 vars */]) = 0
<skipped>
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 5) = 0
rt_sigaction(SIGQUIT, {0x4008c4, [QUIT], SA_RESTORER|SA_RESTART, 0x30b7e329a0}, {SIG_DFL, [], 0}, 8) = 0
accept(3, 0x7fffe3e3c500, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGQUIT (Quit) @ 0 (0) ---
rt_sigaction(SIGQUIT, {0x4008c4, [QUIT], SA_RESTORER|SA_RESTART, 0x30b7e329a0}, {0x4008c4, [QUIT], SA_RESTORER|SA_RESTART, 0x30b7e329a0}, 8) = 0
write(1, "sig ", 4sig ) = 4
write(1, "3", 13) = 1
write(1, "\n", 1
) = 1
rt_sigreturn(0x1) = 43
accept(3, ^C <unfinished ...>
2 solutions
#1
Just when I was about to post this, the "SA_RESTART" flag in the strace output caught my attention. The signal(2) man page says that, starting with glibc 2, signal() "...calls sigaction(2) using flags that supply BSD semantics...".
The SA_RESTART flag "...makes certain system calls restartable across signals...", which hides the restart from the user. So this is not specific to accept(); a number of other system calls are affected as well. The signal(7) man page lists them in its section on the interruption of system calls and library functions by signal handlers.
So, if you need to react to a signal in a thread that may be blocked in a system call, install your signal handlers with sigaction() (without SA_RESTART) rather than with signal(). Below is the modified sample program that does exactly that, for reference.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <signal.h>
#include <errno.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>   /* close(), write() */

static void sigh(int);
static struct sigaction sa;

int main(int argc, char ** argv) {
    int s;
    struct sockaddr_in sin;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        return 1;
    }
    /* Bind to INADDR_ANY, port 0 (the kernel picks a free port). */
    memset(&sin, 0, sizeof(struct sockaddr_in));
    sin.sin_family = AF_INET;
    if (bind(s, (struct sockaddr*)&sin, sizeof(struct sockaddr_in))) {
        perror("bind");
        return 1;
    }
    if (listen(s, 5)) {
        perror("listen");
        return 1;
    }
    /* sa_flags stays 0, so SA_RESTART is not set and accept() gets EINTR. */
    memset(&sa, 0, sizeof(struct sigaction));
    sa.sa_handler = sigh;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGQUIT, &sa, 0);
    while (1) {
        socklen_t sl = sizeof(struct sockaddr_in);
        int rc = accept(s, (struct sockaddr*)&sin, &sl);
        if (rc < 0) {
            if (errno == EINTR) {
                printf("accept restarted\n");
                continue;
            }
            perror("accept");
            return 1;
        }
        printf("accepted fd %d\n", rc);
        close(rc);
    }
}

/* Print "sig <number>" using only write(), which is async-signal-safe. */
static void sigh(int s) {
    sigaction(SIGQUIT, &sa, 0);
    unsigned char p[100];
    int i = 0;
    /* Collect the decimal digits of s in reverse order. */
    while (s) {
        p[i++] = '0' + (s % 10);
        s /= 10;
    }
    write(1, "sig ", 4);
    for (i--; i >= 0; i--) {
        write(1, &p[i], 1);
    }
    write(1, "\n", 1);
}
And strace:
execve("./accept", ["./accept"], [/* 57 vars */]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 5) = 0
rt_sigaction(SIGQUIT, {0x400994, [], SA_RESTORER, 0x30b7e329a0}, NULL, 8) = 0
accept(3, 0x7fffb626be90, [16]) = ? ERESTARTSYS (To be restarted)
--- SIGQUIT (Quit) @ 0 (0) ---
rt_sigaction(SIGQUIT, {0x400994, [], SA_RESTORER, 0x30b7e329a0}, NULL, 8) = 0
write(1, "sig ", 4) = 4
write(1, "3", 13) = 1
write(1, "\n", 1) = 1
rt_sigreturn(0x1) = -1 EINTR (Interrupted system call)
write(1, "accept restarted\n", 17) = 17
accept(3,
#2
The book Unix Network Programming has a section that says:
We used the term "slow system call" to describe accept, and we use this term for any system call that can block forever. That is, the system call need never return. Most networking functions fall into this category. For example, there is no guarantee that a server's call to accept will ever return, if there are no clients that will connect to the server. Similarly, our server's call to read in Figure 5.3 will never return if the client never sends a line for the server to echo. Other examples of slow system calls are reads and writes of pipes and terminal devices. A notable exception is disk I/O, which usually returns to the caller (assuming no catastrophic hardware failure).
The basic rule that applies here is that when a process is blocked in a slow system call and the process catches a signal and the signal handler returns, the system call can return an error of EINTR. Some kernels automatically restart some interrupted system calls. For portability, when we write a program that catches signals (most concurrent servers catch SIGCHLD), we must be prepared for slow system calls to return EINTR. Portability problems are caused by the qualifiers "can" and "some," which were used earlier, and the fact that support for the POSIX SA_RESTART flag is optional. Even if an implementation supports the SA_RESTART flag, not all interrupted system calls may automatically be restarted. Most Berkeley-derived implementations, for example, never automatically restart select, and some of these implementations never restart accept or recvfrom.
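Being "prepared for slow system calls to return EINTR" usually comes down to a retry loop around the call. Here is a minimal sketch of such a wrapper for accept(). The name accept_retry is hypothetical and the code is not taken from the book.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call accept() again whenever a caught signal
 * interrupts it. Any other error is passed through to the caller. */
int accept_retry(int listen_fd, struct sockaddr *addr, socklen_t *len)
{
    /* If the caller wants the peer address, it must also pass a length. */
    assert(addr == NULL || len != NULL);

    socklen_t want = (len != NULL) ? *len : 0;
    int fd;

    do {
        if (len != NULL)
            *len = want;   /* value-result argument: reset before each try */
        fd = accept(listen_fd, addr, len);
    } while (fd < 0 && errno == EINTR);

    return fd;   /* new descriptor, or -1 with errno set to a non-EINTR error */
}
```

In the sample program from the question, the call would become accept_retry(s, (struct sockaddr*)&sin, &sl). Note that a blind retry is only appropriate when the program does not need to act on the signal. If it does, as in the question, the EINTR return is the point at which to check what the handler recorded.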