Signals: A Comprehensive Analysis (Part 1)

1. Basic Concepts

Signals are software interrupts. Signals are classic examples of asynchronous events. They occur at what appear to be random times to the process. The process can’t simply test a variable (such as errno) to see whether a signal has occurred; instead, the process has to tell the kernel “if and when this signal occurs, do the following.”

When a signal occurs, the process can tell the kernel to do one of three things:

Let the default action apply.

Catch the signal.

Ignore the signal.

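All of the points quoted above are important, but I want to stress "catch the signal" in particular. My understanding is that "catch" refers only to deliberately handling a signal with a user-defined function. This is different from letting the default action handle it. The discussion of the signal function below touches on this.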

2. Important APIs

signal:

#include <signal.h>
void (*signal(int signo, void (*func)(int)))(int);
Returns: previous disposition of signal if OK, SIG_ERR on error

Below is the implementation of signal given in APUE:

#include "apue.h"

/* Reliable version of signal(), using POSIX sigaction(). */
Sigfunc * signal(int signo, Sigfunc *func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo == SIGALRM) {
#ifdef SA_INTERRUPT
        act.sa_flags |= SA_INTERRUPT;
#endif
    } else {
        act.sa_flags |= SA_RESTART;
    }
    if (sigaction(signo, &act, &oact) < 0)
        return(SIG_ERR);
    return(oact.sa_handler);
}

The prototype for the signal function states that the function requires two arguments and returns a pointer to a function that returns nothing (void).

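Syntactically, the signal prototype above returns a pointer to a function. The implementation confirms this: return(oact.sa_handler) returns the previous handler. (oact stands for the old, i.e. original, action.)
Instead of a function address, the func argument can be one of two macros, SIG_IGN or SIG_DFL, and the return value can be SIG_ERR. In C terms, these three are just macros; a typical implementation defines them as: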

#define SIG_ERR (void (*)())-1
#define SIG_DFL (void (*)())0
#define SIG_IGN (void (*)())1

When we send the SIGTERM signal, the process is terminated, since it doesn't catch the signal, and the default action for the signal is termination.

kill and raise

#include <signal.h>
int kill(pid_t pid, int signo);
int raise(int signo);

The only point that needs attention here is which processes kill can send a signal to:

pid > 0 The signal is sent to the process whose process ID is pid.

pid == 0 The signal is sent to all processes whose process group ID equals the process group ID of the sender and for which the sender has permission to send the signal.

Permission to send a signal follows one basic rule:

The real or effective user ID of the sender has to equal the real or effective user ID of the receiver (the superuser can send a signal to any process).

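If a process uses kill to send a signal to itself and that signal is not blocked, then the signal (or at least one other pending, unblocked signal) is delivered, and its handler runs, before kill returns.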

alarm and pause

#include <unistd.h>
unsigned int alarm(unsigned int seconds);
Returns: 0 or number of seconds until previously set alarm

If, when we call alarm, a previously registered alarm clock for the process has not yet expired, the number of seconds left for that alarm clock is returned as the value of this function. That previously registered alarm clock is replaced by the new value.

If a previously registered alarm clock for the process has not yet expired and if the seconds value is 0, the previous alarm clock is canceled. The number of seconds left for that previous alarm clock is still returned as the value of the function.

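Be careful when using alarm. If we intend to catch the signal that alarm generates, we must install the handler first and only then call alarm. Otherwise, depending on scheduling, the alarm may fire before the handler is in place. The default action for SIGALRM (terminating the process) would then apply, so the program would not work reliably.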

#include <unistd.h>
int pause(void);
Returns: −1 with errno set to EINTR

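pause is designed to be interrupted by a signal, so a return value of −1 is the normal, expected outcome, not an error to worry about. APUE gives an example that implements sleep with alarm and pause. However, a sleep that holds up under every circumstance is not that simple to design. To summarize, the following points must be considered: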

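Because alarm is used, we must consider whether it conflicts with an alarm the caller set outside the function, and how to avoid such a conflict.
Handling SIGALRM overwrites any SIGALRM handler the caller installed outside the function; this must be dealt with.
A context switch can still occur between alarm() and pause(). If the alarm fires in that gap, pause() may block forever.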

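alarm can also put a time limit on operations that may block, such as reading from a slow device, but this design also has several pitfalls. If the timeout is implemented with longjmp, its drawbacks must be taken into account: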

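Stack state is not fully preserved. Automatic and register variables that were modified after setjmp may have indeterminate values after longjmp (they may roll back to their earlier values).
When signal handlers are nested, a longjmp out of the SIGALRM handler cannot avoid aborting any other signal handler that the alarm interrupted.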

Signal Sets

#include <signal.h>
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signo);
int sigdelset(sigset_t *set, int signo);
All four return: 0 if OK, −1 on error
int sigismember(const sigset_t *set, int signo);
Returns: 1 if true, 0 if false, −1 on error

Sigprocmask

int sigprocmask(int how, const sigset_t *restrict set, sigset_t *restrict oset);
Returns: 0 if OK, −1 on error

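oset receives the old mask, set is the new value to apply, and how determines how set is applied:

SIG_BLOCK: the new mask is the union (bitwise OR) of the current mask and set, so every signal in set becomes blocked.
SIG_UNBLOCK: the new mask is the intersection (bitwise AND) of the current mask and the complement of set; put simply, every signal in set is unblocked.
SIG_SETMASK: the new mask is set itself.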

NOTE: If sigprocmask unblocks any pending signals, at least one of them is delivered (and its handler runs) before sigprocmask returns.

Sigpending Function

#include <signal.h>
int sigpending(sigset_t *set);
Returns: 0 if OK, −1 on error

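sigpending simply returns the set of signals that are currently pending. This is useful for checking which blocked signals have already been generated.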

Sigaction Function

#include <signal.h>
int sigaction(int signo, const struct sigaction *restrict act, struct sigaction *restrict oact);
Returns: 0 if OK, −1 on error

struct sigaction {
    void      (*sa_handler)(int);   /* addr of signal handler, */
                                    /* or SIG_IGN, or SIG_DFL */
    sigset_t  sa_mask;              /* additional signals to block */
    int       sa_flags;             /* signal options, Figure 10.16 */
    /* alternate handler */
    void      (*sa_sigaction)(int, siginfo_t *, void *);
};

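There are two handlers, sa_handler and sa_sigaction, and the latter takes more arguments. Which one is used depends on whether SA_SIGINFO is set in sa_flags.
As the comment says, sa_mask adds extra signals to block. The change is temporary: when the handler returns, the process's signal mask is restored to its previous value.
When a handler is invoked, the signal being delivered is automatically blocked for the duration (unless SA_NODEFER is set), so the handler is not re-entered by the same signal. If that signal is generated several more times while it is blocked, it is usually delivered only once; whether signals are queued depends on the operating system.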

Sigsetjmp and Siglongjmp

#include <setjmp.h>
int sigsetjmp(sigjmp_buf env, int savemask);
Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
void siglongjmp(sigjmp_buf env, int val);

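When longjmp is called from a handler, the handler never returns normally; control jumps straight back into main. The signal that was automatically blocked on entry to the handler may therefore remain blocked, and the resulting mask is unspecified. sigsetjmp and siglongjmp therefore add explicit saving and restoring of the process signal mask. If savemask is nonzero, sigsetjmp saves the current signal mask in env, and a later siglongjmp restores that saved mask.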

Sigsuspend

#include <signal.h>
int sigsuspend(const sigset_t *sigmask);
Returns: −1 with errno set to EINTR

We need a way to both restore the signal mask and put the process to sleep in a single atomic operation.

Consider the following code:

/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");
/* critical region of code */
/* restore signal mask, which unblocks SIGINT */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");
/* window is open */
pause(); /* wait for signal to occur */

Closing this window requires an atomic operation, which is why sigsuspend was introduced. Rewritten with sigsuspend, the code above becomes:

/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
        err_sys("SIG_BLOCK error");
/* critical region of code */
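/* atomically unblock SIGINT (install oldmask) and wait for a signal */
if (sigsuspend(&oldmask) != -1)
        err_sys("sigsuspend error");
/* sigsuspend put back newmask on return; now restore the original mask */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
        err_sys("SIG_SETMASK error");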

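This essentially just reorders the calls. However, if the signal you are waiting for never arrives, the program blocks indefinitely. Also note that sigsuspend's change to the process signal mask is temporary, just like the mask change sigaction makes while a handler runs. When sigsuspend returns, the mask is restored to the value it had before the call.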

Abort

#include <stdlib.h>
void abort(void);
This function never returns

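The man page describes this well; the following summary is drawn from it [1]:

abort() first unblocks SIGABRT and then raises that signal for the calling process.
If SIGABRT is ignored, or caught by a handler that returns, the process is still terminated: abort restores the default disposition and raises SIGABRT again. The only way out is for the handler not to return, for example by using the longjmp approach mentioned above.
If abort() terminates the process, all open streams are closed and flushed (this is what the cited man page says; glibc 2.27 and later no longer flush streams in abort).
abort() never returns.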

System

The delivery of this signal (SIGCHLD) in the parent should be blocked while the system function is executing. Indeed, this is what POSIX.1 specifies. Otherwise, when the child created by system terminates, it would fool the caller of system into thinking that one of its own children terminated.

As for why SIGINT and SIGQUIT are ignored while system is executing, the original text explains:

Since the command that is executed by system can be an interactive command (as is the ed program in this example) and since the caller of system gives up control while the program executes, waiting for it to finish, the caller of system should not be receiving these two terminal-generated signals.

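Because of job control, when a process calls system, the child it creates belongs to the same foreground process group as the calling process. Both can therefore receive signals generated from the terminal.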

#include <errno.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int system(const char *cmdstring) /* with appropriate signal handling */
{
    pid_t               pid;
    int                 status;
    struct sigaction    ignore, saveintr, savequit;
    sigset_t            chldmask, savemask;

    if (cmdstring == NULL)
        return(1);      /* always a command processor with UNIX */

    ignore.sa_handler = SIG_IGN;    /* ignore SIGINT and SIGQUIT */
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    if (sigaction(SIGINT, &ignore, &saveintr) < 0)
        return(-1);
    if (sigaction(SIGQUIT, &ignore, &savequit) < 0)
        return(-1);
    sigemptyset(&chldmask);         /* now block SIGCHLD */
    sigaddset(&chldmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chldmask, &savemask) < 0)
        return(-1);

    if ((pid = fork()) < 0) {
        status = -1;    /* probably out of processes */
    } else if (pid == 0) {          /* child */
        /* restore previous signal actions & reset signal mask */
        sigaction(SIGINT, &saveintr, NULL);
        sigaction(SIGQUIT, &savequit, NULL);
        sigprocmask(SIG_SETMASK, &savemask, NULL);

        execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
        _exit(127);     /* exec error */
    } else {                        /* parent */
        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR) {
                status = -1; /* error other than EINTR from waitpid() */
                break;
            }
    }

    /* restore previous signal actions & reset signal mask */
    if (sigaction(SIGINT, &saveintr, NULL) < 0)
        return(-1);
    if (sigaction(SIGQUIT, &savequit, NULL) < 0)
        return(-1);
    if (sigprocmask(SIG_SETMASK, &savemask, NULL) < 0)
        return(-1);

    return(status);
}

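The process that waitpid waits for is the child that execs /bin/sh. We can also conclude that waitpid does not depend on SIGCHLD being delivered: SIGCHLD is blocked here, and waitpid still returns. Pay particular attention to which attributes the process keeps across the execl call. The signal mask and ignored dispositions survive exec, which is why the child restores them before calling execl.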

Sleep

#include <unistd.h>
unsigned int sleep(unsigned int seconds);
Returns: 0 or number of unslept seconds

#include <time.h>
int clock_nanosleep(clockid_t clock_id, int flags,
const struct timespec *reqtp, struct timespec *remtp);
Returns: 0 if slept for requested time or error number on failure

#include <time.h>
int nanosleep(const struct timespec *reqtp, struct timespec *remtp);
Returns: 0 if slept for requested time or −1 on error

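If sleep is implemented with alarm, the considerations from the alarm section earlier in this article apply.
Linux implements sleep with nanosleep, which does not involve any signal, so there is no need to worry about interference with other functions that use signals.
Using absolute time (clock_nanosleep with the TIMER_ABSTIME flag) improves precision. This is a matter of scheduling. With relative sleeps, any delay between computing the wakeup time and actually calling sleep (for example, because the process was preempted) accumulates, so a real-time task may miss its timing requirements.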

Sigqueue

#include <signal.h>
int sigqueue(pid_t pid, int signo, const union sigval value);
Returns: 0 if OK, −1 on error

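In some situations the number of times a signal occurs cannot be ignored, and the signals need to be stored in a queue. Linux supports this feature as follows: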

supports sigqueue

queues signals even if the caller doesn’t use the SA_SIGINFO flag

3. Important Concepts for Program Design

3.1 fork

When a process calls fork, the child inherits the parent’s signal dispositions. Here, since the child starts off with a copy of the parent’s memory image, the address of a signal-catching function has meaning in the child.

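This means that any signal setup done before fork is inherited by the child. The inheritance follows directly from how fork works: the child starts with a copy of the parent's memory image.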

3.2 Interrupted System Calls

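In general, we don't want system calls to be interrupted by signals. However, some system calls can block the process indefinitely, and in those cases this feature is very useful.
The other side of the coin is that, once we design a program to interrupt certain system calls deliberately, we must also handle the error returns of interrupted calls. This clearly adds to the design work of writing an application.
Automatic restart was therefore introduced. But we do not always want an interrupted system call to be restarted, so further designs appeared: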

On FreeBSD 8.0, Linux 3.2.0, and Mac OS X 10.6.8, when signal handlers are installed with the signal function, interrupted system calls will be restarted.

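What I know so far:

signal enables automatic restart by default (on the systems listed above).
sigaction makes automatic restart optional through the SA_RESTART flag, which is off by default.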

3.3 Reentrant Functions

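A signal handler may be invoked while the process is executing some function, and usually this causes no problem. But if the handler then calls the very function that was interrupted, problems can arise. There are two ways to deal with this:

Make the function reentrant.
Block the relevant signals while the non-reentrant code runs.

If the signal handler changes a global value, it overwrites the value that was there when the handler was entered. One remedy is to save such variables on entry to the handler and restore them before it returns: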

Therefore, as a general rule, when calling the functions listed in Figure 10.4 from a signal handler, we should save and restore errno.

3.4 SIGCLD Semantics

First, two definitions: zombie process and SIGCHLD.

In UNIX System terminology, a process that has terminated, but whose parent has not yet waited for it, is called a zombie.

SIGCHLD : Whenever a process terminates or stops, the SIGCHLD signal is sent to the parent. By default, this signal is ignored, so the parent must catch this signal if it wants to be notified whenever a child’s status changes. The normal action in the signal-catching function is to call one of the wait functions to fetch the child’s process ID and termination status.

3.5 How Signals Are Processed

The key passages from APUE:

First, a signal is generated for a process (or sent to a process) when the event that causes the signal occurs.

When the signal is generated, the kernel usually sets a flag of some form in the process table.

We say that a signal is delivered to a process when the action for a signal is taken. During the time between the generation of a signal and its delivery, the signal is said to be pending

A process has the option of blocking the delivery of a signal. If a signal that is blocked is generated for a process, and if the action for that signal is either the default action or to catch the signal, then the signal remains pending for the process until the process either (a) unblocks the signal or (b) changes the action to ignore the signal.

The system determines what to do with a blocked signal when the signal is delivered, not when it’s generated. This allows the process to change the action for the signal before it’s delivered.

Each process has a signal mask that defines the set of signals currently blocked from delivery to that process.

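When an event occurs, a signal is generated, and the kernel sets a signal flag for the process. "Delivery" describes the moment the action for a signal is taken. Between generation and delivery, the signal is said to be pending. Blocking a signal means holding it back before its action is taken. Whatever the disposition, the system clears the pending flag at the moment the action is taken.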

4. Experiment Code

Experiment 1

Task: What is the relationship between the wait family of functions and SIGCHLD?

Experiment 2

Task: The process creates a file and writes the integer 0 to the file. The process then calls fork, and the parent and child alternate incrementing the counter in the file. Each time the counter is incremented, print which process (parent or child) is doing the increment.

Experiment 3

Task: Write a program that calls fwrite with a large buffer (about one gigabyte). Before calling fwrite, call alarm to schedule a signal in 1 second. In your signal handler, print that the signal was caught and return. Does the call to fwrite complete? What's happening?

Experiment 4

Task: Modify Figure 3.5 as follows: (a) change BUFFSIZE to 100; (b) catch the SIGXFSZ signal using the signal_intr function, printing a message when it's caught, and returning from the signal handler; and (c) print the return value from write if the requested number of bytes wasn't written. Modify the soft RLIMIT_FSIZE resource limit (Section 7.11) to 1,024 bytes and run your new program, copying a file that is larger than 1,024 bytes. (Try to set the soft resource limit from your shell. If you can't do this from your shell, call setrlimit directly from the program.)

References

[1] abort(3) man page: https://linux.die.net/man/3/abort

秒客网