Context:
I have a Linux[1] system that manages a series of third-party daemons. Interaction with them is limited to shell[2] init scripts, i.e. only {start|restart|stop|status} are available.
Problem:
A new process can be assigned the PID of a previously running process, and the status of a daemon is checked only by testing whether a process with its recorded PID is running.
Example:
Process A runs with PID 123 and subsequently dies; process B starts with PID 123, and the status command responds with a false (erroneous) "OK". In other words, we only check that a process with the recorded PID exists to validate that the daemon is running; we assume that if a process with this PID exists, it is the process in question.
Proposed solutions:
- Interrogate the process, using the PID, to ensure the command/daemon running as that PID is as expected. The problem with this solution is that both the command and the PID need to match; multiple pieces of information thus need to be maintained and kept in sync, which adds complexity to error/edge conditions.
- Correlate the creation time of the PID file with the start time of the process. If the process started within a certain delta of the PID file creation time, we can be fairly certain that the command/daemon running is as expected.
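For illustration, the second proposal could be sketched roughly as follows. This is a minimal sketch, not a standard tool: the function name `pidfile_matches_process` and the 5-second default tolerance are hypothetical placeholders (the default is not a recommendation), it assumes GNU `stat` and a procps `ps` that supports `etimes`, and it uses the PID file's modification time because a creation time is not generally available. Both timestamps are wall-clock values, so a clock adjustment between writing the PID file and checking it could skew the comparison.

```bash
# Hypothetical helper: succeeds if the process named in a PID file started
# within max_delta seconds of the PID file's last modification time.
pidfile_matches_process() {
    local pidfile="${1:?Must specify PID file}"
    local max_delta="${2:-5}"   # assumed tolerance in seconds, tune as needed
    local pid elapsed now proc_start file_mtime delta

    pid=$(cat "$pidfile" 2>/dev/null) || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric content
    esac

    # Seconds elapsed since the process started; ps fails if the PID is gone.
    elapsed=$(ps -p "$pid" -o etimes= 2>/dev/null) || return 1
    now=$(date +%s)
    proc_start=$(( now - elapsed ))

    # Modification time of the PID file, in seconds since the epoch.
    file_mtime=$(stat -c %Y "$pidfile") || return 1

    delta=$(( file_mtime - proc_start ))
    if [ "$delta" -lt 0 ]; then
        delta=$(( -delta ))
    fi
    [ "$delta" -le "$max_delta" ]
}
```

A status action could then call something like `pidfile_matches_process /var/run/mydaemon.pid 5` (the path is only an example) and report "OK" only when it succeeds.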
Is there a standard way to verify the authenticity of a process/PID file, beyond the presence of a process running with that PID? I.e. I (as the system) want to know whether you (the process) are running and whether you are who I think you are (A and not B).
Assuming we have elected to implement the second solution proposed above, what delta (tolerance) between the PID file creation time and the process start time is reasonable? Here, reasonable means an acceptable compromise between type 1 and type 2 errors.
[1] CentOS/RHEL [2] Bash
2 answers
#1
5
The content of the file /proc/{PID}/cmdline is the command line used to start the process. Is that what you need?
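As a rough sketch of how that file could be used in a status check: the arguments in /proc/{PID}/cmdline are separated by NUL bytes, so the first NUL-terminated field is the program name as it was invoked. The helper name `pid_runs_command` is hypothetical, and the comparison assumes you know exactly how the daemon's init script invokes it (for example a bare name versus a full path).

```bash
# Hypothetical helper: succeeds if the PID exists and the first argument of
# its command line equals the expected command.
pid_runs_command() {
    local pid="${1:?Must specify PID}"
    local expected="${2:?Must specify expected command}"
    local actual

    [ -r "/proc/$pid/cmdline" ] || return 1
    # Arguments are NUL-separated; turn NULs into newlines and keep the first.
    actual=$(tr '\0' '\n' < "/proc/$pid/cmdline" 2>/dev/null | head -n 1)
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, `pid_runs_command "$(cat /var/run/mydaemon.pid)" /usr/sbin/mydaemon` (both paths are placeholders) would fail if the PID had been recycled by an unrelated program.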
#2
0
My solution was to capture the command (via /proc/PID/cmdline) along with the relative start time. Using the absolute start time (via ps -p PID -o lstart=) might appear to work, but you'll get confusing results if your system clock changes (e.g. from an NTP update, or Daylight Savings).
Here's my implementation:
# Prints enough detail to confirm a PID still refers to the same process.
# In other words, even if a PID is recycled by another run of the same command,
# the output of this function should still be different. This is not guaranteed
# across reboots.
proc_detail() {
  local pid=${1:?Must specify PID}
  # The process's command line, if it's running. This ensures a nonexistent
  # PID will never have the same output as a running process, and helps
  # debugging.
  cat "/proc/$pid/cmdline" 2> /dev/null && echo
  # This is the number of seconds after boot that the process started.
  # https://unix.stackexchange.com/a/274722/19157
  # In theory this could collide if the same process were restarted in the same
  # second and assigned the same PID, but PIDs are assigned in order so this
  # seems acceptably unlikely for now.
  echo "$(($(cut -d. -f1 < /proc/uptime) - \
    $(ps -p "$pid" -o etimes= 2> /dev/null || echo "0")))"
}
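One caveat with subtracting two values that are each truncated to whole seconds is that the result might differ by one between two calls for the same process. A possible alternative, shown here only as a sketch, is to read the start time the kernel records in /proc/PID/stat (field 22, in clock ticks since boot), which does not change for the life of the process. The function name `proc_starttime` is hypothetical.

```bash
# Hypothetical helper: prints the process start time in clock ticks since
# boot, as recorded by the kernel in field 22 of /proc/PID/stat.
proc_starttime() {
    local pid="${1:?Must specify PID}"
    local stat

    stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so drop everything up to the last ") " before splitting.
    stat=${stat##*) }
    # The remaining fields start at field 3, so field 22 is now the 20th.
    set -- $stat
    echo "${20}"
}
```

Calling `proc_starttime "$pid"` twice for the same live process prints the same number, and it fails for a PID that no longer exists.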
I also decided to store this output in /dev/shm so that it's cleared automatically for me on shutdown. There are other viable options (such as a @reboot cronjob), but for my use case writing to a tmpfs was easy and clean.
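The store-and-compare step might look something like the sketch below. The names `save_identity`, `check_identity`, and `STATE_DIR` are hypothetical; both functions read the identifying text from standard input, so they can be fed by `proc_detail` or any similar command. NUL bytes (as found in /proc/PID/cmdline) are converted to spaces on both sides so that the saved and current text compare consistently.

```bash
# Assumed location for saved state; /dev/shm is a tmpfs on typical Linux
# systems, so its contents do not survive a reboot.
STATE_DIR=${STATE_DIR:-/dev/shm}

# Save the identifying text arriving on stdin under the given service name.
save_identity() {
    local name="${1:?Must specify a service name}"
    tr '\0' ' ' > "$STATE_DIR/$name.identity"
}

# Succeed only if the text on stdin matches what was saved for the name.
check_identity() {
    local name="${1:?Must specify a service name}"
    local saved current

    saved=$(cat "$STATE_DIR/$name.identity" 2>/dev/null) || return 1
    current=$(tr '\0' ' ')
    [ -n "$current" ] && [ "$saved" = "$current" ]
}
```

A start action could run `proc_detail "$pid" | save_identity mydaemon`, and the status action `proc_detail "$pid" | check_identity mydaemon`, where `mydaemon` is a placeholder name.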