How do I make sure that a running process is the process I expect it to be?

Context:

I have a Linux[1] system that manages a series of third-party daemons. My only way of interacting with them is through shell[2] init scripts, i.e. only {start|restart|stop|status} are available.

Problem:

A new process can be assigned the PID of a previously running process, yet the status of a daemon is checked only by looking for a running process with its recorded PID.

Example:

Process A runs with PID 123 and subsequently dies; process B then starts and is assigned PID 123, and the status command responds with a false "OK". In other words, we only check that a process with the recorded PID exists, and assume that if one does, it is the process in question.
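
For illustration, here is a minimal Bash sketch of the PID-only check described above. The function name `naive_status` and its PID-file argument are hypothetical; the exit codes follow the LSB convention for `status` (0 running, 1 dead but PID file exists, 3 not running). `kill -0` only tests whether some process with that PID exists and can be signalled, so it has no way of telling process A from process B.

```bash
# Hypothetical PID-only status check, as described in the question.
# Exit codes follow the LSB status convention:
# 0 = running, 1 = dead but PID file exists, 3 = not running.
naive_status() {
  local pidfile=${1:?Must specify PID file}
  local pid
  # A missing or empty PID file means there is nothing to check.
  pid=$(cat "$pidfile" 2> /dev/null) || return 3
  [[ -n $pid ]] || return 3
  # kill -0 sends no signal: it only checks that SOME process with this PID
  # exists and that we are allowed to signal it. It cannot tell A from B.
  if kill -0 "$pid" 2> /dev/null; then
    echo "OK"
    return 0
  fi
  return 1
}
```

If process A has died and an unrelated process B now holds PID 123, `kill -0 123` still succeeds and this prints "OK".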

Proposed solutions:

Interrogate the process via its PID to confirm that the command/daemon running under that PID is the expected one. The drawback is that both the command and the PID must match, so several pieces of information have to be maintained and kept in sync, which adds complexity to error and edge-case handling.
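
A rough sketch of this first proposal, assuming the daemon's binary path is known in advance (the function name and arguments are hypothetical). It compares the target of `/proc/PID/exe` with the expected path. For an interpreted daemon that link points at the interpreter, so you would have to compare `/proc/PID/cmdline` instead.

```bash
# Hypothetical check for the first proposal: succeeds only if PID $1 exists
# and its executable is exactly the path $2.
pid_runs_exe() {
  local pid=${1:?Must specify PID}
  local expected=${2:?Must specify executable path}
  local actual
  # /proc/PID/exe is a symlink to the binary the process is running. Reading
  # it for another user's process usually requires root. If the binary was
  # replaced after the process started, the target ends in " (deleted)".
  actual=$(readlink "/proc/$pid/exe" 2> /dev/null) || return 1
  [[ $actual == "$expected" ]]
}
```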

Correlate the creation time of the PID file with the start time of the process: if the process started within a certain delta of the PID file's creation time, we can be fairly confident that the expected command/daemon is the one running.
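
A sketch of this second proposal, under two stated assumptions: the PID file's modification time stands in for its creation time (birth time is often not available through `stat(1)` on older systems, and mtime changes if the daemon rewrites the file), and the tolerance is passed in explicitly, since choosing it is exactly the open question below. The process start time is derived from field 22 of `/proc/PID/stat` (clock ticks after boot) plus the `btime` line of `/proc/stat` (boot time in seconds since the epoch). The function name is hypothetical.

```bash
# Hypothetical check for the second proposal: succeeds if PID $1 started
# within $3 seconds of the last modification of PID file $2.
started_near_pidfile() {
  local pid=${1:?Must specify PID} pidfile=${2:?Must specify PID file}
  local max_delta=${3:?Must specify tolerance in seconds}
  local proc_stat btime ticks start_epoch file_epoch delta
  local -a fields
  proc_stat=$(cat "/proc/$pid/stat" 2> /dev/null) || return 1
  # Skip the parenthesised comm field (it may contain spaces); fields[19] is
  # then field 22, the start time in clock ticks after boot.
  read -r -a fields <<< "${proc_stat##*) }"
  ticks=${fields[19]}
  # The "btime" line of /proc/stat is the boot time in seconds since the epoch.
  btime=$(awk '$1 == "btime" { print $2 }' /proc/stat)
  start_epoch=$(( btime + ticks / $(getconf CLK_TCK) ))
  # mtime (%Y) stands in for the creation time of the PID file.
  file_epoch=$(stat -c %Y "$pidfile" 2> /dev/null) || return 1
  delta=$(( start_epoch - file_epoch ))
  if (( delta < 0 )); then delta=$(( -delta )); fi
  (( delta <= max_delta ))
}
```

Because this converts to wall-clock seconds, it inherits the clock-change caveat raised in answer #2.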

Is there a standard way to verify that a process/PID file is authentic, beyond the presence of a process running with that PID? I.e. I (the system) want to know whether you (the process) are running, and whether you are who I think you are (A and not B).

Assuming we choose to implement the second solution above, what confidence interval/delta between the PID file creation time and the process start time is reasonable? Here, reasonable means an acceptable compromise between type I and type II errors.

[1] CentOS/RHEL [2] Bash

2 solutions

#1

The content of the file:

/proc/{PID}/cmdline

is the command line used to start the process. Is that what you need?
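
Note that the arguments in `/proc/PID/cmdline` are separated (and usually terminated) by NUL bytes, so it is easier to print or compare in a shell after translating them. A small sketch; the helper name is made up:

```bash
# Hypothetical helper: prints the arguments of PID $1, one per line.
# /proc/PID/cmdline separates (and usually terminates) arguments with NUL
# bytes; it is empty for kernel threads and zombie processes.
print_cmdline() {
  local pid=${1:?Must specify PID}
  # stderr is redirected before the input redirection, so a missing PID
  # fails quietly with a non-zero status.
  tr '\0' '\n' 2> /dev/null < "/proc/$pid/cmdline"
}
```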

#2

My solution was to capture the command (via /proc/PID/cmdline) along with the start time relative to boot. Using the absolute start time (via ps -p PID -o lstart=) might appear to work, but you'll get confusing results if the system clock is adjusted (e.g. by an NTP update) or the local time shifts (e.g. for daylight saving time).

Here's my implementation:

# Prints enough detail to confirm a PID still refers to the same process.
# In other words, even if the PID is reused by a new instance of the same
# program, the output of this function should differ. This is not guaranteed
# across reboots.
proc_detail() {
  local pid=${1:?Must specify PID}
  # the process's command line, if it's running;
  # ensures a non-existent PID will never have the same output as a running
  # process, and helps debugging
  cat "/proc/$pid/cmdline" 2> /dev/null && echo
  # this is the number of seconds after boot that the process started
  # https://unix.stackexchange.com/a/274722/19157
  # in theory this could collide if the same process were restarted in the same
  # second and assigned the same PID, but PIDs are assigned in order so this
  # seems acceptably unlikely for now.
  echo "$(($(cut -d. -f1 < /proc/uptime) - \
           $(ps -p "$pid" -o etimes= 2> /dev/null || echo "0")))"
}
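
One caveat with the function above: /proc/uptime and `ps -o etimes=` are read at slightly different moments and both end up as whole seconds, so the computed start time can occasionally differ by one between calls. If that matters, a possible alternative (a sketch, not the author's code; the helper name is hypothetical) is to read the start time directly from field 22 of /proc/PID/stat, which the kernel records in clock ticks after boot and which stays fixed for the life of the process.

```bash
# Hypothetical helper: prints the start time of PID $1 in clock ticks after
# boot (field 22 of /proc/PID/stat). The value is fixed when the process is
# created, so it does not depend on the wall clock or on when this runs.
# Divide by $(getconf CLK_TCK) to get seconds.
proc_start_ticks() {
  local pid=${1:?Must specify PID}
  local proc_stat
  local -a fields
  proc_stat=$(cat "/proc/$pid/stat" 2> /dev/null) || return 1
  # Field 2 (comm) is in parentheses and may contain spaces or ")", so split
  # only the text after the last ") ". fields[0] is then field 3 (state),
  # which makes field 22 fields[19].
  read -r -a fields <<< "${proc_stat##*) }"
  echo "${fields[19]}"
}
```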

I also decided to store this output in /dev/shm so that it's cleared automatically on shutdown. There are other viable options (such as an @reboot cron job), but for my use case writing to a tmpfs was easy and clean.
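
A sketch of how the record/verify steps could look with /dev/shm, assuming a hypothetical directory `/dev/shm/proc-fingerprints` and made-up function names. To stay self-contained it builds its own fingerprint from the command line plus field 22 of /proc/PID/stat rather than calling `proc_detail`. An init script would call `record_fingerprint` after `start` and `verify_fingerprint` in `status`.

```bash
# Assumed location: /dev/shm is a tmpfs, so these files vanish on reboot,
# which is exactly when start times after boot stop being comparable.
FINGERPRINT_DIR=${FINGERPRINT_DIR:-/dev/shm/proc-fingerprints}

# Prints the command line (NULs shown as spaces) and the start time in clock
# ticks after boot (field 22 of /proc/PID/stat) of PID $1.
fingerprint() {
  local pid=${1:?Must specify PID}
  local proc_stat
  local -a fields
  proc_stat=$(cat "/proc/$pid/stat" 2> /dev/null) || return 1
  read -r -a fields <<< "${proc_stat##*) }"   # fields[19] is field 22
  tr '\0' ' ' 2> /dev/null < "/proc/$pid/cmdline"
  printf '\n%s\n' "${fields[19]}"
}

# record_fingerprint NAME PID: call once the daemon has started.
record_fingerprint() {
  local name=${1:?Must specify name} pid=${2:?Must specify PID}
  mkdir -p "$FINGERPRINT_DIR" || return 1
  fingerprint "$pid" > "$FINGERPRINT_DIR/$name"
}

# verify_fingerprint NAME PID: succeeds only if PID is still the recorded process.
verify_fingerprint() {
  local name=${1:?Must specify name} pid=${2:?Must specify PID}
  local saved=$FINGERPRINT_DIR/$name current
  [[ -f $saved ]] || return 1
  current=$(fingerprint "$pid") || return 1
  [[ $current == "$(< "$saved")" ]]
}
```

A second instance of the same program gets a different start time, so `verify_fingerprint` rejects it even though its command line is identical.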
