How do I run long-term (infinite) Python processes?

Time: 2020-12-05 20:55:24

I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework on Python 2.7. However, I have run into some issues with processes that need to run constantly, updating information and so on.

I have written a script I call "daemonManager.py" that can start and stop all or individual Python update loops (should I call them daemons?). It does that by forking, then loading the module for the specific functions it should run, and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:

  • Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm covering every exception I can think of and logging them. Also, I don't think these quitting processes have anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong, of course. Is it normal for Python processes to just die after they've run for days/weeks? How should I tackle this problem? Should I write another daemon that periodically checks whether the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.

  • How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking whether the PID file is there to determine whether or not the process is running. But if the process just dies from some unexpected cause, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check whether a process is running at the PID in the file (see the sketch after this list), but what if another process has since started and been assigned the PID of the dead process? My daemon would think that the process is running fine even though it's long dead. Again, I'm at a loss as to how to deal with this.

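For reference, here is roughly the check I have in mind -- just a sketch (on Unix, os.kill with signal 0 tests whether a PID exists without sending a real signal, though as noted it can't rule out PID reuse):

import errno
import os

def pid_running(pidfile):
    # Best-effort liveness test for the PID recorded in pidfile.
    # Caveat: this cannot detect PID reuse -- an unrelated process
    # may have been handed the dead daemon's PID by the kernel.
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, ValueError):
        return False  # no PID file, or garbage contents
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except OSError as e:
        return e.errno != errno.ESRCH  # ESRCH means "no such process"
    return True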

I will accept any useful answer on how best to run infinite Python processes, ideally one that also sheds some light on the problems above.


I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2

3 Answers

#1


24  

I'll open by stating that this is one way to manage a long-running process (LRP) -- not the de facto approach by any stretch.

In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.

My favorite solution is http://supervisord.org/

Using a system like supervisord, you basically write a conventional Python script that performs a task while stuck in an "infinite" loop.

#!/usr/bin/python

import sys
import time

def main_loop():
    while 1:
        # do your stuff...
        time.sleep(0.1)

if __name__ == '__main__':
    try:
        main_loop()
    except KeyboardInterrupt:
        print >> sys.stderr, '\nExiting by user request.\n'
        sys.exit(0)

Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to deploy to production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).

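For illustration, a minimal program section might look like this (the program name and paths are hypothetical; autorestart is the option that addresses processes dying unexpectedly):

[program:updater]
; names and paths below are placeholders -- adjust to your layout
command=/usr/bin/python /path/to/updater.py
autostart=true
autorestart=true
stdout_logfile=/var/log/updater.stdout.log
stderr_logfile=/var/log/updater.stderr.log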

Supervisor has a bunch of configuration options, so I won't enumerate them, but I will say that it specifically solves the problems you describe:

  • Backgrounding/Daemonizing
  • PID tracking (can be configured to restart a process should it terminate unexpectedly)
  • Log normally in your script (use a stream handler if you're using the logging module, rather than printing) and let supervisor redirect the output to a file for you (a minimal sketch follows this list).
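
As a sketch of that last point (handler choice and format are illustrative; supervisor captures whatever the process writes to stdout/stderr):

import logging
import sys

# Send log records to stderr; under supervisor this ends up in the
# file configured as stderr_logfile.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s: %(message)s',
)

logging.getLogger('updater').info('starting main loop')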

#2


2  

I assume you are running Unix/Linux, but you don't really say. I have no direct advice on your issue, so I don't expect this to be the "right" answer to this question. But there is something to explore here.

First, if your daemons are crashing, you should fix that. Only programs with bugs should crash. Perhaps you should launch them under a debugger and see what happens when they crash (if that's possible). Do you have any trace logging in these processes? If not, add it. That might help diagnose your crashes.

Second, are your daemons providing services (opening pipes and waiting for requests) or are they performing periodic cleanup? If they are periodic cleanup processes, you should use cron to launch them periodically rather than having them run in an infinite loop. Cron jobs should be preferred over daemon processes. Similarly, if they are services that open ports and service requests, have you considered making them work with inetd? Again, a single daemon (inetd) should be preferred to a bunch of daemon processes.

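For example, a crontab entry along these lines (schedule and paths are hypothetical) replaces a hand-rolled infinite loop entirely:

# Run the cleanup task every five minutes; cron launches a fresh
# process each time, so a crash loses one run, not a whole daemon.
*/5 * * * * /usr/bin/python /path/to/cleanup_task.py >> /var/log/cleanup.log 2>&1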

Third, saving a PID in a file is not very effective, as you've discovered. Perhaps a shared IPC mechanism, like a semaphore, would work better. I don't have any details here, though.

Fourth, sometimes I need stuff to run in the context of the website. I use a cron job that calls wget with a maintenance URL. You set a special cookie and include the cookie info on the wget command line. If the special cookie doesn't exist, return 403 rather than performing the maintenance process. The other benefit here is that database logins and other environmental concerns are avoided, since the same code that serves normal web pages serves the maintenance process.

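A rough sketch of that pattern in Django (the view, cookie name, secret, and helper are all hypothetical):

# views.py -- hypothetical maintenance endpoint
from django.http import HttpResponse, HttpResponseForbidden

MAINT_COOKIE = 'maint_token'   # illustrative cookie name
MAINT_SECRET = 'change-me'     # illustrative shared secret

def maintenance(request):
    if request.COOKIES.get(MAINT_COOKIE) != MAINT_SECRET:
        return HttpResponseForbidden()  # 403 for everyone else
    do_periodic_work()                  # hypothetical helper doing the cleanup
    return HttpResponse('done')

The matching cron job would then be something like wget -q -O /dev/null --header='Cookie: maint_token=change-me' http://localhost/maintenance/ on whatever schedule you need.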

Hope that gives you ideas. I think avoiding daemons if you can is the best place to start. If you can run your Python within mod_wsgi, that saves you from having to support multiple "environments". Debugging a process that fails after running for days at a time is just brutal.

#3


2  

You should consider Python processes as able to run "forever" assuming you don't have any memory leaks in your program, the Python interpreter, or any of the Python libraries / modules that you are using. (Even in the face of memory leaks, you might be able to run forever if you have sufficient swap space on a 64-bit machine. Decades, if not centuries, should be doable. I've had Python processes survive just fine for nearly two years on limited hardware -- before the hardware needed to be moved.)

Ensuring programs restart when they die used to be very simple back when Linux distributions used SysV-style init -- you just added a new line to /etc/inittab, and init(8) would spawn your program at boot and re-spawn it if it died. (I know of no mechanism to replicate this functionality with the new upstart init replacement that many distributions are using these days. I'm not saying it is impossible, I just don't know how to do it.)

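An inittab entry had the form id:runlevels:action:process, so a respawning daemon looked something like this (the two-character id and the path are arbitrary):

# /etc/inittab -- "respawn" restarts the process whenever it exits
pd:2345:respawn:/usr/bin/python /path/to/updater.py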

But even the init(8) mechanism of years gone by wasn't as flexible as some would have liked. DJB's daemontools package is one example of a process control-and-monitoring tool intended to keep daemons living forever. The Linux-HA suite provides another similar tool, though it might provide too much "extra" functionality to be justified for this task. monit is another option.

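With daemontools, for instance, each service is a directory containing an executable run script that supervise keeps alive; a minimal one (path hypothetical) is just:

#!/bin/sh
# daemontools' supervise runs this and restarts it whenever it exits
exec /usr/bin/python /path/to/updater.py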
