I run into a lot of weird behavior when using Celery. For example, I update tasks.py and run supervisorctl reload (restart), but the tasks are still wrong: some tasks seem to disappear, and so on.
Today I found out why: supervisorctl stop all
does not actually stop all the Celery workers, and only kill -9 `pgrep python` kills them all.
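By the way, kill -9 `pgrep python` also matches supervisord itself (it is a Python process too, see the ps output below), so a safer sketch is to target only the worker command line, something like:

# kill only the Celery worker processes (pattern assumed from the worker command lines shown below)
pkill -9 -f 'celery worker'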
situation:
root@ubuntu12:/data/www/article_fetcher# supervisorctl
celery_beat RUNNING pid 29597, uptime 0:52:18
celery_worker1 RUNNING pid 29556, uptime 0:52:20
celery_worker2 RUNNING pid 29570, uptime 0:52:19
celery_worker3 RUNNING pid 29557, uptime 0:52:20
celery_worker4 RUNNING pid 29586, uptime 0:52:18
uwsgi RUNNING pid 29604, uptime 0:52:18
supervisor> stop all
celery_beat: stopped
celery_worker2: stopped
celery_worker4: stopped
celery_worker3: stopped
uwsgi: stopped
celery_worker1: stopped
supervisor> status
celery_beat STOPPED Aug 04 11:05 AM
celery_worker1 STOPPED Aug 04 11:05 AM
celery_worker2 STOPPED Aug 04 11:05 AM
celery_worker3 STOPPED Aug 04 11:05 AM
celery_worker4 STOPPED Aug 04 11:05 AM
uwsgi STOPPED Aug 04 11:05 AM
processes:
root@ubuntu12:~# ps -aux|grep 'python'
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
root 8683 0.0 0.1 61420 11768 ? Ss Aug03 0:27 /usr/bin/python /usr/bin/supervisord
root 29310 0.1 0.1 57120 11344 pts/2 S+ 11:05 0:00 /usr/bin/python /usr/bin/supervisorctl
nobody 29556 2.2 0.5 132484 45988 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
nobody 29557 2.2 0.5 132480 45996 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29570 2.4 0.5 132740 45996 ? S 11:06 0:00 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
nobody 29571 26.9 1.4 217688 115804 ? R 11:06 0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29572 33.7 0.7 158396 59808 ? R 11:06 0:12 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
nobody 29573 29.6 1.4 215176 115928 ? R 11:06 0:10 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W1 -Ofair --app=celery_worker:app
nobody 29574 27.2 1.4 218244 118180 ? R 11:06 0:09 /data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W3 -Ofair --app=celery_worker:app
......
......
......
I found this question: Stopping Supervisor doesn't stop Celery workers, but it is asking a different thing, and its accepted answer (supervisorctl stop all)
does not actually work. So I decided to find the right way.
1 Answer
I looked into the Supervisor docs and found this:
killasgroup
If true, when resorting to send SIGKILL to the program to terminate it send it to its whole process group instead, taking care of its children as well, useful e.g with Python programs using multiprocessing.
Default: false
Required: No.
Introduced: 3.0a11
Then I realized that each worker creates 4 child processes (one per CPU core), and together they form a process group; that's why supervisorctl stop all
does not work: Supervisor's stop signal only goes to the parent process, so the pool children keep running.
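You can verify the grouping by printing the process-group IDs; something like this (just a sketch, the bracket trick only keeps grep from matching itself) should show each pool child with the same pgid as its worker:

ps -eo pid,ppid,pgid,cmd | grep '[c]elery worker'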
So I added killasgroup
to supervisord.conf:
[program:celery_worker1]
; Set full path to celery program if using virtualenv
directory=/data/www/article_fetcher
command=/data/www/article_fetcher/venv/bin/python /data/www/article_fetcher/manage.py celery worker -n W2 -Ofair --app=celery_worker:app
user=nobody
numprocs=1
stdout_logfile=/data/www/article_fetcher/logs/celery.log
stderr_logfile=/data/www/article_fetcher/logs/celery.log
autostart=true
autorestart=true
startsecs=5
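; send SIGKILL to the whole process group so the worker's pool children die too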
killasgroup=true
.....
.....
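To make Supervisor pick up the changed program sections, rereading and updating the config should do it (a restart of supervisord also works):

supervisorctl reread
supervisorctl update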
After that, supervisorctl stop all
really stops the Celery workers! very well~
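To double-check, after stop all a pgrep against the worker command line (same assumed pattern as above) should print nothing:

pgrep -f 'celery worker'   # no output means the workers and their pool children are gone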