How to run a long-running job in the background in Python

Time: 2021-10-02 07:12:26

I have a web service that runs long-running jobs (on the order of several hours). I am developing this using Flask, Gunicorn, and nginx.

What I am thinking of doing is to have the route that takes a long time to complete call a function that creates a thread. The function will then return a GUID back to the route, and the route will return a URL (containing the GUID) that the user can use to check progress. I am making the thread a daemon (thread.daemon = True) so that the thread exits if my calling code exits (unexpectedly).

Is this the correct approach to use? It works, but that doesn't mean that it is correct.

import threading

my_thread = threading.Thread(target=self._run_audit, args=())
my_thread.daemon = True  # exit together with the main process
my_thread.start()
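
For reference, here is a minimal sketch of the full pattern described above. The jobs dict, the route names, and the _run_audit body are illustrative assumptions, not code from the question:

import threading
import uuid

from flask import Flask, jsonify, url_for

app = Flask(__name__)
jobs = {}  # hypothetical in-memory store: guid -> status

def _run_audit(guid):
    # ... hours of work would happen here ...
    jobs[guid] = 'done'

@app.route('/audit', methods=['POST'])
def start_audit():
    guid = str(uuid.uuid4())
    jobs[guid] = 'running'
    my_thread = threading.Thread(target=_run_audit, args=(guid,))
    my_thread.daemon = True  # thread dies if the server process exits
    my_thread.start()
    # hand back a URL the user can poll for progress
    return jsonify({'status_url': url_for('job_status', guid=guid)}), 202

@app.route('/status/<guid>')
def job_status(guid):
    return jsonify({'status': jobs.get(guid, 'unknown')})

Note that an in-memory dict like this is not shared across Gunicorn worker processes, which is one reason the answers below reach for an external queue.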

4 Answers

#1 (score: 6)

The more common approach to handling this kind of issue is to extract the action from the base application and run it outside, using a task queue system like Celery.

Using this tutorial you can create your task and trigger it from your web application.

from flask import Flask

app = Flask(__name__)
app.config.update(
    CELERY_BROKER_URL='redis://localhost:6379',
    CELERY_RESULT_BACKEND='redis://localhost:6379'
)
celery = make_celery(app)


@celery.task()
def add_together(a, b):
    return a + b
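
Note that make_celery is defined in the linked tutorial rather than in the snippet above. A sketch of that helper, along the lines of the Flask/Celery application factory pattern (treat the details as an assumption):

from celery import Celery

def make_celery(app):
    celery = Celery(app.import_name,
                    broker=app.config['CELERY_BROKER_URL'],
                    backend=app.config['CELERY_RESULT_BACKEND'])
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        # run every task inside the Flask application context
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery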

Then you can run:

>>> result = add_together.delay(23, 42)
>>> result.wait()
65

Just remember that you need to run the worker separately:

celery -A your_application worker

#2 (score: 10)

Celery and RQ are overengineering for a simple task. Take a look at these docs: https://docs.python.org/3/library/concurrent.futures.html

Also check this example of how to run long-running jobs in the background of a Flask app: https://*.com/a/39008301/5569578
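
As a rough sketch of the concurrent.futures approach for this use case (the executor, the futures dict, and the route names are assumptions for illustration):

import uuid
from concurrent.futures import ThreadPoolExecutor

from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=2)
futures = {}  # guid -> Future

def long_job():
    # hours of work would happen here
    return 'done'

@app.route('/start', methods=['POST'])
def start():
    guid = str(uuid.uuid4())
    futures[guid] = executor.submit(long_job)
    return jsonify({'id': guid}), 202

@app.route('/status/<guid>')
def status(guid):
    future = futures.get(guid)
    if future is None:
        return jsonify({'status': 'unknown'}), 404
    if future.done():
        return jsonify({'status': 'finished', 'result': future.result()})
    return jsonify({'status': 'running'})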

#3 (score: 4)

Well, although your approach is not incorrect, it may lead your program to run out of available threads. As Ali mentioned, a general approach is to use a job queue like RQ or Celery. However, you don't need to extract functions to use those libraries. For Flask, I recommend Flask-RQ. It's simple to get started:

RQ

pip install flask-rq

Just remember to install Redis before using it in your Flask app.

And simply use the @job decorator on your Flask functions:

from flask_rq import job  # flask.ext.* imports are deprecated


@job
def process(i):
    # long-running work goes here
    pass


process.delay(3)
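
The question also asks about checking progress. One way (a sketch, assuming a default local Redis and assuming that .delay() hands back an RQ job) is to store the job id and fetch the job again later:

from redis import Redis
from rq.job import Job

redis_conn = Redis()  # assumes Redis on localhost:6379

rq_job = process.delay(3)  # enqueue; assumed to return an RQ job
job_id = rq_job.get_id()   # store this, e.g. in the status URL

# later, in the status route:
job = Job.fetch(job_id, connection=redis_conn)
print(job.get_status())    # 'queued', 'started', 'finished', ...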

And finally you need rqworker to start the worker:

rqworker

See the RQ docs for more info. RQ is designed for simple long-running processes.

Celery

Celery is more complicated and has a huge list of features; it is not recommended if you are new to job queues and distributed processing methods.

Greenlets

Greenlets have switches that let you switch between long-running processes. You can use greenlets to run your processes. The benefit is that you don't need Redis or another worker; instead, you have to redesign your functions to be compatible:

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()  # hand control to test2
    print(34)

def test2():
    print(56)
    gr1.switch()  # hand control back to test1
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()  # prints 12, 56, 34; 78 is never reached because test2 is not resumed

#4 (score: 2)

Your approach is fine and will totally work, but why reinvent the background worker for Python web applications when a widely accepted solution exists, namely Celery?

I'd need to see a lot of tests before I trusted any home-rolled code for such an important task.

Plus, Celery gives you features like task persistence and the ability to distribute workers across multiple machines.
