What's the best way to organize worker processes in Rails?

Time: 2020-12-25 09:19:11

I frequently have some code that should be run either on a schedule or as a background process with some parameters. The common element is that they are run outside the dispatch process, but need access to the Rails environment (and possibly the parameters passed in).

What's a good way to organize this and why? If you like to use a particular plugin or gem, explain why you find it convenient--don't just list a plugin you use.

4 Solutions

#1


5  

For me, not wanting to maintain a lot of extra infrastructure is a key priority, so I have used database-backed queues that are run outside of Rails.

In my case, I've used background_job and delayed_job. With background_job, the worker was kept running via cron, so there was no daemon management. With delayed_job, I'm using Heroku and letting them worry about that.

With delayed_job you can pass in as many arguments as your background worker needs to run.

Delayed::Job.enqueue(MyJob.new(param[:one], param[:two], param[:three]))
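
For anyone wondering what the object passed to enqueue might look like, here is a minimal sketch. delayed_job only needs an object that responds to perform, and a Struct is a common way to carry the arguments. The field names and the body of perform are made up for illustration, and the enqueue call is left in a comment so the snippet runs without the gem installed.

```ruby
# Hypothetical job: the field names and the work done in #perform are
# illustrative assumptions, not part of delayed_job itself.
MyJob = Struct.new(:one, :two, :three) do
  # delayed_job calls #perform on the deserialized object in the worker.
  def perform
    "processed #{one}, #{two}, #{three}"
  end
end

job = MyJob.new("a", "b", "c")

# In a Rails app with delayed_job installed, this would be queued with:
#   Delayed::Job.enqueue(job)
# Calling #perform directly is a handy way to test the job in isolation.
puts job.perform
```

Because the job is a plain object, its logic can be unit-tested without a queue or a worker running.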

I have not found a good solution to running stuff on a schedule, aside from using script/runner via cron (I prefer to use script/runner over a Rake task because I find it easier to test the code).
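
A rough sketch of the script/runner approach, assuming the scheduled work lives in a plain class method. NightlyCleanup, its sample data, and the crontab line are all hypothetical. The point is only that a class method is easy to call from a test and equally easy to call from cron.

```ruby
# Hypothetical task class; in a Rails app it could live in lib/ or app/models.
class NightlyCleanup
  # In a real app this would be a model query; it is hard-coded here so the
  # sketch runs on its own.
  def self.fetch_records
    [{ id: 1, age_days: 3 }, { id: 2, age_days: 45 }]
  end

  # Returns the ids of records older than max_age_days.
  def self.run(max_age_days, records = fetch_records)
    records.select { |r| r[:age_days] > max_age_days }.map { |r| r[:id] }
  end
end

# Assumed crontab entry (path, schedule and environment are placeholders):
#   0 3 * * * cd /path/to/app && script/runner -e production 'NightlyCleanup.run(30)'

p NightlyCleanup.run(30)
```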

I've never had to have a regularly scheduled background process that needed access to a particular Rails request so that hasn't been too much of a problem.

I know there are other, cooler systems with more features but this has worked OK for me, and helps me avoid dealing with setting up a lot of new services to manage.

#2


7  

I really don't like gems like delayed_job and background_job that persist to a database for the purpose of running asynchronous jobs. It just seems dirty to me. Transient stuff doesn't belong in a database.

I'm a huge fan of message queues for dealing with asynchronous tasks, even when you don't have the need for massive scalability. The way I see it, message queues are the ideal "lingua franca" for complex systems. With a message queue, in most cases, you have no restriction on the technologies or languages that are involved in whatever it is that you're building. The benefits of low-concurrency message queue usage are probably most noticeable in an "enterprisey" environment where integration is always a massive pain. Additionally, message queues are ideal when your asynchronous workflow involves multiple steps. RabbitMQ is my personal favorite.

For example, consider the scenario where you're building a search engine. People can submit URIs to be indexed. Obviously, you don't want to retrieve and index the page in-request. So you build around a message queue: the form submission target takes the URI and throws it in the message queue to be indexed. The next available spider process pops the URI off the queue, retrieves the page, finds all links, pushes each of them back onto the queue if they are unknown, and caches the content. Finally, a new message is pushed onto a second queue for the indexer process to deal with the cached content. The indexer process pops that message off the queue and indexes the cached content. Oversimplified of course, since search engines are a lot of work, but you get the idea.
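
The two-queue flow can be sketched in a few lines. This is only an in-process illustration: Ruby's built-in Queue stands in for the broker queues, the PAGES hash stands in for real HTTP fetches, and CrawlPipeline and all of its method names are invented for the example. With a real broker such as RabbitMQ, the spider and the indexer would be separate processes.

```ruby
require "set"

# Hypothetical "web": URI => [body, links]. A real spider would fetch over HTTP.
PAGES = {
  "http://a.example/" => ["page a", ["http://b.example/"]],
  "http://b.example/" => ["page b", ["http://a.example/"]]
}.freeze

class CrawlPipeline
  attr_reader :index

  def initialize(pages)
    @pages = pages
    @crawl_queue = Queue.new # stand-in for the first broker queue
    @index_queue = Queue.new # stand-in for the second broker queue
    @cache = {}              # fetched content, keyed by URI
    @seen = Set.new          # URIs already queued, so links are not re-crawled
    @index = Hash.new { |hash, word| hash[word] = [] }
  end

  # Form submission target: enqueue the URI and return immediately.
  def submit(uri)
    @crawl_queue << uri if @seen.add?(uri)
  end

  # Spider: pop a URI, fetch it, enqueue unknown links, cache the content,
  # then notify the indexer through the second queue.
  def spider_step
    uri = @crawl_queue.pop
    body, links = @pages.fetch(uri, ["", []])
    links.each { |link| submit(link) }
    @cache[uri] = body
    @index_queue << uri
  end

  # Indexer: pop a message and index the cached content word by word.
  def indexer_step
    uri = @index_queue.pop
    @cache[uri].split.each { |word| @index[word] << uri }
  end

  # Runs both workers until their queues are empty.
  def drain
    spider_step until @crawl_queue.empty?
    indexer_step until @index_queue.empty?
  end
end

pipeline = CrawlPipeline.new(PAGES)
pipeline.submit("http://a.example/")
pipeline.drain
p pipeline.index["page"]
```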

As for the actual daemons, obviously, I'm partial to my own library (ChainGang), but it's really just a wrapper around Kernel.fork() that gives you a convenient place to deal with setup and teardown code. It's also not quite done yet. The daemon piece is far less important than the message queue, really.
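
To give a feel for what "a wrapper around Kernel.fork() with a place for setup and teardown" could mean, here is a minimal sketch. It is not ChainGang's API. The run_worker name and its keyword arguments are assumptions for illustration, and it waits for the child instead of daemonizing so the example stays short. It needs a platform where fork is available.

```ruby
# Runs the given block in a forked child, with setup and teardown hooks
# around it. Illustrative only; this is NOT ChainGang's interface.
def run_worker(setup:, teardown:)
  $stdout.flush # avoid duplicating buffered parent output in the child
  pid = fork do
    setup.call
    begin
      yield
    ensure
      teardown.call # runs even if the work raises
    end
  end
  Process.wait(pid) # a real daemon would detach instead of waiting
  $?.exitstatus
end

reader, writer = IO.pipe
status = run_worker(
  setup:    -> { writer.puts "setup" },
  teardown: -> { writer.puts "teardown" }
) { writer.puts "working" }
writer.close
puts reader.read
puts "child exited with #{status}"
```

Setup is where a child would typically reopen database or broker connections, since sharing those across a fork tends to cause trouble.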

Regarding the Rails environment, well, that's probably best left as an exercise for the reader, since memory usage is going to be a significant factor what with the long-running process. You don't want to load anything you don't have to. Incidentally, this is one area where DataMapper soundly kicks ActiveRecord's butt. Environment initialization is well-documented, and there are far fewer dependencies that come into play, making the whole kit and caboodle significantly more realistic.

The one thing I don't like about cron+rake is that rake is virtually guaranteed to print to standard output, and cron tends to be excessively chatty if your cron jobs produce output. I like to put all my cron tasks in an appropriately named directory, then make a rake task that wraps them, so that it's trivial to run them manually. It's a shame that rake does this, because I'd really prefer to have the option to take advantage of dependencies. In any case, you just point cron directly at the scripts rather than running them via rake.
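
One way the "directory of scripts plus a rake wrapper" layout might look is sketched below. The cron/ directory name, the cron: task prefix, and the define_cron_tasks helper are assumptions for the example. It defines one rake task per script for manual runs, while cron itself calls the scripts directly.

```ruby
require "rake"
require "rbconfig"

# Defines one rake task per script in `dir` (cron/cleanup.rb becomes
# cron:cleanup) so the scripts are easy to run by hand. The directory
# name and task prefix are assumptions for this sketch.
def define_cron_tasks(dir = "cron")
  Dir.glob(File.join(dir, "*.rb")).sort.each do |path|
    name = File.basename(path, ".rb")
    Rake::Task.define_task("cron:#{name}") do
      # Run the script in its own Ruby process, as cron would.
      system(RbConfig.ruby, path) or raise "#{path} failed"
    end
  end
end

define_cron_tasks

# Manual run:  rake cron:cleanup
# Assumed crontab entry (placeholder path), bypassing rake entirely:
#   0 * * * * ruby /path/to/app/cron/cleanup.rb
```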

I'm currently in the middle of building a web app that relies heavily on asynchronous processes, and I have to say, I'm very, very glad I decided not to use Rails.

#3


2  

I have a system that receives requests and then needs to call several external systems using web services. Some of these requests take longer than a user can be expected to wait, so I use an enterprise queuing system (ActiveMQ) to handle them.

I am using the ActiveMessaging plugin to do this. It allows me to marshal the request and place it on a queue for asynchronous processing with access to the request data. However, you will need to write a polling service if you want to wait for the response.

I have seen Ryan Bates's Railscast on Starling and Workling, and they look promising, but I haven't used them.

#4


0  

For regularly scheduled tasks, I just use rake tasks. They're simple, easily tested, easily understood and integrate well with the Rails environment. Then just execute these rake tasks with a cron job at whatever interval you require (I use the whenever gem to manage these jobs because I'm slightly cron-illiterate).
