Running command-line processes in parallel in Ruby

Date: 2022-09-20 20:57:35

I'm using PhantomJS, a command-line tool, to render images of websites, and I want to run a number of these in parallel instead of doing one after the other. How can I do this?

3 answers

#1


3  

Here's an example using Resque. Note that I've left out escaping for brevity; you should never pass external input directly into shell commands.

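class RasterizeWebPageJob
  # Resque workers listening on this queue pick up the job.
  @queue = :screenshots

  # Called by a worker with the arguments given to Resque.enqueue.
  def self.perform(url)
    # NOTE: url is interpolated unescaped here, for brevity only.
    system("/usr/bin/env DISPLAY=:1 phantomjs rasterize.js #{url} ...")
  end
end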

10.times { Resque.enqueue(RasterizeWebPageJob, "http://google.com/") }

Provided you're running enough workers (and there are workers available), they'll execute in parallel. The important thing here is that you put separate jobs onto the queue instead of processing multiple screenshots from within the one job.
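
On the escaping point: one way to avoid the shell altogether is the multi-argument form of Kernel#system, which passes each argument to the program as-is. The following is only a minimal sketch. The helper names rasterize_command and rasterize are made up for illustration, and the output-file argument is an assumption about how your rasterize.js is invoked.

```ruby
# Build the argument list for one PhantomJS run.
# Hypothetical helper: the name and the output_path argument are assumptions.
def rasterize_command(url, output_path)
  ["phantomjs", "rasterize.js", url, output_path]
end

# Run PhantomJS without going through a shell.
# The leading hash sets environment variables for the child process, and
# because the command is given as separate arguments, characters such as
# ";" or "&" in the URL are never interpreted by a shell.
def rasterize(url, output_path)
  system({ "DISPLAY" => ":1" }, *rasterize_command(url, output_path))
end
```

Inside the job, self.perform(url) could then call rasterize(url, some_path) instead of building a single command string.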

I'd advise against using Thread.new in a Rails controller. Queues are much easier (and safer) to manage than threads.

#2


1  

There are multiple ways of doing it. What you are looking for is to do asynchronous jobs in the background. This video may help: http://railscasts.com/episodes/128-starling-and-workling

#3


0  

I think what the other answers may be missing is some background on the design pattern you'll want to use. Yes, Resque, Starling and Workling, or Resque combined with Foreman are all great solutions, but you'll probably want to know why.

I believe the pattern you'll want to use is the Observer pattern, also known as Publisher-Subscriber, or PubSub for short. In the simplest case, the idea is similar to how a printer works.

A person (the publisher) clicks print in, say, a web browser. Then, asynchronously, the printer prints the document. If the printer is off, it will pick up the pending messages when it turns on. If multiple people send documents to the printer, the printer takes them in order (FIFO) and processes (prints) them. If there are multiple printers listening to the same queue (this is where the metaphor breaks down, since you usually don't have that), they can take messages in turn and work through the queue faster.

Resque and other PubSub gems, projects, JARs (you're not limited to Ruby) implement this design pattern.
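
To make the printer metaphor concrete, here is a rough in-process sketch using only Ruby's standard library: a FIFO queue of commands and several workers that take from it in turn. This is not how Resque is implemented, and run_in_parallel is a made-up name. The threads here mostly wait on child processes, so the external commands themselves can run at the same time.

```ruby
require "open3"

# Run each command (an array of program + arguments) with at most
# `workers` external processes running at once.
# Returns an array of [command, success?, combined_output] triples.
def run_in_parallel(commands, workers: 4)
  queue = Queue.new
  commands.each { |cmd| queue << cmd }
  queue.close # once empty, pop returns nil instead of blocking

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      # Each worker takes the next command in FIFO order until none remain.
      while (cmd = queue.pop)
        output, status = Open3.capture2e(*cmd)
        results << [cmd, status.success?, output]
      end
    end
  end
  threads.each(&:join)

  # Results arrive in completion order, not submission order.
  Array.new(results.size) { results.pop }
end
```

For example, run_in_parallel([["echo", "one"], ["echo", "two"]], workers: 2) runs both commands and collects their output. For the screenshots, each command would be the PhantomJS invocation for one URL.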

More info about the pattern here (note that the Java Observable is a class, which is a bad design decision; you can implement your own):

http://ruby-doc.org/stdlib-2.0/libdoc/observer/rdoc/Observable.html
http://docs.oracle.com/javase/7/docs/api/java/util/Observable.html
http://en.wikipedia.org/wiki/Observer_pattern
http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern

For our own processing, we use Resque for smaller tasks, but you're still limited by the global interpreter lock and by other issues such as having to deploy your code to a server, install gems, etc. We now use Storm (https://github.com/nathanmarz/storm) to handle our stream processing, and it works much better. Storm may be overkill for what you're trying to do, depending on how many images you're processing in a day.
