How can I modify a Perl script to use multiple processors?

Time: 2021-04-22 07:15:56

Hi, I have a simple script that takes a file and runs another Perl script on it. The script does this for every picture file in the current folder. It is running on a machine with two quad-core Xeon processors and 16 GB of RAM, running Red Hat Linux.


The first script, work.pl, basically calls magicplate.pl, passing some parameters and the name of the file for magicplate.pl to process. Magic Plate takes about a minute to process each image. Because work.pl is performing the same function over 100 times, and because the system has multiple processors and cores, I was thinking about splitting the task up so that it could run multiple times in parallel. I could split the images into different folders if necessary. Any help would be great. Thank you


Here is what I have so far:


use strict;
use warnings;


my @initialImages = <*>;

foreach my $file (@initialImages) {

    # Only handle PNG files; the pattern is anchored so names that
    # merely contain "png" are skipped
    if ($file =~ /\.png$/) {
        print "processing $file...\n";

        # Split the name on dots so the base name and extension
        # are available separately
        my @parts = split /\./, $file;
        my $exten = pop @parts;
        my $name  = join '.', @parts;
        my $orig  = "$name.$exten";

        system("perl magicPlate.pl -i $orig -min 4 -max 160 -d 1");
    }
}

3 Solutions

#1


3  

You could use Parallel::ForkManager (set $MAX_PROCESSES to the number of files processed at the same time):


use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8;    # e.g. one worker per core on two quad-core Xeons
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @initialImages = <*>;

foreach my $file (@initialImages) {

    if ($file =~ /\.png$/) {
        print "processing $file...\n";

        my @parts = split /\./, $file;
        my $exten = pop @parts;
        my $name  = join '.', @parts;
        my $orig  = "$name.$exten";

        # Fork a child for this file; the parent skips straight
        # to the next iteration
        my $pid = $pm->start and next;
        system("perl magicPlate.pl -i $orig -min 4 -max 160 -d 1");
        $pm->finish;    # terminates the child process
    }
}

$pm->wait_all_children;    # wait until every child has exited

But, as Hugmeir suggested, starting the Perl interpreter again and again for each new file is not a good idea.

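If you do go the ForkManager route, that startup cost can be avoided by combining it with the module approach from answer #2 below: replace the system() call inside the fork with a direct function call. Assuming a hypothetical MagicPlate module with a process() function, only one line changes:

# instead of: system("perl magicPlate.pl -i $orig -min 4 -max 160 -d 1");
MagicPlate::process($orig, min => 4, max => 160, d => 1);    # hypothetical module call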

#2


7  

You should consider NOT creating a new process for each file that you want to process. It's horribly inefficient, and it is probably where most of your time is going: just loading Perl, and whatever modules you use, over and over again creates real overhead. I recall a poster on PerlMonks who did something similar and ended up turning his second script into a module, cutting the work time from an hour to a couple of minutes. Not that you should expect such a dramatic improvement, but one can dream...

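As a rough illustration of that refactor (a minimal sketch; the MagicPlate package name and the process() signature are invented for the example, not taken from the original scripts):

package MagicPlate;

use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = ('process');

# The body of magicplate.pl moves in here: Perl and its modules
# load once, and the function is called once per image.
sub process {
    my ($file, %opt) = @_;
    # ... the image-processing logic, using $opt{min}, $opt{max}, $opt{d} ...
}

1;

work.pl (or a worker) then calls process($file, min => 4, max => 160, d => 1) directly instead of spawning a new perl for every image.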

With the second script refactored as a module, here's an example of thread usage, in which BrowserUK creates a thread pool and feeds it jobs through a queue.

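For reference, the shape of that pattern with Perl ithreads looks roughly like this (a sketch assuming the hypothetical MagicPlate module above; note that Thread::Queue's end() needs a reasonably recent version of that module, otherwise enqueue one undef per worker instead):

use strict;
use warnings;
use threads;
use Thread::Queue;
use MagicPlate 'process';    # hypothetical module from the refactor above

my $WORKERS = 8;             # one worker per core on a dual quad-core box
my $queue   = Thread::Queue->new;

# Each worker pulls filenames off the shared queue until it is ended
my @pool = map {
    threads->create(sub {
        while (defined(my $file = $queue->dequeue)) {
            process($file, min => 4, max => 160, d => 1);
        }
    });
} 1 .. $WORKERS;

$queue->enqueue(grep { /\.png$/ } <*>);    # feed all the jobs at once
$queue->end;          # after this, dequeue returns undef once the queue is empty
$_->join for @pool;   # wait for the pool to drain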

#3


3  

  • Import "maigcplate" and use threading.
  • 导入“maigcplate”并使用线程。

  • Start magicplate.pl in the background (you would need to add process throttling)
  • 在后台启动magicplate.pl(您需要添加进程限制)

  • Import "magicplate" and use fork (add process throttling and a kiddy reaper)
  • 导入“魔法板”并使用fork(添加进程限制和kiddy收割者)

  • Make "maigcplate" a daemon with a pool of workers = # of CPUs
    • use an MQ implementation for communication
    • 使用MQ实现进行通信

    • use sockets for communication
    • 使用套接字进行通信

  • 使“maigcplate”成为带有工作池的守护进程= CPU数量使用MQ实现进行通信使用套接字进行通信

  • Use webserver(nginx, apache, ...) and wrap in REST for a webservice
  • 使用webserver(nginx,apache,...)并在REST中包装Web服务

  • etc...
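For the fork option, the throttling and reaping might look like this (a minimal sketch, again assuming the hypothetical MagicPlate module; the cap of 8 matches the two quad-core Xeons from the question):

use strict;
use warnings;
use MagicPlate 'process';    # hypothetical refactored module

my $MAX_KIDS = 8;            # throttle: at most one child per core
my %kids;

for my $file (grep { /\.png$/ } <*>) {
    # Throttle: wait for one child to finish before starting another
    if (keys %kids >= $MAX_KIDS) {
        my $done = waitpid(-1, 0);
        delete $kids{$done};
    }

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {          # child: process one image, then exit
        process($file, min => 4, max => 160, d => 1);
        exit 0;
    }
    $kids{$pid} = 1;          # parent: track the running child
}

# Reap the children that are still running
1 while waitpid(-1, 0) > 0;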

All of these center around creating multiple workers that can each run on their own CPU. Certain implementations will use resources better (those that don't start a new process) and will be easier to implement and maintain.

