Hi, I have a simple script that takes a file and runs another Perl script on it. The script does this to every picture file in the current folder. It is running on a machine with two quad-core Xeon processors, 16 GB of RAM, running Red Hat Linux.
The first script, work.pl, basically calls magicplate.pl, passing some parameters and the name of the file for magicplate.pl to process. Magic Plate takes about a minute to process each image. Because work.pl is performing the same function over 100 times, and because the system has multiple processors and cores, I was thinking about splitting the task up so that it could run multiple times in parallel. I could split the images up into different folders if necessary. Any help would be great. Thank you.
Here is what I have so far:
use strict;
use warnings;

my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /\.png$/) {
        print "processing $file...\n";

        # Split the filename on dots, then rebuild the base name and extension.
        my @tmp  = split /\./, $file;
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; }
            else             { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[@tmp - 1];
        my $orig  = $name . "." . $exten;

        system("perl magicPlate.pl -i $orig -min 4 -max 160 -d 1");
    }
}
3 Answers
#1 (3 votes)
You could use Parallel::ForkManager (set $MAX_PROCESSES to the number of files processed at the same time):
use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8;   # number of files processed at the same time
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);   # create the manager once, outside the loop

my @initialImages = <*>;

foreach my $file (@initialImages) {
    if ($file =~ /\.png$/) {
        print "processing $file...\n";

        my @tmp  = split /\./, $file;
        my $name = "";
        for (my $i = 0; $i < (@tmp - 1); $i++) {
            if ($name eq "") { $name = $tmp[$i]; }
            else             { $name = $name . "." . $tmp[$i]; }
        }
        my $exten = $tmp[@tmp - 1];
        my $orig  = $name . "." . $exten;

        my $pid = $pm->start and next;   # parent forks a child and moves on to the next file
        system("perl magicPlate.pl -i $orig -min 4 -max 160 -d 1");
        $pm->finish;                     # terminates the child process
    }
}

$pm->wait_all_children;
But as suggested by Hugmeir, starting the perl interpreter again and again for each new file is not a good idea.
#2 (7 votes)
You should consider NOT creating a new process for each file that you want to process -- it's horribly inefficient, and probably what is taking most of your time here. Just loading up Perl and whatever modules you use, over and over, creates real overhead. I recall a poster on PerlMonks who did something similar and ended up turning his second script into a module, cutting the runtime from an hour to a couple of minutes. Not that you should expect such a dramatic improvement, but one can dream...
With the second script refactored as a module, here's an example of thread usage, in which BrowserUK creates a thread pool and feeds it jobs through a queue.
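For a concrete (hedged) sketch of that pattern: the code below assumes magicplate.pl has been refactored into a hypothetical MagicPlate module exposing a process() function; the module name, the process() signature, and the worker count are illustrative assumptions, not anything taken from the linked example.

use strict;
use warnings;
use threads;
use Thread::Queue;

use MagicPlate;   # hypothetical module refactored out of magicplate.pl

my $THREADS = 8;                  # one worker per core
my $queue   = Thread::Queue->new;

# Each worker pulls filenames until it dequeues the undef terminator.
sub worker {
    while (defined(my $file = $queue->dequeue)) {
        MagicPlate::process($file, min => 4, max => 160, d => 1);   # assumed interface
    }
}

my @pool = map { threads->create(\&worker) } 1 .. $THREADS;

# Feed every .png in the current directory into the queue,
# then push one undef per worker so they all shut down cleanly.
$queue->enqueue($_) for grep { /\.png$/ } glob '*';
$queue->enqueue(undef) for 1 .. $THREADS;

$_->join for @pool;

Enqueueing one undef per worker is the conventional shutdown signal: each dequeue of undef ends that worker's loop, and join waits for the pool to drain.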
#3 (3 votes)
- Import "maigcplate" and use threading.
- Start magicplate.pl in the background (you would need to add process throttling)
- Import "magicplate" and use fork (add process throttling and a kiddy reaper)
- Make "maigcplate" a daemon with a pool of workers = # of CPUs
- use an MQ implementation for communication
- use sockets for communication
使用MQ实现进行通信
使用套接字进行通信
- Use webserver(nginx, apache, ...) and wrap in REST for a webservice
- etc...
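As a minimal sketch of the fork option, assuming (hypothetically, as above) that magicplate.pl has been refactored into a MagicPlate module with a process() entry point, and that eight workers is the right throttle for eight cores:

use strict;
use warnings;

use MagicPlate;   # hypothetical module refactored out of magicplate.pl

my $MAX_KIDS = 8;    # throttle: at most one child per core
my $running  = 0;

foreach my $file (grep { /\.png$/ } glob '*') {
    # Throttle: reap one finished child before exceeding the limit.
    if ($running >= $MAX_KIDS) {
        wait();
        $running--;
    }

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: do the work in-process instead of spawning a new perl.
        MagicPlate::process($file, min => 4, max => 160, d => 1);   # assumed interface
        exit 0;
    }
    $running++;      # parent: track the outstanding child
}

# Reap every remaining child before exiting (the "child reaper").
1 while wait() != -1;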
All of these center around creating multiple workers that can each run on their own CPU. Certain implementations will use resources better (those that don't start a new process) and be easier to implement and maintain.