Running multiple R scripts in parallel using command-line arguments

Time: 2022-08-06 13:54:40

I have an R script that performs an analysis on one chromosome, and I want to run it once for each chromosome (1-22, X and Y). The script currently accepts a single command-line argument, the chromosome number. Because the analysis for one chromosome takes a few hours, I want to submit the jobs to my server (a Sun Grid Engine server) in parallel. After trying a few options and a lot of googling, I'm still not sure what the best approach is, as I've never submitted parallel jobs to a server before. I looked into GNU parallel, but I'm not sure how to use it or whether it even works with R scripts. Maybe I should put everything in a shell script and submit that to the server? This is a pretty basic question, but any direction would be greatly appreciated!
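Since the question mentions Sun Grid Engine, one option worth considering is an SGE array job, where the scheduler starts one task per chromosome. The following is only a minimal sketch under stated assumptions: the file name run_chrom.sh and the helper chrom_for_task are hypothetical names introduced here, the R script is invoked the same way as in answer #1, and the cluster is assumed to export SGE_TASK_ID to each task.

```shell
#!/bin/sh
#$ -cwd
#$ -t 1-24
# Hypothetical job script (run_chrom.sh) for an SGE array job.
# "-t 1-24" requests 24 tasks; each task sees its own index in SGE_TASK_ID.

# Map a task index (1-24) to a chromosome name (1-22, X, Y).
chrom_for_task() {
    case "$1" in
        23) echo X ;;
        24) echo Y ;;
        *)  echo "$1" ;;
    esac
}

# Only launch R when running as an array task; SGE sets SGE_TASK_ID
# to "undefined" for non-array jobs, and it is unset outside SGE.
case "${SGE_TASK_ID:-undefined}" in
    undefined) ;;
    *) Rscript plot_LRR_BAF_chromosome_parallel "$(chrom_for_task "$SGE_TASK_ID")" ;;
esac
```

Under these assumptions the whole set would be submitted once with "qsub run_chrom.sh", and the scheduler decides how many of the 24 tasks run at the same time.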


2 solutions

#1



parallel Rscript plot_LRR_BAF_chromosome_parallel ::: {1..22} X Y
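The one-liner above relies on bash brace expansion ({1..22}); under a plain POSIX sh the braces are passed through literally. A hedged sketch of the same idea that avoids brace expansion and caps the number of simultaneous jobs is shown here. The function names chrom_list and run_all, the default of 4 jobs, and the log file name chrom_jobs.log are assumptions made for illustration; the R script is assumed to read its argument with commandArgs(trailingOnly = TRUE).

```shell
#!/bin/sh
# Print the chromosome names 1-22, X, Y, one per line.
chrom_list() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "$i"
        i=$((i + 1))
    done
    echo X
    echo Y
}

# Run the R script once per chromosome, at most $1 jobs at a time
# (default 4, an arbitrary choice). GNU parallel reads one argument
# per line from stdin and substitutes it for {}.
run_all() {
    chrom_list | parallel -j "${1:-4}" --joblog chrom_jobs.log \
        Rscript plot_LRR_BAF_chromosome_parallel {}
}
```

Calling "run_all 8" would then start up to eight R processes at once on the current machine. Adding --dry-run to the parallel call prints the commands without running them, which is a cheap way to check the argument list first.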

#2



Use GNU make with the -j option. Put a __CHROM__ placeholder in your R script wherever the chromosome name is needed; the rule below substitutes the actual chromosome name for it and writes one output file per chromosome.

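# Run with e.g. "make -j 4" to process up to four chromosomes at once.
chroms=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y

# Remove a partial .out file if its R run fails.
.DELETE_ON_ERROR:

# Rule template: <chrom>.out is built by replacing __CHROM__ in script.R
# and feeding the result to R. The recipe line must start with a tab.
define method1
$$(addsuffix .out,$(1)) : script.R
	sed 's/__CHROM__/$(1)/g' $$< | R --no-save > $$@
endef

all: $(addsuffix .out,$(chroms))

# Instantiate the template once per chromosome (no spaces around $(C),
# otherwise they end up inside the sed replacement).
$(foreach C,$(chroms),$(eval $(call method1,$(C))))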
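To see what the generated rule does for a single chromosome, the substitution step can be tried on its own in the shell. This is an illustrative sketch only: render_script is a hypothetical helper, and the one-line template stands in for the real script.R.

```shell
#!/bin/sh
# Render a per-chromosome copy of a template R script on stdout.
# $1 = chromosome name, $2 = path to a template containing __CHROM__.
render_script() {
    sed "s/__CHROM__/$1/g" "$2"
}

# Try it on a throwaway template (hypothetical contents).
template=$(mktemp)
printf 'chrom <- "__CHROM__"\n' > "$template"
rendered=$(render_script X "$template")
echo "$rendered"
rm -f "$template"
```

With the Makefile in place, "make -n" lists the commands that would run without executing them, and "make -j 4" runs up to four chromosomes concurrently; make skips any chromosome whose .out file is already newer than script.R.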
