How to parallelize on Windows - example?

Time: 2022-06-15 14:56:15

How do I get parallelization of code to work in R on Windows? Please include a simple example. I'm posting this self-answered question because this was rather unpleasant to get working. You'll find that package parallel does NOT work on its own, but package snow works very well.

4 solutions

#1


29  

Posting this because this took me bloody forever to figure out. Here's a simple example of parallelization in R that lets you test whether things are working for you and gets you on the right path.

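library(snow)
z <- 1:4
# Sequential baseline: four 1-second sleeps take about 4 seconds.
system.time(lapply(z, function(x) Sys.sleep(1)))
n_cores <- 2  # placeholder: YOUR NUMBER OF CORES GOES HERE
cl <- makeCluster(n_cores, type = "SOCK")
# Parallel run: the sleeps are spread across the workers.
system.time(clusterApply(cl, z, function(x) Sys.sleep(1)))
stopCluster(cl)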

You should also use the doSNOW package to register the snow cluster as a foreach backend; many packages that build on foreach will then parallelize automatically. The command to register is registerDoSNOW(cl), where cl is the return value of makeCluster(), and the command that undoes the registration is registerDoSEQ(). Don't forget to stop your clusters when you are done.
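
To make the registration step concrete, here is a minimal sketch of a foreach loop running on a snow cluster. The worker count of 2 and the toy computation are assumptions for illustration only; it needs the snow, doSNOW and foreach packages installed.

```r
library(snow)
library(foreach)
library(doSNOW)

n_workers <- 2    # hypothetical value; set this to your number of cores
cl <- makeCluster(n_workers, type = "SOCK")
registerDoSNOW(cl)   # foreach's %dopar% now sends work to the snow cluster

# Each iteration runs on a worker; .combine = c collects the results
# into a single vector, in the order of the inputs.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

registerDoSEQ()   # return foreach to sequential execution
stopCluster(cl)   # shut down the worker processes
print(squares)
```

If no backend is registered, %dopar% still runs but falls back to sequential execution with a warning, which is an easy way to miss that nothing is actually parallel.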

#2


10  

This worked for me. I used the doParallel package, which required three lines of code:

# process in parallel
library(doParallel) 
cl <- makeCluster(detectCores(), type='PSOCK')
registerDoParallel(cl)

# turn parallel processing off and run sequentially again:
registerDoSEQ()

Calculation of a random forest decreased from 180 secs to 120 secs (on a Windows computer with 4 cores).
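
If you want to check the speed-up on your own machine before trying something as heavy as a random forest, a rough sketch like the following may help. The half-second sleep is only a stand-in workload and the worker count of 2 is an assumption; real code will usually gain less than the sleep test suggests because of communication overhead.

```r
library(doParallel)  # also attaches foreach and parallel

n_workers <- 2       # hypothetical; detectCores() would use every core
cl <- makeCluster(n_workers, type = "PSOCK")
registerDoParallel(cl)

# Sequential run with %do%: four 0.5-second tasks, one after another.
seq_time <- system.time(
  seq_res <- foreach(i = 1:4, .combine = c) %do% {
    Sys.sleep(0.5)
    i * 2
  }
)["elapsed"]

# Same loop with %dopar%: the tasks are spread across the workers.
par_time <- system.time(
  par_res <- foreach(i = 1:4, .combine = c) %dopar% {
    Sys.sleep(0.5)
    i * 2
  }
)["elapsed"]

registerDoSEQ()
stopCluster(cl)
cat("sequential:", seq_time, "s  parallel:", par_time, "s\n")
```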

#3


5  

I think these libraries will help you:

foreach (facilitates executing loops in parallel)
doSNOW (I think you already use it)
doMC (a foreach backend for the multicore functionality of the parallel package; it relies on forking, which Windows does not support)

These articles may also help you:

http://vikparuchuri.com/blog/parallel-r-loops-for-windows-and-linux/

http://www.joyofdata.de/blog/parallel-computing-r-windows-using-dosnow-foreach/

#4


3  

Based on the information here, I was able to convert the following code into a parallelised version that worked under RStudio on Windows 7.

Original code:

#
# Basic elbow plot function
#
wssplot <- function(data, nc=20, seed=1234){
    wss <- (nrow(data)-1)*sum(apply(data,2,var))
    for (i in 2:nc){
        set.seed(seed)
        wss[i] <- sum(kmeans(data, centers=i, iter.max=30)$withinss)}
    plot(1:nc, wss, type="b", xlab="Number of clusters", 
       ylab="Within groups sum of squares")
}

Parallelised code:

library("parallel")

workerFunc <- function(nc) {
  set.seed(1234)
  return(sum(kmeans(my_data_frame, centers=nc, iter.max=30)$withinss)) }

num_cores <- detectCores()
cl <- makeCluster(num_cores)
clusterExport(cl, varlist=c("my_data_frame")) 
values <- 1:20 # this represents the "nc" variable in the wssplot function
system.time(
  result <- parLapply(cl, values, workerFunc) )  # parallel execution, wrapped in a timer
stopCluster(cl)
plot(values, unlist(result), type="b", xlab="Number of clusters", ylab="Within groups sum of squares")

I'm not suggesting it's perfect or even the best approach; I'm just a beginner demonstrating that the parallel package does seem to work under Windows. Hope it helps.
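
One possible variation on the parallelised code above, as a sketch only: pass the data to the workers as a function argument instead of using clusterExport(), and wrap everything in a function so the cluster is always stopped. The names wss_for_k and wss_parallel, the default of 2 workers, and the use of the built-in iris data are all assumptions made for illustration.

```r
library(parallel)

# Hypothetical worker function: total within-cluster sum of squares for one k.
# It is defined at top level so only the function itself is sent to the workers.
wss_for_k <- function(k, data, seed) {
  set.seed(seed)
  sum(kmeans(data, centers = k, iter.max = 30)$withinss)
}

# Hypothetical wrapper: evaluates k = 1..max_k on a local cluster.
wss_parallel <- function(data, max_k = 10, seed = 1234, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))   # stop the workers even if an error occurs
  # Extra named arguments are forwarded to wss_for_k on each worker,
  # so no clusterExport() call is needed.
  parSapply(cl, 1:max_k, wss_for_k, data = data, seed = seed)
}

# Example data (an assumption): the four numeric columns of iris, scaled.
wss <- wss_parallel(scale(iris[, 1:4]), max_k = 6)
print(wss)
```

The result is a plain numeric vector, so it can be handed to the same plot() call as before without unlist().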
