I am calculating a permutation test statistic using a for loop. I would like to speed this up using parallel processing (in particular, foreach from the foreach package). I am following the instructions from: https://beckmw.wordpress.com/2014/01/21/a-brief-foray-into-parallel-processing-with-r/
My original code:
library(foreach)
library(doParallel)
set.seed(10)
x = rnorm(1000)
y = rnorm(1000)
n = length(x)
nexp = 10000
# one slot per permutation (nexp, not n)
perm.stat1 = numeric(nexp)
ptm = proc.time()
for (i in 1:nexp){
  y = sample(y)  # permute y
  perm.stat1[i] = cor(x,y,method = "pearson")
}
proc.time()-ptm
# 1.321 seconds
However, when I switched to a foreach loop, it ran much more slowly:
cl<-makeCluster(8)
registerDoParallel(cl)
ptm = proc.time()
# no preallocation needed: foreach returns the combined result
perm.stat2 = foreach(icount(nexp), .combine=c) %dopar% {
  y = sample(y)
  cor(x,y,method = "pearson")
}
proc.time()-ptm
stopCluster(cl)
# 3.884 seconds
Why is this happening? What did I do wrong? Thanks
2 Answers
#1
1
You're getting bad performance because you're splitting up a small problem into 10,000 tasks, each of which takes about an eighth of a millisecond to execute. It's alright to simply turn a for loop into a foreach loop when the body of the loop takes a significant period of time (I used to say at least 10 seconds, but I've dropped that to at least a second nowadays), but that simple strategy doesn't work when the tasks are very small (in this case, extremely small). When the tasks are small you spend most of your time sending the tasks to the workers and receiving the results from them. In other words, the communication overhead is greater than the computation time. Frankly, I'm amazed that you didn't get much worse performance.
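The "eighth of a millisecond" figure follows directly from the timings reported in the question. A quick check, using only those two reported numbers (the variable names here are just for illustration):

```r
# Timings as reported in the question (seconds)
seq_time <- 1.321   # sequential for loop
par_time <- 3.884   # foreach with one task per permutation
nexp <- 10000       # number of permutations, i.e. number of tasks

# Average cost per task in milliseconds
per_task_seq_ms <- seq_time / nexp * 1000
per_task_par_ms <- par_time / nexp * 1000

per_task_seq_ms  # about 0.13 ms of actual computation per permutation
per_task_par_ms  # about 0.39 ms of wall-clock time per task in the parallel run
```

The parallel run spends more wall-clock time per task than the computation itself needs, which is consistent with communication dominating.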
To me, it doesn't really seem worthwhile to parallelize a problem that takes less than two seconds to execute, but you can actually get a speedup with foreach by chunking. That is, you split the problem into a small number of chunks, usually giving one chunk to each worker, so that each task does a substantial amount of work. Here's an example:
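nw <- getDoParWorkers()  # number of registered workers
perm.stat1 <-
  foreach(xnexp=idiv(nexp, chunks=nw), .combine=c) %dopar% {
    # xnexp is the number of permutations assigned to this chunk
    p = numeric(xnexp)
    for (i in 1:xnexp) {
      y = sample(y)
      p[i] = cor(x,y,method="pearson")
    }
    p  # return this chunk's statistics
  }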
As you can see, the foreach loop is splitting the problem into chunks, and the body of that loop contains a modified version of the original sequential code, now operating on a fraction of the entire problem.
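To see what the chunking iterator hands to each task, it can help to step through idiv by hand. This is a small sketch, assuming four workers rather than querying a registered backend:

```r
library(iterators)

nexp <- 10000
nw <- 4  # assumed worker count, for illustration only

# idiv() yields chunk sizes that add up to nexp
it <- idiv(nexp, chunks = nw)
sizes <- sapply(seq_len(nw), function(k) nextElem(it))

sizes       # 2500 2500 2500 2500
sum(sizes)  # 10000
```

Each value becomes xnexp in one foreach task, so only nw tasks are sent to the workers instead of 10,000.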
On my four-core Mac laptop, this executes in 0.447 seconds, compared to 1.245 seconds for the sequential version. That seems like a very respectable speedup to me.
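For anyone who wants to reproduce the comparison, here is a self-contained sketch of the chunked version including cluster setup and teardown. The worker count of 2 and the use of clusterSetRNGStream to seed the workers are assumptions added here, not part of the original answer; timings will differ by machine, and exact reproducibility of the random streams may depend on how tasks are scheduled.

```r
library(doParallel)  # also attaches foreach, iterators, and parallel

set.seed(10)
x <- rnorm(1000)
y <- rnorm(1000)
nexp <- 10000

cl <- makeCluster(2)         # assumed worker count; adjust to your machine
registerDoParallel(cl)
clusterSetRNGStream(cl, 10)  # assumption: give each worker its own seeded RNG stream

nw <- getDoParWorkers()
elapsed <- system.time(
  perm.stat <- foreach(xnexp = idiv(nexp, chunks = nw), .combine = c) %dopar% {
    p <- numeric(xnexp)
    for (i in seq_len(xnexp)) {
      # permute y and record the correlation for this permutation
      p[i] <- cor(x, sample(y), method = "pearson")
    }
    p
  }
)
stopCluster(cl)

length(perm.stat)   # one statistic per permutation
elapsed["elapsed"]  # wall-clock seconds for the chunked parallel run
```

Note that makeCluster itself takes noticeable time to start the workers; like the timings quoted above, this sketch times only the loop.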
#2
0
There's a lot more computational overhead in the foreach loop. It returns a list containing the result of each execution of the loop's body, which is then combined into a vector via the .combine=c argument. The for loop does not return anything; it assigns each value to perm.stat1 as a side effect, so it does not need any extra overhead.
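This overhead is visible even without a parallel backend. The following sketch compares a plain for loop with a sequential foreach (%do%) on a trivial body; sqrt(i) is just a stand-in for cheap work, and the actual timings will vary by machine:

```r
library(foreach)

n <- 2000

# Plain for loop: results stored by assignment
t_for <- system.time({
  out_for <- numeric(n)
  for (i in seq_len(n)) out_for[i] <- sqrt(i)
})

# Sequential foreach: each result is returned and then combined with c()
t_do <- system.time(
  out_do <- foreach(i = seq_len(n), .combine = c) %do% sqrt(i)
)

all.equal(out_for, out_do)  # same values either way
t_for["elapsed"]
t_do["elapsed"]
```

Both loops compute the same vector; any difference in elapsed time comes from the machinery around the loop body.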
Have a look at "Why is foreach() %do% sometimes slower than for?" for a more in-depth explanation of why foreach is slower than for in many cases. Where foreach comes into its own is when the operations inside the loop are computationally intensive, making the time penalty associated with returning each value in a list insignificant by comparison. One example is the combination of rnorm and summary used in the WordPress article linked in the question.