This code works:
library(plyr)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE)
While this code fails:
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopWorkers(workers)
>Error in do.ply(i) : task 3 failed - "subscript out of bounds"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
I am using R 2.1.12, plyr 1.4 and doSMP 1.0-1. Has anyone figured out a way around this?
edit: In response to Andrie, here is a further illustration:
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #1
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #2
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=FALSE)) #3
system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel=TRUE)) #4
stopWorkers(workers)
The first three calls work, but each takes about 3 seconds. Call #2 warns that no parallel backend is registered and therefore executes sequentially. Call #4 gives the same error I referenced in my original post.
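For anyone reproducing the "no parallel backend registered" warning from call #2, here is a minimal sketch of how to ask foreach what it currently has registered. The helper name `backend_status` is my own (hypothetical); the `getDoPar*` functions and `registerDoSEQ()` come from the foreach package. It only inspects the registration state and says nothing about whether plyr cooperates with a given backend.

```r
library(foreach)

# Hypothetical helper: report what foreach currently has registered for %dopar%
backend_status <- function() {
  list(
    registered = getDoParRegistered(),  # has any backend been registered?
    name       = getDoParName(),        # name of the registered backend, if any
    workers    = getDoParWorkers()      # number of workers foreach will use
  )
}

before <- backend_status()

# Registering the sequential backend explicitly avoids the warning,
# but adds no parallelism
registerDoSEQ()
after <- backend_status()

str(before)
str(after)
```

Calling `backend_status()` right after `registerDoSMP(workers)` or `registerDoSNOW(cl)` should show whether the registration itself took effect.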
/edit: curiouser and curiouser: on my Mac, the following works:
library(plyr)
library(doMC)
registerDoMC()
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
But this fails:
library(plyr)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopWorkers(workers)
And this fails too:
library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE)
stopCluster(cl)
So I suppose the various parallel back ends for foreach are not interchangeable.
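One way to narrow this down is to run a trivial task through foreach directly, with no plyr involved. This is only a diagnostic sketch, assuming snow and doSNOW are installed and a two-worker socket cluster can be started. If it succeeds while the ddply call fails, the problem lies in how plyr drives the backend rather than in the backend itself.

```r
library(foreach)
library(snow)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# A trivial task that does not touch plyr at all
res <- foreach(i = 1:3, .combine = c) %dopar% sqrt(i)

stopCluster(cl)
print(res)
```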
3 Answers
#1
4
While the question has been answered well by @hadley, I want to add that I think plyr now works with other foreach parallel back-ends. Here is a link to a blog entry containing an example where plyr is used in conjunction with doSNOW:
#2
2
Just to confirm @LeeZamparo's answer, plyr does now seem to work with snow, at least on Windows 7 with R version 2.15.0. The last chunk of code in the question works, though with cryptic warnings:
library(plyr)
library(snow)
library(doSNOW)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)
x <- data.frame(V= c("X", "Y", "X", "Y", "Z" ), Z = 1:5)
library(microbenchmark)
mb <- microbenchmark(
PP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=TRUE),
NP <- ddply(x, .(V), function(df) sum(df$Z),.parallel=FALSE)
)
stopCluster(cl)
Cryptic warnings:
> warnings()
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...
It's not quick; I guess that's the overhead...
> mb
Unit: milliseconds
                                                             expr
1 NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
2  PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
        min        lq    median        uq       max
1  11.91518  15.74567  20.10944  23.30453  38.09237
2 314.58008 336.81160 348.42421 358.57337 575.11220
Check that it gives the expected result:
> PP
V V1
1 X 4
2 Y 6
3 Z 5
Extra details about this session:
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.1-3 doSNOW_1.0.6 iterators_1.0.6
[4] foreach_1.4.0 plyr_1.7.1 snow_0.3-10
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 tools_2.15.0
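On the overhead seen in the benchmark above: summing a handful of numbers is far cheaper than shipping each piece to a worker, so the parallel run loses. Here is a rough sketch, not a proper benchmark, of the opposite case, where each group takes about a second. The function `slow_sum` is a hypothetical stand-in for an expensive per-group computation, and the setup assumes the same snow/doSNOW cluster as above.

```r
library(plyr)
library(snow)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

# Hypothetical stand-in for an expensive per-group computation
slow_sum <- function(df) {
  Sys.sleep(1)   # pretend each group takes about a second
  sum(df$Z)
}

t_seq <- system.time(res_seq <- ddply(x, .(V), slow_sum, .parallel = FALSE))
t_par <- system.time(res_par <- ddply(x, .(V), slow_sum, .parallel = TRUE))

stopCluster(cl)

# Compare wall-clock time for the two runs
print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))
```

With three groups and two workers I would expect the parallel run to come out ahead here, but the exact timings will depend on the machine and on cluster start-up costs.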
#3
1
It turns out plyr only works with doMC, but the developer is working on it.