ddply is slower when .parallel = TRUE on Mac OS X version 10.6.7

Time: 2021-08-07 06:58:47

I am trying to get ddply to run in parallel on my Mac. The code I've used is as follows:

library(plyr)    # for ddply()
library(doMC)
library(ggplot2) # for the purposes of getting the baseball data.frame
registerDoMC(2)


> system.time(ddply(baseball, .(year), numcolwise(mean)))
   user  system elapsed 
  0.959   0.106   1.522 
> system.time(ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE))
   user  system elapsed 
  2.221   2.790   2.552 

Why is ddply slower when I run it with .parallel=TRUE? I have searched online to no avail. I've also tried registerDoMC() with no arguments, and the results were the same.

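As a quick sanity check (a sketch using helpers from the foreach package, which doMC is built on), one can confirm that the parallel backend really is registered before calling ddply:

library(foreach)
getDoParRegistered()  # should be TRUE after registerDoMC(2)
getDoParName()        # name of the registered backend (doMC here)
getDoParWorkers()     # should report 2 workers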

2 Answers

#1 (score 11)

The baseball data may be too small to see improvement by making the computations parallel; the overhead of passing the data to the different processes may be swamping any speedup from doing the calculations in parallel. Using the rbenchmark package:

library(rbenchmark)

# stack 10 copies of baseball to get a 10x larger data set
baseball10 <- baseball[rep(seq_len(nrow(baseball)), 10),]

benchmark(noparallel = ddply(baseball, .(year), numcolwise(mean)),
    parallel = ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE),
    noparallel10 = ddply(baseball10, .(year), numcolwise(mean)),
    parallel10 = ddply(baseball10, .(year), numcolwise(mean), .parallel=TRUE),
    replications = 10)

gives these results:

          test replications elapsed relative user.self sys.self user.child sys.child
1   noparallel           10   4.562 1.000000     4.145    0.408      0.000     0.000
3 noparallel10           10  14.134 3.098203     9.815    4.242      0.000     0.000
2     parallel           10  11.927 2.614423     2.394    1.107      4.836     6.891
4   parallel10           10  18.406 4.034634     4.045    2.580     10.210     9.769

With a data set 10 times bigger, the relative penalty for the parallel version is smaller. A more expensive per-group computation would tilt the balance further in parallel's favor, and could give it an outright advantage (see the sketch below).

This was run on a Mac OS X 10.5.8 Core 2 Duo machine.

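To illustrate the point about a more expensive computation, here is a sketch (an illustrative example with made-up work, not part of the original benchmark): each group bootstraps its column means 200 times, so the per-group work is much larger relative to the fixed cost of shipping data to the workers, and .parallel=TRUE has a better chance of coming out ahead.

# Hypothetical heavier per-group summary: bootstrap the column means 200 times.
slow_summary <- function(df) {
  num  <- df[sapply(df, is.numeric)]
  boot <- replicate(200,
                    colMeans(num[sample(nrow(num), replace = TRUE), ],
                             na.rm = TRUE))
  as.data.frame(t(rowMeans(boot)))
}

benchmark(sequential = ddply(baseball, .(year), slow_summary),
          parallel   = ddply(baseball, .(year), slow_summary, .parallel = TRUE),
          replications = 5)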

#2 (score 9)

Running in parallel will be slower than running sequentially when the communication cost between the nodes is greater than the calculation time of the function. In other words, it takes longer to send the data to and from the nodes than it does to perform the calculation.

For the same data set, the communication costs are approximately fixed, so parallel processing is going to be more useful as the time spent evaluating the function increases.

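To make that concrete, here is a rough two-worker cost model (an illustration with an assumed overhead figure, not a measurement): the parallel run costs roughly the fixed communication time plus half the computation time, so it only wins when the communication time is below half the computation time.

t_work <- 0.14                       # time spent in .fun (see the profile below)
t_comm <- 0.10                       # assumed fixed communication overhead
c(sequential = t_work,
  parallel   = t_comm + t_work / 2)  # parallel wins only if t_comm < t_work / 2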

UPDATE:
The code below shows 0.14 seconds (on my machine) are spent evaluating .fun. That means communication would have to take less than 0.07 seconds, and that's not realistic for a data set the size of baseball.

Rprof()
system.time(ddply(baseball, .(year), numcolwise(mean)))
#    user  system elapsed 
#    0.28    0.02    0.30
Rprof(NULL)
summaryRprof()$by.self
#               self.time self.pct total.time total.pct
# [.data.frame       0.04    12.50       0.10     31.25
# unlist             0.04    12.50       0.10     31.25
# match              0.04    12.50       0.04     12.50
# .fun               0.02     6.25       0.14     43.75
# structure          0.02     6.25       0.12     37.50
# [[                 0.02     6.25       0.08     25.00
# FUN                0.02     6.25       0.06     18.75
# rbind.fill         0.02     6.25       0.06     18.75
# anyDuplicated      0.02     6.25       0.02      6.25
# gc                 0.02     6.25       0.02      6.25
# is.array           0.02     6.25       0.02      6.25
# list               0.02     6.25       0.02      6.25
# mean.default       0.02     6.25       0.02      6.25

Here's the parallel version with snow:

library(doSNOW)
cl <- makeSOCKcluster(2)
registerDoSNOW(cl)

Rprof()
system.time(ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE))
#    user  system elapsed 
#    0.46    0.01    0.73
Rprof(NULL)
summaryRprof()$by.self
#                     self.time self.pct total.time total.pct
# .Call                    0.24    33.33       0.24     33.33
# socketSelect             0.16    22.22       0.16     22.22
# lazyLoadDBfetch          0.08    11.11       0.08     11.11
# accumulate.iforeach      0.04     5.56       0.06      8.33
# rbind.fill               0.04     5.56       0.06      8.33
# structure                0.04     5.56       0.04      5.56
# <Anonymous>              0.02     2.78       0.54     75.00
# lapply                   0.02     2.78       0.04      5.56
# constantFoldEnv          0.02     2.78       0.02      2.78
# gc                       0.02     2.78       0.02      2.78
# stopifnot                0.02     2.78       0.02      2.78
# summary.connection       0.02     2.78       0.02      2.78
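
One housekeeping step not shown above: when the benchmark is done, the socket cluster should be shut down (stopCluster() comes from the snow package that doSNOW loads):

stopCluster(cl)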
