
Time: 2021-02-26 03:54:00

I'm new to cluster processing and could use some advice on how better to prepare my data and/or my calls to functions from the parallel package. I have read through the parallel package vignettes, so I have a vague idea of what's going on.


The function I want to parallelize calls the 2-D interpolation tool akima::interp. My input consists of 3 matrices (or vectors -- all the same in R): one contains the x-coordinates, one the y-coordinates, and one the "z", or data values, for a set of sample points. interp uses these to produce interpolated data on a regular grid so I can, e.g., plot the field. Once I have these 3 items set up, I cut them into "chunks" and feed them to clusterApply to execute interp chunk by chunk.


I'm using a Windows 7 machine with an i7 CPU (8 cores). Here's the summary output from Rprof for an input data set with 1e6 points (1000x1000, if you like), mapped onto a 1000x1000 output grid.


So my questions are: 1) It appears that "unserialize" is taking most of the time. What is this operation, and how could it be reduced? 2) In general, since each worker loads the default .Rdata file, is there any speed gained if I first save all input data to .Rdata so that it doesn't need to get passed to the workers? 3) Anything else that I'm simply unaware of that I should have done differently?


Note: the sin, atan2, cos, +, max, min functions take place prior to the clusterApply call I make.


Rgames> summaryRprof('bigprof.txt')
                   self.time self.pct total.time total.pct
"unserialize"         329.04    99.11     329.04     99.11
"socketConnection"      1.74     0.52       1.74      0.52
"serialize"             0.96     0.29       0.96      0.29
"sin"                   0.06     0.02       0.06      0.02
"atan2"                 0.04     0.01       0.06      0.02
"cos"                   0.04     0.01       0.04      0.01
"+"                     0.02     0.01       0.02      0.01
"max"                   0.02     0.01       0.02      0.01
"min"                   0.02     0.01       0.02      0.01
"row"                   0.02     0.01       0.02      0.01
"writeLines"            0.02     0.01       0.02      0.01

                     total.time total.pct self.time self.pct
"mcswirl"                331.98    100.00      0.00     0.00
"clusterApply"           330.00     99.40      0.00     0.00
"staticClusterApply"     330.00     99.40      0.00     0.00
"FUN"                    329.06     99.12      0.00     0.00
"unserialize"            329.04     99.11    329.04    99.11
"lapply"                 329.04     99.11      0.00     0.00
"recvData"               329.04     99.11      0.00     0.00
"recvData.SOCKnode"      329.04     99.11      0.00     0.00
"makeCluster"              1.76      0.53      0.00     0.00
"makePSOCKcluster"         1.76      0.53      0.00     0.00
"newPSOCKnode"             1.76      0.53      0.00     0.00
"socketConnection"         1.74      0.52      1.74     0.52
"serialize"                0.96      0.29      0.96     0.29
"postNode"                 0.96      0.29      0.00     0.00
"sendCall"                 0.96      0.29      0.00     0.00
"sendData"                 0.96      0.29      0.00     0.00
"sendData.SOCKnode"        0.96      0.29      0.00     0.00
"sin"                      0.06      0.02      0.06     0.02
"atan2"                    0.06      0.02      0.04     0.01
"cos"                      0.04      0.01      0.04     0.01
"+"                        0.02      0.01      0.02     0.01
"max"                      0.02      0.01      0.02     0.01
"min"                      0.02      0.01      0.02     0.01
"row"                      0.02      0.01      0.02     0.01
"writeLines"               0.02      0.01      0.02     0.01
"outer"                    0.02      0.01      0.00     0.00
"system"                   0.02      0.01      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 331.98

1 solution



When clusterApply is called, it first sends a task to each of the cluster workers, and then waits for each of them to return the corresponding result. If there are more tasks to do, it repeats that procedure until all of the tasks are complete.


The function that it uses to wait for a result from a particular worker is recvResult, which ultimately calls unserialize to read data from the socket connected to that worker. So if the master process is spending most of its time in unserialize, it is spending most of its time waiting for the cluster workers to return the task results, which is what you would hope to see on the master. If it were spending a lot of time in serialize, that would mean it was spending a lot of time sending the tasks to the workers, which would be a bad sign.


Unfortunately, you can't tell how much time unserialize spends blocking, waiting for the result data to arrive, and how much time it spends actually transferring that data. The results might be easily computed by the workers and huge, or they might take a long time to compute and be tiny: there's no way to tell from the profiling data.
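One way to separate the two cases is to have each worker report its own compute time and the serialized size of its result. The following is only a rough sketch: process_chunk is a hypothetical stand-in for the real per-chunk work (such as a call to akima::interp), and timed_chunk is an illustrative wrapper, not part of the parallel package.

```r
library(parallel)

# Hypothetical stand-in for the real per-chunk work (e.g. akima::interp).
process_chunk <- function(n) {
  matrix(sqrt(seq_len(n * n)), nrow = n)
}

# Illustrative wrapper, run on the worker: returns the compute time and the
# size in bytes of the serialized result rather than the result itself.
timed_chunk <- function(n, f) {
  elapsed <- system.time(res <- f(n))[["elapsed"]]
  list(elapsed = elapsed, bytes = length(serialize(res, NULL)))
}

cl <- makeCluster(2)
stats <- clusterApply(cl, c(200, 400), timed_chunk, f = process_chunk)
stopCluster(cl)

# One row per chunk: seconds spent computing on the worker, bytes sent back.
report <- do.call(rbind, lapply(stats, as.data.frame))
print(report)
```

If the elapsed times are large and the byte counts small, the master is mostly waiting on computation; if the byte counts are large and the times small, the transfer itself is the more likely cost.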


So to make unserialize execute faster, you need to make the workers compute their results faster, or make the results smaller, if that's possible. In addition, it might help to use the makeCluster useXDR=FALSE option. It might improve your performance by not using XDR to encode your data, making both serialize and unserialize faster.
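As a minimal sketch of that option, assuming a two-worker socket cluster on the local machine (the worker count and the toy task are placeholders, not taken from the question):

```r
library(parallel)

# Create a socket cluster that skips XDR (big-endian) encoding when
# serializing data between the master and the workers.
cl <- makeCluster(2, useXDR = FALSE)

# Toy task standing in for the real work.
res <- clusterApply(cl, 1:4, function(i) i^2)

stopCluster(cl)
```

Whether this makes a measurable difference depends on how much data actually crosses the sockets, so it is worth timing both settings on the real workload.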


I don't think it will help to save all input data to .Rdata since you're not spending much time sending data to the workers, as seen by the short time spent in the serialize function. I suspect that would slow you down a little bit.


The only other advice I can think of is to try using parLapply or clusterApplyLB, rather than clusterApply. I recommend using parLapply unless you have a specific reason to use one of the other functions since parLapply is often the most efficient. clusterApplyLB is useful when you have tasks that take a long but variable length of time to execute.
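The three functions share the same basic calling pattern, so switching between them is mostly a one-word change. A small sketch, where chunks and work are hypothetical placeholders for the real chunk list and the per-chunk function:

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical task list; each element stands in for one chunk of input.
chunks <- 1:8
work <- function(i) sum(sqrt(seq_len(i * 1000)))

r1 <- clusterApply(cl, chunks, work)    # one task per message, assigned round-robin
r2 <- parLapply(cl, chunks, work)       # tasks pre-split into one batch per worker
r3 <- clusterApplyLB(cl, chunks, work)  # next task goes to whichever worker is free

stopCluster(cl)
```

All three return a list in the same order as the input, so they can be timed against each other on the real workload without changing the surrounding code.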




