Is it possible to limit the number of cores in use at any one time with R's parallel::mcparallel?

Time: 2021-03-06 13:50:54

In R, the mcparallel() function in the parallel package forks off a new task to a worker each time it is called. If my machine has N (physical) cores and I fork off 2N tasks, for example, then each core starts off running two tasks, which is not desirable. I would rather like to be able to start running N tasks on N workers, and then, as each task finishes, submit the next task to the now-available core. Is there an easy way to do this?

My tasks take different amounts of time, so it is not an option to fork off the tasks serially in batches of N. There might be some workarounds, such as checking the number of active tasks and then submitting new tasks when cores become free, but does anyone know of a simple solution?

I have tried setting cl <- makeForkCluster(nnodes=N), which does indeed set N cores going, but these are not then used by mcparallel(). Indeed, there appears to be no way to feed cl into mcparallel(). The latter has an option mc.affinity, but it's unclear how to use this and it doesn't seem to do what I want anyway (and according to the documentation its functionality is machine dependent).

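For reference, a minimal sketch of the makeForkCluster() attempt described above (N and slow_task are placeholders): a cluster object like cl is consumed by the snow-style functions such as parLapply() or clusterApplyLB(), not by mcparallel(), and clusterApplyLB() is the variant that hands each task to whichever worker frees up first.

library(parallel)

N  <- 4                             # number of workers to keep busy
cl <- makeForkCluster(nnodes = N)

slow_task <- function(i) {          # a stand-in task of varying duration
  Sys.sleep(runif(1, 0, 2))
  i^2
}

# 2N tasks, but at most N run at once; a worker picks up the next task
# as soon as it finishes its current one.
res <- clusterApplyLB(cl, seq_len(2 * N), slow_task)

stopCluster(cl)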

2 Solutions

#1

You have at least two possibilities:

  1. As mentioned above, you can use mcparallel()'s "mc.affinity" parameter (the related "mc.cores" argument belongs to the higher-level functions such as mclapply()); a short sketch is given at the end of this answer. On AMD platforms "mc.affinity" is preferred, since pairs of cores share the same clock: an FX-8350 has 8 cores, but core 0 has the same clock as core 1. If you start a task on only 2 cores, it is better to assign it to cores 0 and 1 rather than to 0 and 2, and "mc.affinity" lets you do exactly that. The price is losing load balancing.

    "mc.affinity" is present in recent versions of the package. See changelog to find when introduced.

  2. You can also use the OS's tool for setting the affinity, e.g. "taskset":

    /usr/bin/taskset -c 0-1 /usr/bin/R ...

    This makes your R process run on cores 0 and 1 only.

Keep in mind that Linux numbers its cores starting from 0, whereas package parallel follows R's 1-based indexing, so the first core is core number 1.

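A rough sketch of the mc.affinity route (assuming a Linux build of R where setting the affinity is supported; the expression being evaluated is just a placeholder). Note the 1-based core numbers, per the remark above:

library(parallel)

# Fork a child process pinned to R-numbered cores 1 and 2 (Linux cores 0 and 1).
job <- mcparallel(sum(rnorm(1e7)), mc.affinity = c(1, 2))

result <- mccollect(job)   # wait for the child and collect its result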

#2

I'd suggest taking advantage of the higher-level functions in parallel that already include this functionality, instead of trying to force the low-level functions to do what you want.

In this case, try writing your tasks as different arguments of a single function. Then you can use mclapply() with the mc.preschedule parameter set to FALSE and the mc.cores parameter set to the number of child processes you want running at a time. With mc.preschedule = FALSE, a separate job is forked for each task but no more than mc.cores of them run simultaneously: each time a task finishes and its process exits, a new process is forked for the next available task. (With mc.preschedule = TRUE the tasks are instead split into mc.cores batches up front, which is exactly the static scheduling you want to avoid.)

Even if each task uses a completely different bit of code, you can create a list of functions and pass that to a wrapper function. For example, the following code executes two functions at a time.

library(parallel)

f1 <- function(x) { x^2 }
f2 <- function(x) { 2 * x }
f3 <- function(x) { 3 * x }
f4 <- function(x) { x * 3 }

params  <- list(f1, f2, f3, f4)          # one task per function
wrapper <- function(f, inx) { f(inx) }   # apply each function to the shared input

# At most two tasks run at once; the extra argument inx is passed through to FUN.
output <- mclapply(params, FUN = wrapper, mc.preschedule = FALSE,
                   mc.cores = 2, inx = 5)

If need be, you could make params a list of lists, each containing the function definition together with the parameters to be passed to that function (see the sketch below). I've used this approach frequently with tasks of varying length and it works well.

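A possible extension of the example above along those lines, reusing f1 to f4 from the previous block (a sketch only; the field names fn and args are arbitrary):

library(parallel)

# Each element bundles a function with its own argument list;
# the wrapper unpacks the bundle with do.call().
params2 <- list(
  list(fn = f1, args = list(x = 5)),
  list(fn = f2, args = list(x = 10)),
  list(fn = f3, args = list(x = 2)),
  list(fn = f4, args = list(x = 7))
)
wrapper2 <- function(p) do.call(p$fn, p$args)

output2 <- mclapply(params2, FUN = wrapper2, mc.preschedule = FALSE, mc.cores = 2)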

Of course, it may be that your various tasks are just different calls to the same function, in which case you can use mclapply directly without having to write a wrapper function.

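For that simpler case, a minimal sketch (heavy_fn and the inputs are placeholders):

library(parallel)

heavy_fn <- function(x) {   # a stand-in for a task of variable duration
  Sys.sleep(runif(1))
  x^2
}

# At most 4 child processes at a time; a new one is forked as each task finishes.
out <- mclapply(1:20, heavy_fn, mc.preschedule = FALSE, mc.cores = 4)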
