Exceeding the maximum number of DLLs in R

Date: 2021-02-07 22:19:39

I am using RStan to sample from a large number of Gaussian Processes (GPs), i.e., using the function stan(). For every GP that I fit, another DLL gets loaded, as can be seen by running the R command

getLoadedDLLs()
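
One way to watch the count grow is to compare the number of entries before and after a fit. This is a minimal base-R sketch; the helper name count_dlls is hypothetical, and the comment in the middle stands in for a call to stan():

```r
# Hypothetical helper: number of DLLs currently loaded in this R session
count_dlls <- function() length(getLoadedDLLs())

before <- count_dlls()
# ... fit a model here, e.g. with stan() ...
after <- count_dlls()

# Each newly compiled Stan model is expected to add one entry
cat("DLLs loaded before:", before, "after:", after, "\n")
```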

The problem I'm running into is that, because I need to fit so many unique GPs, I'm exceeding the maximum number of DLLs that can be loaded, at which point I receive the following error:

Error in dyn.load(libLFile) : 
unable to load shared object '/var/folders/8x/n7pqd49j4ybfhrm999z3cwp81814xh/T//RtmpmXCRCy/file80d1219ef10d.so':
maximal number of DLLs reached...

As far as I can tell, this is set in Rdynload.c of the base R code, as follows:

#define MAX_NUM_DLLS 100

So, my question is, what can be done to fix this? Building R from source with a larger MAX_NUM_DLLS isn't an option, as my code will be run by collaborators who wouldn't be comfortable with that process. I've tried the naive approach of just unloading DLLs using dyn.unload() in the hopes that they'd just be reloaded when they're needed again. The unloading works fine, but when I try to use the fit again, R fairly unsurprisingly crashes with an error like:

*** caught segfault ***
address 0x121366da8, cause 'memory not mapped'

I've also tried detaching RStan in the hopes that the DLLs would be automatically unloaded, but they persist even after unloading the package (as expected, given the following in the help for detach: "detaching will not in general unload any dynamically loaded compiled code (DLLs)").

From the question "Can Rcpp package DLLs be unloaded without restarting R?", it seems that library.dynam.unload() might play some role in the solution, but I haven't had any success using it to unload the DLLs, and I suspect that after unloading a DLL I'd run into the same segfault as before.

EDIT: adding a minimal, fully-functional example:

The R code:

library(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
for (i in 1:100)
{
    # each call to stan() recompiles the model and loads another DLL
    fits[[i]] <- stan(file="gp-sim.stan", data=list(x=x,N=N), iter=1, chains=1)
}

This code requires that the following model definition be in the working directory in a file gp-sim.stan (this model is one of the examples included with Stan):

// Sample from Gaussian process
// Fixed covar function: eta_sq=1, rho_sq=1, sigma_sq=0.1

data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  vector[N] mu;
  cov_matrix[N] Sigma;
  for (i in 1:N)
    mu[i] <- 0;
  for (i in 1:N)
    for (j in 1:N)
      Sigma[i,j] <- exp(-pow(x[i] - x[j],2)) + if_else(i==j, 0.1, 0.0);
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu,Sigma);
}

Note: this code takes quite some time to run, as it is creating ~100 Stan models.

2 Answers

#1


7  

I can't speak to the DLL issues, but you shouldn't need to compile the model each time. You can compile the model once and reuse it, which avoids this problem and speeds up your code.

The function stan() is a wrapper around stan_model(), which compiles the model, and the sampling() method, which draws samples from it. Run stan_model() once to compile the model and save the result to an object, then call sampling() on that object to draw samples.

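library(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
# compile once; this is the only step that builds and loads a DLL
mod <- stan_model("gp-sim.stan")
for (i in 1:100)
{
    # reuse the compiled model for every fit
    fits[[i]] <- sampling(mod, data=list(x=x,N=N), iter=1, chains=1)
}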

This is similar to the problem of running parallel chains, discussed in the RStan wiki. Your code could be sped up by replacing the for loop with something that runs the sampling in parallel.
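
As a rough illustration of that last point, the loop could be handed to parallel::mclapply(). This is only a sketch and not necessarily the approach the wiki describes; the helper name run_fits and its arguments are hypothetical, and the commented usage assumes rstan is loaded and mod, x and N are defined as above:

```r
library(parallel)

# Hypothetical helper: call fit_one(i) for i = 1..n, spreading the calls over
# several worker processes. mclapply() relies on forking, which is not
# available on Windows, so fall back to a single core there.
run_fits <- function(fit_one, n, cores = 2L) {
  if (.Platform$OS.type == "windows") cores <- 1L
  mclapply(seq_len(n), fit_one, mc.cores = cores)
}

# Intended use with the compiled model (assumes rstan is loaded):
# fits <- run_fits(function(i) sampling(mod, data = list(x = x, N = N),
#                                       iter = 1, chains = 1), n = 100)
```

Because the model is compiled once before the workers start, no additional compilation should be needed per fit.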

#2


0  

Here is what I use to run several Stan models in a row (Win10, R 3.3.0).

I needed not only to unload the DLL files but also to delete them and other temporary files. Also, the filename in my case was different from the one found in the stan object, as Ben suggested.

# Unload every compiled model DLL found in the session's temporary directory
dso_filenames <- dir(tempdir(), pattern = paste0("\\", .Platform$dynlib.ext, "$"))
for (f in dso_filenames)
  dyn.unload(file.path(tempdir(), f))

# Then delete the temporary files themselves
filenames <- dir(tempdir())
for (f in filenames) {
  path <- file.path(tempdir(), f)
  # some files with long filenames could not be removed, so skip them
  if (file.exists(path) && nchar(f) < 42)
    file.remove(path)
}
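
A possible variation, offered only as a sketch: rather than listing files on disk, ask R which DLLs are actually loaded from the temporary directory and unload just those. The helper name loaded_temp_dlls is hypothetical. As the question notes, any fit object whose DLL has been unloaded may crash R if it is used afterwards.

```r
# Hypothetical helper: paths of loaded DLLs that live under the session temp dir
loaded_temp_dlls <- function() {
  # use [[ ]] because `$` on a DLLInfo object looks up native symbols instead
  paths <- vapply(getLoadedDLLs(), function(d) d[["path"]], character(1))
  tmp <- normalizePath(tempdir(), winslash = "/", mustWork = FALSE)
  norm <- normalizePath(paths, winslash = "/", mustWork = FALSE)
  # return the paths exactly as R recorded them, since dyn.unload() expects those
  unname(paths[startsWith(norm, tmp)])
}

# Unload only what is really loaded; do this once the fits are no longer needed
for (p in loaded_temp_dlls()) dyn.unload(p)
```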
