In R, is there a way to share a variable between different R processes on the same machine?

Time: 2022-07-02 13:51:33

My problem is that I have a large model that is slow to load into memory. To test it on many samples, I run a C program that generates the input features for the model, then run an R script to predict. Loading the model on every run takes too much time.

So I am wondering

1) whether there is some way to keep the model (a variable in R) in memory,

or

2) whether I can run a separate R process as a dedicated server, so that all the R prediction processes on the same machine can access the variable held by that server.
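
Both options amount to one long-running R process that loads the model once and then answers many prediction requests. Below is a minimal base-R sketch of that idea. It assumes (none of this is stated in the post) that the C program emits one comma-separated line of numeric features per sample, that the feature names are known in advance, and that predict(model, newdata = ...) accepts a one-row data frame, as it does for randomForest and lm objects. run_worker is a hypothetical name:

```r
# Minimal sketch (not the poster's setup): a long-running R worker that keeps
# the model in memory and answers one prediction per input line.
# ASSUMPTIONS: each line holds comma-separated numeric features in the order
# given by `feature_names`, and predict(model, newdata = <one-row data.frame>)
# works for the model class.
run_worker <- function(model, feature_names, con_in, con_out) {
  repeat {
    line <- readLines(con_in, n = 1)
    if (length(line) == 0) break            # end of input: stop serving
    if (!nzchar(line)) next                 # ignore blank lines
    values <- as.numeric(strsplit(line, ",", fixed = TRUE)[[1]])
    if (length(values) != length(feature_names)) {
      writeLines("NA", con_out)             # malformed request; keep running
    } else {
      newdata <- as.data.frame(as.list(setNames(values, feature_names)))
      pred <- predict(model, newdata = newdata)
      writeLines(as.character(pred), con_out)  # number or class label
    }
    flush(con_out)                          # send the answer right away
  }
  invisible(NULL)
}

# Hypothetical entry point for a script started once, e.g. `Rscript worker.R`.
# file("stdin") is opened explicitly; an unopened connection would be opened
# and closed by readLines() on every call, which can drop buffered input.
# model <- readRDS("path/to/file.Rds")
# run_worker(model, c("f1", "f2"), file("stdin", open = "r"), stdout())
```

Started once, with the C program's output piped into it, the worker keeps the model in memory between samples and prints one prediction per line. For something closer to option 2, a connection from socketConnection(port = 6011, server = TRUE, blocking = TRUE, open = "r+") (the port number is arbitrary) could be passed as both con_in and con_out so other local processes talk to it over a port; that variant would need its own error handling.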

The model never changes across predictions. It is a randomForest model stored in an .rdata file of about 500 MB, and loading it is slow.

I know that I can use parallel R (snow, doPar, etc.) to run predictions in parallel. However, this is not what I want, since it would require me to change my existing data flow.

Thanks a lot.

1 solution

#1


If you are regenerating the model every time, you can save it as an RData file and share that file across different machines. Loading it from disk into memory may still take some time, but you save the time spent regenerating it.

    # save the fitted model once
    save(myModel, file = "path/to/file.Rda")

    # then, in each later R session (restores myModel under its saved name)
    load(file = "path/to/file.Rda")
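
load() puts the saved objects back under the names they had when saved, so the calling script has to know the model was called myModel. A small helper (the name load_single_object is hypothetical) can instead load the file into a private environment and return the one object it contains; it relies only on load() returning the names of the objects it restored:

```r
# Sketch: load an .Rda/.RData file into its own environment and return the
# single object it holds, instead of letting load() write into the workspace.
load_single_object <- function(path) {
  env <- new.env()
  restored <- load(path, envir = env)   # load() returns the restored names
  if (length(restored) != 1) {
    stop("expected exactly one object in ", path, ", found: ",
         paste(restored, collapse = ", "))
  }
  env[[restored]]
}
```

Usage: model <- load_single_object("path/to/file.Rda") works regardless of what the object was called when it was saved.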

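Edit per @VictorK's suggestion: As Victor points out, since you are saving only a single object, saveRDS may be a better choice: readRDS() returns the object itself, so you can assign it to any name, whereas load() restores it under its original name.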

    # save the single object
    saveRDS(myModel, file = "path/to/file.Rds")

    # later, in each prediction session
    myModel <- readRDS(file = "path/to/file.Rds")
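
Because the bottleneck in the question is load time, it may also be worth trying saveRDS(..., compress = FALSE): the uncompressed file is larger, but reading it skips decompression, which can make loading faster depending on disk speed. The hypothetical helper below saves an object, reads it back, and reports the elapsed read time and file size so the two settings can be compared on your own machine:

```r
# Sketch: save an object with saveRDS(), read it back with readRDS(), and
# report how long the read took and how large the file is.
save_and_time <- function(obj, path, compress = TRUE) {
  saveRDS(obj, file = path, compress = compress)
  elapsed <- system.time(restored <- readRDS(path))[["elapsed"]]
  list(object = restored, seconds = elapsed, bytes = file.size(path))
}
```

For example, compare save_and_time(myModel, "path/to/model_gz.Rds")$seconds with save_and_time(myModel, "path/to/model_raw.Rds", compress = FALSE)$seconds. A read right after writing may be served from the OS file cache, so timing a fresh R session gives a more realistic cold-load figure.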
