How to speed up GLM estimation in R?

Time: 2021-03-24 13:49:59

I am using RStudio 0.97.320 (R 2.15.3) on Amazon EC2. My df has 200k rows and 12 columns.

I am trying to fit a logistic regression with ~1500 parameters.

RStudio is using 7% CPU and has 60+ GB of memory, and the fit is still taking a very long time.

Here is the code:

glm.1.2<-glm(formula = Y ~ factor(X1) * log(X2) * (X3 + X4 * (X5 + I(X5^2)) * (X8 + I(X8^2)) + ((X6 + I(X6^2)) * factor(X7))), family = binomial(logit), data = df[1:150000,])

Any suggestions to speed this up by a significant amount?

2 solutions

#1


7  

You could try the speedglm function from the speedglm package. I haven't used it on problems as large as you describe, but especially if you install a BLAS library (as @Ben Bolker suggested in the comments) it should be easy to use and give you a nice speed bump.

I remember seeing a table benchmarking glm and speedglm with and without a BLAS library, but I can't seem to find it today. I remember that it convinced me that I wanted both BLAS and speedglm.
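
For what it's worth, here is a minimal sketch of what swapping glm for speedglm might look like. The data are simulated and the names (sim, x1, x2, g, fit_glm, fit_speed) are made up for illustration; they are not the asker's df or formula. It assumes the speedglm package is installed.

```r
# Sketch only: simulated data, hypothetical variable names.
library(speedglm)

set.seed(1)
n <- 5000
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  g  = factor(sample(letters[1:5], n, replace = TRUE))
)
# Simulate a binary response from a known logistic model
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * sim$x1 - 0.4 * sim$x2))

# Same formula and family, two different fitting functions
fit_glm   <- glm(y ~ x1 * x2 + g, family = binomial(), data = sim)
fit_speed <- speedglm(y ~ x1 * x2 + g, family = binomial(), data = sim)

# Largest absolute difference between the two coefficient vectors
max(abs(as.numeric(coef(fit_glm)) - as.numeric(coef(fit_speed))))
```

On a toy problem like this the timing difference will be negligible; the point is only that the call is close to a drop-in replacement. On recent R versions, sessionInfo() reports which BLAS/LAPACK libraries R is linked against, which is a quick way to check whether an optimized BLAS is actually in use.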

#2


4  

Although this is a bit late, I can only second dickoa's suggestion to generate a sparse model matrix using the Matrix package and then feed it to the speedglm.wfit function. That works great ;-) This way, I was able to run a logistic regression on a 1e6 x 3500 model matrix in less than 3 minutes.
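
A rough sketch of that approach, under stated assumptions: the data below are simulated, and the names (sim, f1, f2, x, X, fit_sparse) are hypothetical rather than taken from the question. It assumes the Matrix and speedglm packages are installed, and passing sparse = TRUE is my reading of how speedglm.wfit is meant to be used with a sparse design matrix.

```r
# Sketch only: simulated data, hypothetical variable names.
library(Matrix)
library(speedglm)

set.seed(1)
n <- 20000
sim <- data.frame(
  f1 = factor(sample(sprintf("a%02d", 1:50), n, replace = TRUE)),
  f2 = factor(sample(sprintf("b%02d", 1:20), n, replace = TRUE)),
  x  = rnorm(n)
)
sim$y <- rbinom(n, size = 1, prob = plogis(-0.3 + 0.5 * sim$x))

# Dummy-coded factors are mostly zeros, so build the design matrix in sparse form
X <- sparse.model.matrix(~ f1 + f2 + x, data = sim)

# Fit directly from the response vector and the sparse design matrix.
# X already contains the intercept column created by the formula.
fit_sparse <- speedglm.wfit(
  y      = as.numeric(sim$y),
  X      = X,
  family = binomial(),
  sparse = TRUE
)

head(fit_sparse$coefficients)
```

The sparse route pays off mainly when the formula expands into many factor levels and interactions, as in the question, because most entries of the design matrix are then zero.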
