While getting a handle on glmnet versus glm, I ran into convergence problems for lambda=0 and family="poisson". My understanding is that with lambda=0 (and alpha=1, the default), the answers should be essentially the same.
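For reference, the glmnet help page describes the fitted objective for non-Gaussian families as the negative log-likelihood divided by the number of observations, plus lambda times the elastic-net penalty. Written out for the Poisson case (ignoring observation weights, offsets and penalty.factor, none of which the question uses, and dropping the constant log(y_i!) term):

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Bigl[y_i\,(\beta_0 + x_i^\top\beta) - e^{\beta_0 + x_i^\top\beta}\Bigr] \;+\; \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
$$

Setting λ = 0 removes the penalty term for any α. What remains is a scaled Poisson negative log-likelihood, whose minimizer (when it exists) is the same maximum-likelihood estimate that glm() computes.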
Below is code changed slightly from the Poisson example on the glmnet help page (?glmnet). The only change is nzc = p, so that all variables are in the true model.
```r
library(glmnet)
set.seed(1)  # added so the simulation is reproducible

N=1000; p=50
nzc=p
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)

#With lambda=0 glmnet throws the convergence error shown below
fit=glmnet(x,y,family="poisson",lambda=0)

#It works with default lambda passed in
# but estimates are quite different from glm.
fit=glmnet(x,y,family="poisson") #use default lambdas
fit2=glm(y~x,family="poisson")
plot(coef(fit2)[2:(p+1)],
     coef(fit,s=min(fit$lambda))[2:(p+1)],
     xlab="glm",ylab="glmnet")
abline(0,1)

#works fine with gaussian response and lambda=0 or default lambda
#glm and glmnet identical
mu = f
y=rnorm(N,mu)
fit=glmnet(x,y,family="gaussian",lambda=0)
fit2=glm(y~x)
plot(coef(fit2)[2:(p+1)], coef(fit)[2:(p+1)])
abline(0,1)
```
Here's the error message
```
Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned
2: In getcoef(fit, nvars, nx, vnames) :an empty model has been returned; probably a convergence issue
```
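One factor that may contribute to this warning (an assumption, not something established in this thread): with p = 50 and beta drawn from N(0, 1), each element of the linear predictor f has a standard deviation of roughly sqrt(sum(beta^2)), about 7. That means mu = exp(f) spans many orders of magnitude. The base-R sketch below only measures that spread for one simulated draw, using an arbitrary seed. It does not call glmnet.

```r
# Sketch: how spread out is the Poisson mean in the question's simulation?
set.seed(1)                     # arbitrary seed, for reproducibility only
N <- 1000; p <- 50
x <- matrix(rnorm(N * p), N, p)
beta <- rnorm(p)                # same distribution as in the question
f <- drop(x %*% beta)           # linear predictor as a plain vector
mu <- exp(f)

sd_f <- sd(f)                   # empirical SD of the linear predictor
theory_sd <- sqrt(sum(beta^2))  # SD of each f[i] given beta
orders <- log10(max(mu)) - log10(min(mu))  # orders of magnitude spanned by mu

print(c(sd_f = sd_f, theory_sd = theory_sd, orders = orders))
```

If the means differ by more than ten orders of magnitude, a handful of observations can dominate the likelihood. That is a plausible, though unverified, reason for slow convergence near lambda = 0.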
Update: the problem seems to be with the intercept that glmnet estimates when family="poisson", and not with the setting of lambda per se.
```r
y=rpois(N,exp(f))  # re-create the Poisson response; the Gaussian example above overwrote y
fit=glmnet(x,y,family="poisson")
#intercept should be close to 0
coef(fit)[1,]
#but it is huge
#passing in intercept=FALSE however generates the convergence error again
fit=glmnet(x,y,family="poisson", intercept=FALSE)
```
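A point that may explain part of the "huge" intercept: at the start of glmnet's default path (the largest lambda), all slopes are zero. The intercept there is therefore the null-model estimate, and for an intercept-only Poisson model the maximum-likelihood estimate is log(mean(y)). If mean(y) is dominated by a few very large counts, that intercept is large even though the true intercept is 0. The base-R sketch below checks the log(mean(y)) identity with glm(). Its simulated data are hypothetical and not the question's.

```r
# Sketch: the intercept-only Poisson MLE equals log(mean(y)).
set.seed(2)
N <- 1000
eta <- rnorm(N, sd = 2)            # hypothetical linear predictor, true intercept 0
y <- rpois(N, lambda = exp(eta))

null_fit <- glm(y ~ 1, family = poisson)
b0_glm <- unname(coef(null_fit))   # fitted intercept
b0_formula <- log(mean(y))         # closed-form MLE

print(c(glm = b0_glm, formula = b0_formula))
```

Although the true intercept is 0, the fitted null intercept is clearly positive, because mean(exp(eta)) exceeds 1.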
2 Answers
#1
I think you are confused about lambda and alpha. alpha is the elastic-net mixing parameter: alpha = 1 (the default) gives the lasso, and setting it to 0 gives you ridge regression. Typically it is set to something between 0.1 and 1. lambda, the overall strength of the penalty, is typically not set by hand, and there is a warning on the help page NOT to set it to a single value:
WARNING: use with care. Do not supply a single value for lambda
I don't know why you think a lasso penalty should be the same as an unpenalized Poisson model. The whole point of a penalized model is to be less subject to the biases and constraints of an ordinary regression model.
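In line with that warning, the glmnet help page suggests supplying a decreasing sequence of lambda values instead of a single one, so the solver can use warm starts along the path. The sketch below fits such a path down to a very small lambda and compares the last solution with glm(). It assumes the glmnet package is installed. The data are hypothetical and use much milder effects than the question (p = 10, coefficient sd 0.3, both my own choices) so that glm() itself is well behaved. The 0.01 tolerance is also an assumption.

```r
# Sketch: fit a decreasing lambda path ending near 0 and compare with glm().
library(glmnet)

set.seed(3)
N <- 1000; p <- 10
x <- matrix(rnorm(N * p), N, p)
beta <- rnorm(p, sd = 0.3)            # hypothetical, milder effects than the question
y <- rpois(N, exp(drop(x %*% beta)))

# Decreasing sequence from 1 down to 1e-4, evenly spaced on a log scale
lams <- exp(seq(log(1), log(1e-4), length.out = 100))
fit <- glmnet(x, y, family = "poisson", lambda = lams)

# Coefficients (intercept first) at the smallest lambda actually fitted
b_net <- as.numeric(as.matrix(coef(fit, s = min(fit$lambda))))
b_glm <- unname(coef(glm(y ~ x, family = poisson)))

max_diff <- max(abs(b_net - b_glm))
print(max_diff)
```

With well-behaved data the two fits should agree closely. Any remaining gap comes from the small residual penalty and from glmnet's convergence threshold.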
#2
You get the error because you try to pass lambda = 0 to glmnet. If you want to select the coefficients from glmnet for lambda = 0, you could use:
```r
coef(fit, s=0)
```
This automatically selects the last (smallest) value of lambda on the fitted path: with the default exact = FALSE, coef() does not extrapolate below that path. I guess you've basically done that already, though, with s = min(fit$lambda). If you want to go even smaller than that, you might have to supply a lambda sequence manually, but this is a little tricky (glmnet seems a bit stubborn about its lambdas).
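A quick check of that behaviour: with the default exact = FALSE, asking for s = 0 should return exactly the same coefficients as s = min(fit$lambda). The sketch assumes glmnet is installed and uses small, hypothetical simulated data.

```r
# Sketch: s below the fitted path is clamped to the smallest fitted lambda.
library(glmnet)

set.seed(4)
N <- 500; p <- 5
x <- matrix(rnorm(N * p), N, p)
y <- rpois(N, exp(drop(x %*% rnorm(p, sd = 0.3))))  # hypothetical data

fit <- glmnet(x, y, family = "poisson")  # default lambda path

b_s0  <- as.matrix(coef(fit, s = 0))
b_min <- as.matrix(coef(fit, s = min(fit$lambda)))

print(cbind(s0 = b_s0[, 1], smallest = b_min[, 1]))
```

To get coefficients at a lambda genuinely below the path, glmnet would have to refit, for example with a user-supplied sequence that extends further down.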
Also keep in mind that glmnet's estimates may carry some bias (for example, shrinkage left over at the smallest lambda on the path), so they could differ slightly from glm's results.