glmnet fails to converge when lambda = 0 and family = "poisson"

While getting a handle on glmnet versus glm, I ran into convergence problems for lambda=0 and family="poisson". My understanding is that with lambda=0 (and alpha=1, the default), the answers should be essentially the same.

Below is code changed slightly from the poisson example on the glmnet help page (?glmnet). The only change is that nzc = p, so that all variables are in the true model.

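library(glmnet)

N=1000; p=50
nzc=p                          #all p variables are in the true model
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta        #linear predictor
mu=exp(f)                      #Poisson mean under the log link
y=rpois(N,mu)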

#With lambda=0 glmnet throws the convergence error shown below
fit=glmnet(x,y,family="poisson",lambda=0)

#It works with default lambda passed in
# but estimates are quite different from glm.
fit=glmnet(x,y,family="poisson") #use default lambdas
fit2=glm(y~x,family="poisson")
plot(coef(fit2)[2:(p+1)], 
     coef(fit,s=min(fit$lambda))[2:(p+1)],
     xlab="glm",ylab="glmnet")
abline(0,1)

#works fine with gaussian response and lambda=0 or default lambda
#glm and glmnet identical
mu = f
y=rnorm(N,mu)
fit=glmnet(x,y,family="gaussian",lambda=0)
fit2=glm(y~x)
plot(coef(fit2)[2:(p+1)], coef(fit)[2:(p+1)])
abline(0,1)

Here's the error message

Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda value not reached after maxit=100000 iterations; solutions for larger lambdas returned 
2: In getcoef(fit, nvars, nx, vnames) :an empty model has been returned; probably a convergence issue

Updated: The problem seems to be with the intercept being estimated by glmnet when family="poisson" and not related to the setting of lambda per se.

fit=glmnet(x,y,family="poisson")
#intercept should be close to 0
coef(fit)[1,]
#but it is huge
#passing in intercept=FALSE however generates the convergence error again
fit=glmnet(x,y,family="poisson", intercept=FALSE)
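
A quick check that may help narrow this down is to look at the scale of the simulated data. This is only a sketch: the seed is an arbitrary assumption, and the data are regenerated the same way as above. With 50 standard-normal coefficients the linear predictor has a standard deviation of roughly sqrt(p), so the Poisson means exp(f) can span many orders of magnitude, which might be part of why the fit struggles.

```r
# Sketch: inspect the scale of the simulated linear predictor and means.
set.seed(1)                       # assumption: arbitrary seed, not in the original
N <- 1000; p <- 50
x <- matrix(rnorm(N * p), N, p)
beta <- rnorm(p)
f <- drop(x %*% beta)             # linear predictor as a plain vector

sd(f)                             # roughly sqrt(sum(beta^2))
range(f)                          # extremes of the linear predictor
range(exp(f))                     # Poisson means on the response scale
```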

2 answers

#1

I think you are confusing lambda and alpha. alpha is the elastic-net mixing parameter: alpha = 1 (the default) gives the lasso penalty and alpha = 0 gives ridge regression. Typically it is set to something between 0.1 and 1. lambda is typically not set by hand, and there is a warning on the help page NOT to set it to a single value:

WARNING: use with care. Do not supply a single value for lambda

I don't know why you think a lasso penalty should be the same as an unpenalized Poisson model. The whole point of a penalized model is to be less subject to the biases and constraints of an ordinary regression model.

#2

You get the error because you try to pass lambda = 0 to glmnet. If you want to select the coefficients from glmnet for lambda = 0, you could use:

coef(fit, s=0)

This automatically selects the last (smallest) value of lambda. I guess you've basically done that already, though, with s = min(fit$lambda). If you want to go even smaller than that, you might have to supply a lambda sequence manually, but this is a little bit tricky (glmnet seems a little bit stubborn about its lambdas).
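
As a rough sketch of the manual route, one could pass a decreasing lambda sequence that ends at 0 and then read off the coefficients at s = 0. The seed, the grid of lambda values, and the scaling of beta by sqrt(p) are all assumptions made for illustration (the scaling keeps the simulated counts moderate); with the unscaled beta from the question the fit may still fail to converge.

```r
library(glmnet)

set.seed(1)                            # assumption: arbitrary seed
N <- 1000; p <- 50
x <- matrix(rnorm(N * p), N, p)
beta <- rnorm(p) / sqrt(p)             # assumption: scaled to keep counts moderate
y <- rpois(N, exp(drop(x %*% beta)))

# Decreasing sequence of penalties ending at 0 (grid chosen for illustration)
lam <- c(10^seq(0, -4, length.out = 50), 0)
fit <- glmnet(x, y, family = "poisson", lambda = lam)

# Coefficients at the unpenalized end of the path, as a plain vector
b_glmnet <- drop(as.matrix(coef(fit, s = 0)))

# Compare slopes with the unpenalized glm fit
fit2 <- glm(y ~ x, family = "poisson")
plot(coef(fit2)[-1], b_glmnet[-1], xlab = "glm", ylab = "glmnet")
abline(0, 1)
```

Supplying a whole path rather than a single value follows the help page's warning quoted in the other answer.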

Also keep in mind that there might be some bias in glmnet, so it could be slightly different from the results of glm.
