I am running a generalized linear model with random effects using the glmer()
function from the package lme4
.
我使用包lme4中的glmer()函数运行一个具有随机效应的广义线性模型。
The model code looks like this:
模型代码如下:
mod6 <- glmer((Ndifference+74337) ~ netm1011 + d1011 +
b0001 + (1|region), Family = Gamma(link = "identity"))
Ndifference
is count data of population differences between 200 and 2010 of 50 states (and DC). There is one negative value (Michigan at -74336) so I added a constant to make sure my response was all positive.
Ndifference是50个州(和DC) 200至2010年人口差异的计数数据。有一个负值(密歇根州,-74336),所以我增加了一个常数,以确保我的反应都是正值的。
All the predictors (aside from the random effect of region) are proportions or percentages. Netm1011 (rate of immigration to the states in 2010) and d1011 (rate of deaths per 1000 people) both have several negative values. B0001 contains all positive proportions (birth rate/1000 people).
所有的预测因子(除了区域的随机效应外)都是比例或百分比。Netm1011(2010年美国移民率)和d1011(每千人死亡率)都有几个负值。B0001包含所有的正比例(出生率/1000人)。
When I run the model I keep getting this error:
当我运行这个模型时,我一直得到这个错误:
Error in eval(substitute(expr), envir, enclos) :
(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
Before this error theres a crazy amount of output that looks like this::
在这个错误出现之前,会有大量的输出如下所示:
I've tried different families of distributions as well (Gamma
, inverse.gaussian
). What exactly does this error code mean?
我也尝试过不同的分布族(Gamma, inverse.gaussian)这个错误代码到底是什么意思?
Below is the data I've been using (just the variables involved). Any help would be appreciated!
下面是我一直在使用的数据(仅涉及到的变量)。如有任何帮助,我们将不胜感激!
structure(list(Ndifference = c(333269L, 86445L, 1245536L, 244720L,
3333964L, 725062L, 166537L, 113590L, 33923L, 2791639L, 1484925L,
151993L, 271526L, 404489L, 399906L, 122812L, 167087L, 299231L,
74155L, 50624L, 477090L, 206187L, -74336L, 379644L, 120037L,
392539L, 87578L, 117119L, 685035L, 76962L, 374381L, 243485L,
409910L, 1481681L, 33444L, 178335L, 306232L, 408919L, 428717L,
2533L, 612397L, 60714L, 654646L, 4296180L, 533538L, 16207L, 921089L,
835264L, 46920L, 317348L, 70366L), d1011 = c(0.01009935290407,
0.00482181820219, 0.00740624039708, 0.00988384227183, 0.00640323497813,
0.00628209882119, 0.00812947210436, 0.00872837354861, 0.00764311417232,
0.00915166624244, 0.00737517844004, 0.00718037578961, 0.00746442540795,
0.00794410854005, 0.00889218497298, 0.00923607712106, 0.00850800517833,
0.00983998039872, 0.00904860671746, 0.00978543746728, 0.00752488166029,
0.00814412474047, 0.008998680863, 0.0074466124005, 0.00971662809766,
0.00917030625948, 0.00880861178822, 0.00819753663997, 0.00718370505053,
0.00796602176569, 0.00789025770533, 0.00777472712417, 0.00769648628539,
0.00831202019281, 0.00850432185633, 0.00953304172455, 0.00962020831593,
0.0084093843696, 0.00992588646267, 0.00893168396866, 0.00908595754594,
0.00854178331167, 0.00947807131183, 0.00662702930588, 0.0053663066427,
0.00848516414343, 0.00741560390799, 0.00724357008593, 0.01174960990152,
0.00835051236548, 0.00772546941972), netm1011 = c(0.00109618827436,
0.00189063449694, 0.00284535027555, 0.00218869215636, 0.00200262974559,
0.0065388101588, 0.00074903204593, 0.00531214993154, 0.01546474398708,
0.01046605886554, 0.00346226170766, 0.0039720759906, 0.00110199747387,
-0.00340610876916, -4.63800737643485e-05, 0.00143230827182, -0.0018378102704,
0.00157293366968, 0.00169295518939, 0.00086246831653, 0.00396682929054,
0.00395032406919, -0.00265224491201, 0.00162162050201, -0.0011606066005,
-0.00128783881235, 0.00364476878277, 0.00043559148624, -0.00024040613102,
-0.00066598675772, -8.70119549428016e-05, 0.00073131738351, 0.0004310477698,
0.00519235806746, 0.00995606223948, -0.00192862200551, 0.00257535479622,
0.00452502363079, 0.00132008444764, -0.0033720597776, 0.00464986350895,
0.00318540398886, 0.0036471909126, 0.00699275905022, 0.00104501002309,
7.98829235871906e-05, 0.00428168852619, 0.00637386122264, 0.00108682812851,
-0.00029879124879, -6.91039695800782e-05), b0001 = c(0.01800092688004,
0.02011469070317, 0.02028566151573, 0.0179124206973, 0.01941521590629,
0.01846852368848, 0.01564274610647, 0.01763088342656, 0.01782806190528,
0.01595252071591, 0.02045645453128, 0.01934534979926, 0.01892079941012,
0.01859074925398, 0.01759062061265, 0.01593436604294, 0.01809677718956,
0.01698907749719, 0.01956008653302, 0.01292521854622, 0.01781296155008,
0.01589757045382, 0.01700943508274, 0.01673888527351, 0.01999578814362,
0.01689588730579, 0.01474826635901, 0.01762957227617, 0.01844433337313,
0.01426185254875, 0.01647358935637, 0.01852101980912, 0.01705020482026,
0.01867477359887, 0.01474757340631, 0.01722894148788, 0.01746963005864,
0.01632960496522, 0.01466473971168, 0.01463956672595, 0.01772861915606,
0.01702957873434, 0.01740538934663, 0.02136003322368, 0.02565897334663,
0.01291107725161, 0.01753092898439, 0.01687893043972, 0.01409828681218,
0.01588293753652, 0.01540482711573), region = structure(c(3L,
4L, 4L, 3L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 1L, 1L,
3L, 3L, 2L, 3L, 2L, 1L, 1L, 3L, 1L, 4L, 1L, 4L, 2L, 2L, 4L, 2L,
3L, 1L, 1L, 3L, 4L, 2L, 2L, 3L, 1L, 3L, 3L, 4L, 2L, 3L, 4L, 3L,
1L, 4L), .Label = c("MW", "NE", "SE", "W"), class = "factor")), .Names = c("Ndifference",
"d1011", "netm1011", "b0001", "region"), class = "data.frame", row.names = c(NA,
-51L))
1 个解决方案
#1
1
Do you have a particular reason for using Gamma with an identity link (family=Gamma(link="identity")
)? That's probably the proximal cause of your problem. Using Gamma with a log link (family(Gamma(link="log"))
) seems to work fine. (As is often the case, fitting a log-Normal model, i.e. log-transforming the response variable and using lmer
rather than glmer
, is even faster and more robust.)
您是否有特殊的理由使用带有标识链接的Gamma(family=Gamma(link="identity") ?这可能是你问题的近因。使用带有日志链接的伽玛(家庭)(Gamma(link="log"))似乎运行良好。(通常情况下,拟合对数正态模型(即对响应变量进行对数转换并使用lmer而不是glmer)会更快、更健壮。)
In general, using link functions that do not constrain the response to be in the domain of the chosen family is numerically unstable, because it is easy for the model to wander into infeasible regions. Translating that out of statistical jargon, if you're going to assume the data are Gamma-distributed, then you may have trouble if, on the way to the optimal solution, the program tries out values that lead to negative mean estimates. In contrast to the identity link (or the canonical inverse link for the Gamma family), the log link forces all the predictions to be positive.
一般来说,使用不限制响应在所选族域中的链接函数在数值上是不稳定的,因为模型很容易漫游到不可行的区域。用统计学术语解释一下,如果你假设数据是伽玛分布的,那么你可能会遇到麻烦,如果在最优解的过程中,程序会尝试导致负均值估计的值。与恒等链(或伽玛族的标准反链)相反,对数链迫使所有的预测都是正的。
lme4
is admittedly more fragile than it could be about this, but I don't see any particular reason to assume your responses are Gamma distributed with an identity link ...
lme4显然比这个要脆弱得多,但是我不认为有什么特别的理由假设您的响应是带有标识链接的Gamma……
Based on what I'm seeing from the model predictions and diagnostics, it might be probably be worth dropping the negative value from your data set - forcing it to be (just barely) positive is making it an outlier on the log scale ...
根据我从模型预测和诊断中看到的情况,可能有必要将数据集的负值去掉——强制它为正(几乎为正)使它成为日志级别上的异常值……
library(lme4)
m1 <- lmer(log(Ndifference+74337) ~ netm1011 + d1011 +
b0001 + (1|region), data=dd)
m2 <- glmer(Ndifference+74337 ~ netm1011 + d1011 +
b0001 + (1|region), data=dd,
family=Gamma(link="log"))
Comparing pred and obs (ignoring the outlier):
比较pred和obs(忽略异常值):
pairs(cbind(obs=log(dd$Ndifference+74337),
lognorm=predict(m1),gamma=predict(m2)),gap=0,
xlim=c(10,15),ylim=c(10,15))
#1
1
Do you have a particular reason for using Gamma with an identity link (family=Gamma(link="identity")
)? That's probably the proximal cause of your problem. Using Gamma with a log link (family(Gamma(link="log"))
) seems to work fine. (As is often the case, fitting a log-Normal model, i.e. log-transforming the response variable and using lmer
rather than glmer
, is even faster and more robust.)
您是否有特殊的理由使用带有标识链接的Gamma(family=Gamma(link="identity") ?这可能是你问题的近因。使用带有日志链接的伽玛(家庭)(Gamma(link="log"))似乎运行良好。(通常情况下,拟合对数正态模型(即对响应变量进行对数转换并使用lmer而不是glmer)会更快、更健壮。)
In general, using link functions that do not constrain the response to be in the domain of the chosen family is numerically unstable, because it is easy for the model to wander into infeasible regions. Translating that out of statistical jargon, if you're going to assume the data are Gamma-distributed, then you may have trouble if, on the way to the optimal solution, the program tries out values that lead to negative mean estimates. In contrast to the identity link (or the canonical inverse link for the Gamma family), the log link forces all the predictions to be positive.
一般来说,使用不限制响应在所选族域中的链接函数在数值上是不稳定的,因为模型很容易漫游到不可行的区域。用统计学术语解释一下,如果你假设数据是伽玛分布的,那么你可能会遇到麻烦,如果在最优解的过程中,程序会尝试导致负均值估计的值。与恒等链(或伽玛族的标准反链)相反,对数链迫使所有的预测都是正的。
lme4
is admittedly more fragile than it could be about this, but I don't see any particular reason to assume your responses are Gamma distributed with an identity link ...
lme4显然比这个要脆弱得多,但是我不认为有什么特别的理由假设您的响应是带有标识链接的Gamma……
Based on what I'm seeing from the model predictions and diagnostics, it might be probably be worth dropping the negative value from your data set - forcing it to be (just barely) positive is making it an outlier on the log scale ...
根据我从模型预测和诊断中看到的情况,可能有必要将数据集的负值去掉——强制它为正(几乎为正)使它成为日志级别上的异常值……
library(lme4)
m1 <- lmer(log(Ndifference+74337) ~ netm1011 + d1011 +
b0001 + (1|region), data=dd)
m2 <- glmer(Ndifference+74337 ~ netm1011 + d1011 +
b0001 + (1|region), data=dd,
family=Gamma(link="log"))
Comparing pred and obs (ignoring the outlier):
比较pred和obs(忽略异常值):
pairs(cbind(obs=log(dd$Ndifference+74337),
lognorm=predict(m1),gamma=predict(m2)),gap=0,
xlim=c(10,15),ylim=c(10,15))