Error when trying to run lmer() in R

Time: 2021-10-06 17:42:01

So here's my problem. I have a data set in R that I need to run a mixed effects model on. Here's the code:

data <- read.csv("D:/blahblah.csv")
analysis.data <- lmer(intdiff ~ stress_limit * word_position * follows + (1|speaker), data)
summary(analysis.data)

When I try to run the script, it returns the following error:

 Error in mer_finalize(ans) : Downdated X'X is not positive definite, 15.

I have tracked the error down to the "follows" parameter because when I just use stress_limit and word_position, it runs fine. If it helps, the data in "follows" are only 3 strings: n or l, consonant, vowel. I've tried replacing the spaces with _ but with no success. Is there something about the internal workings of the lmer() function that is preventing the use of "follows" in this case? Any help would be great!

For more info: intdiff contains numeric values, stress_limit is strings (Stressed or Unstressed) and word position is also strings (Word Medial or Word Initial).

EDIT: Here is a data sample that reproduces the error:

structure(list(intdiff = c(11.45007951, 12.40144758, 13.47898367, 
6.279497762, 18.19461897, 16.15539707), word_position = structure(c(2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("Word Initial", "Word Medial"
), class = "factor"), follows = structure(c(4L, 4L, 4L, 1L, 2L, 
4L), .Label = c("Consonant", "n or l", "Pause", "Vowel"), class = "factor"), 
stress_limit = structure(c(2L, 1L, 1L, 2L, 2L, 2L), .Label = c("Stressed", 
"Unstressed"), class = "factor"), speaker = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("f11r", "f13r", "f15a", "f16a", 
"m09a", "m10a", "m12r", "m14r"), class = "factor")), .Names = c("intdiff", 
"word_position", "follows", "stress_limit", "speaker"), row.names = c(NA, 
6L), class = "data.frame")

I tried the lme() function as well, but it returned this error:

Error in MEEM(object, conLin, control$niterEM) : 
Singularity in backsolve at level 0, block 1

The code in my original post is the exact code I'm using, minus the library(lme4) call, so I'm not leaving any information out I can think of.

My R version is 2.15.2

1 solution

#1


Hard to tell for sure without a reproducible example (see "How to make a great R reproducible example?").

But, guessing: these sorts of problems are generally due to collinearity in the design matrix. Centering continuous predictors may help, although in the model as posted intdiff is the response and all three predictors are categorical. You can also explore the design matrix directly:

X <- model.matrix( ~ stress_limit * word_position * follows, data)

To check for collinearity between pairs of columns, use cor(X). Unfortunately, I don't have a suggestion off the top of my head for detecting multi-collinearity (i.e. not between pairs, but among combinations of more than two predictors), although you can look into the tools for computing variance inflation factors (e.g. library("sos"); findFn("VIF")).
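For multi-collinearity specifically, one base-R option is to compare the numerical rank of the design matrix with its number of columns. The following is a minimal sketch, assuming the six-row sample from the question, rebuilt here by hand as `d` (a name chosen for illustration; the speaker column is omitted because only the fixed effects matter for this check). Unused factor levels are dropped first so that the empty "Pause" level does not add all-zero columns.

```r
# Sample rows from the question, rebuilt by hand (assumption: same values and levels)
d <- data.frame(
  intdiff = c(11.45007951, 12.40144758, 13.47898367,
              6.279497762, 18.19461897, 16.15539707),
  stress_limit = factor(c("Unstressed", "Stressed", "Stressed",
                          "Unstressed", "Unstressed", "Unstressed")),
  word_position = factor(c("Word Medial", "Word Medial", "Word Medial",
                           "Word Initial", "Word Initial", "Word Initial")),
  follows = factor(c("Vowel", "Vowel", "Vowel", "Consonant", "n or l", "Vowel"),
                   levels = c("Consonant", "n or l", "Pause", "Vowel"))
)

# Drop factor levels that never occur, then build the fixed-effects design matrix
d <- droplevels(d)
X <- model.matrix(~ stress_limit * word_position * follows, data = d)

# Numerical rank from a QR decomposition versus the number of columns
rank_X <- qr(X)$rank
c(rank = rank_X, columns = ncol(X))

# TRUE means some columns are linear combinations of others
rank_X < ncol(X)
```

A rank smaller than the number of columns means the fixed-effects part of the model cannot be fully estimated, whatever the pairwise correlations look like. For the six sample rows this reports a rank of 5 against 12 columns.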

As a cross-check, lme should also be able to handle your model:

library(nlme)
lme(intdiff ~ stress_limit * word_position * follows, 
   random=~1|speaker, data=data)

When I run your test data in the development version of lme4 (available on GitHub), I get "Error in lmer(intdiff ~ stress_limit * word_position * follows + (1 | : rank of X = 5 < ncol(X) = 12". With an input data set this small (6 observations), there is no possible way to fit 12 parameters, so the sample alone does not show exactly where the problem lies in your full data. Do all 12 combinations of your 3 variables actually occur in your data? If some are missing, then you need to follow the advice given in the development version's help:

Unlike some simpler modeling frameworks such as ‘lm’ and ‘glm’ which automatically detect perfectly collinear predictor variables, ‘[gn]lmer’ cannot handle design matrices of less than full rank. For example, in cases of models with interactions that have unobserved combinations of levels, it is up to the user to define a new variable (for example creating ‘ab’ within the data from the results of ‘droplevels(interaction(a,b))’).
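To see which combinations of levels actually occur, a cross-tabulation is one straightforward check. This is a sketch in base R, again assuming the six-row sample from the question rebuilt by hand as `d` (an illustrative name); on the full data set the counts will of course differ.

```r
# Sample rows from the question, rebuilt by hand (assumption: same values and levels)
d <- data.frame(
  stress_limit = factor(c("Unstressed", "Stressed", "Stressed",
                          "Unstressed", "Unstressed", "Unstressed")),
  word_position = factor(c("Word Medial", "Word Medial", "Word Medial",
                           "Word Initial", "Word Initial", "Word Initial")),
  follows = factor(c("Vowel", "Vowel", "Vowel", "Consonant", "n or l", "Vowel"),
                   levels = c("Consonant", "n or l", "Pause", "Vowel"))
)

# Count observations in every cell of the three-way design,
# after dropping factor levels that never occur
cells <- xtabs(~ stress_limit + word_position + follows, data = droplevels(d))

# Flat view of the table; zeros mark unobserved combinations
ftable(cells)

# Number of empty cells out of the total number of cells
n_missing <- sum(cells == 0)
c(missing = n_missing, total = length(cells))
```

Any zero in this table is a combination whose interaction parameter cannot be estimated.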

In particular, you can fit this model as follows:

data <- transform(data,
       allcomb=interaction(stress_limit,word_position,follows,drop=TRUE))
lme(intdiff ~ allcomb, random=~1|speaker, data=data)

This will give you a one-way ANOVA treating the unique combinations of levels that are actually present in the data as the categories. You'll have to figure out for yourself what they mean.
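To help with working out what the combined categories mean, it can be useful to list them next to the original factors. The sketch below uses base R only and assumes the six-row sample from the question, rebuilt by hand as `d`; the names `d`, `key`, and `X1` are illustrative, not part of the original answer.

```r
# Sample rows from the question, rebuilt by hand (assumption: same values and levels)
d <- data.frame(
  stress_limit = factor(c("Unstressed", "Stressed", "Stressed",
                          "Unstressed", "Unstressed", "Unstressed")),
  word_position = factor(c("Word Medial", "Word Medial", "Word Medial",
                           "Word Initial", "Word Initial", "Word Initial")),
  follows = factor(c("Vowel", "Vowel", "Vowel", "Consonant", "n or l", "Vowel"),
                   levels = c("Consonant", "n or l", "Pause", "Vowel"))
)

# One factor whose levels are the combinations that actually occur
d$allcomb <- interaction(d$stress_limit, d$word_position, d$follows, drop = TRUE)

# Lookup table: each combined level alongside the factors it came from
key <- unique(d[, c("allcomb", "stress_limit", "word_position", "follows")])
key

# The one-way design matrix has one column per observed combination,
# so it should be of full rank
X1 <- model.matrix(~ allcomb, data = d)
qr(X1)$rank == ncol(X1)
```

The first level of `allcomb` acts as the reference category under R's default treatment contrasts, so each remaining coefficient is a difference from that combination.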

The alternative is to reduce the number of interactions in the model until you get to a set that doesn't have any missing combinations; if you're lucky, (stress_limit+word_position+follows)^2 (all two-way interactions) will work, but you might have to reduce the model still further (e.g. stress_limit + word_position*follows).

Another way to test this is to use lm() on your proposed models and check that there are no NA values in the estimated coefficients.
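The lm() check can be wrapped in a small helper that reports which coefficients come back as NA. This is a sketch under the same assumption as before (the six-row sample from the question, rebuilt by hand as `d`); `na_terms` is a hypothetical helper name, not a function from any package.

```r
# Sample rows from the question, rebuilt by hand (assumption: same values and levels)
d <- data.frame(
  intdiff = c(11.45007951, 12.40144758, 13.47898367,
              6.279497762, 18.19461897, 16.15539707),
  stress_limit = factor(c("Unstressed", "Stressed", "Stressed",
                          "Unstressed", "Unstressed", "Unstressed")),
  word_position = factor(c("Word Medial", "Word Medial", "Word Medial",
                           "Word Initial", "Word Initial", "Word Initial")),
  follows = factor(c("Vowel", "Vowel", "Vowel", "Consonant", "n or l", "Vowel"),
                   levels = c("Consonant", "n or l", "Pause", "Vowel"))
)

# Hypothetical helper: fit the fixed-effects part with lm() and
# return the names of coefficients that could not be estimated
na_terms <- function(formula, data) {
  fit <- lm(formula, data = data)
  names(coef(fit))[is.na(coef(fit))]
}

# Full three-way interaction model: expect inestimable terms
full <- na_terms(intdiff ~ stress_limit * word_position * follows, d)

# Main effects only: a candidate reduced model
additive <- na_terms(intdiff ~ stress_limit + word_position + follows, d)

length(full)      # number of NA coefficients in the full model
length(additive)  # number of NA coefficients in the additive model
```

A formula for which the helper returns an empty vector has a full-rank fixed-effects design on that data. On the full data set a richer model than the purely additive one may well pass.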

The main thing you will be losing in these ways is convenience/ease of interpretation, because the parameters for the missing combinations couldn't have been estimated from the data anyway ...
