How to force R to use a specified factor level as reference in a regression?

How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression?

It's just using some level by default.

lm(x ~ y + as.factor(b))

where b takes values in {0, 1, 2, 3, 4}. Let's say I want to use 3 as the reference level instead of the 0 that R picks by default.

5 solutions

#1

See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)

m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the factor b in DF by use of the relevel() function:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

The two models are fitted with different reference levels: m1 uses level 1 of b, m2 uses level 3.

> coef(m1)
(Intercept)           x          b2          b3          b4          b5 
  3.2903239   1.4358520   0.6296896   0.3698343   1.0357633   0.4666219 
> coef(m2)
(Intercept)           x          b1          b2          b4          b5 
 3.66015826  1.43585196 -0.36983433  0.25985529  0.66592898  0.09678759
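
As a quick sanity check (a sketch that rebuilds the same simulated data; the names DF2, same_fit, b1_check and int_check are only illustrative), releveling changes the parametrization but not the fit. The fitted values agree, b1 in m2 is the negative of b3 in m1, and the new intercept is the old intercept plus the old b3:

```r
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

m1  <- lm(y ~ x + b, data = DF)                      # reference: level 1
DF2 <- within(DF, b <- relevel(b, ref = "3"))        # copy with level 3 first
m2  <- lm(y ~ x + b, data = DF2)                     # reference: level 3

# Same model, different parametrization: fitted values are identical
same_fit <- isTRUE(all.equal(fitted(m1), fitted(m2)))

# b1 in m2 is the negative of b3 in m1
b1_check <- isTRUE(all.equal(unname(coef(m2)["b1"]),
                             -unname(coef(m1)["b3"])))

# Intercept of m2 equals intercept of m1 plus b3 of m1
int_check <- isTRUE(all.equal(unname(coef(m2)["(Intercept)"]),
                              unname(coef(m1)["(Intercept)"] + coef(m1)["b3"])))

c(same_fit, b1_check, int_check)
```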

#2

Others have mentioned relevel(), which is the best solution if you want to change the base level for all analyses on your data (or are willing to live with changing the data).

If you don't want to change the data (this is a one-time change, and in the future you want the default behavior again), you can combine the C function (note the uppercase), which sets contrasts, with the contr.treatment function, whose base argument chooses which level is the baseline. For example:

lm( Sepal.Width ~ C(Species,contr.treatment(3, base=2)), data=iris )
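
A small sketch of what this buys you, using the built-in iris data (fit_c, fit_r and same_coefs are illustrative names): the factor in the data keeps its original level order, yet the estimates match those from a releveled factor with versicolor as the baseline.

```r
# One-off baseline change via C(); iris itself is not modified
fit_c <- lm(Sepal.Width ~ C(Species, contr.treatment(3, base = 2)), data = iris)

# The level order in the data is untouched
levels(iris$Species)

# Same estimates as releveling to "versicolor" (the 2nd level) inside the formula
fit_r <- lm(Sepal.Width ~ relevel(Species, ref = "versicolor"), data = iris)
same_coefs <- isTRUE(all.equal(unname(coef(fit_c)), unname(coef(fit_r))))
same_coefs
```

The coefficient names differ between the two fits, which is why the comparison drops them with unname().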

#3

The relevel() command is a shortcut for what you are asking. It reorders the factor levels so that the level given as ref comes first. Reordering the factor levels yourself therefore has the same effect but gives you more control. Perhaps you want the levels in the order 3, 4, 0, 1, 2. In that case...

bFactor <- factor(b, levels = c(3,4,0,1,2))

I prefer this method because it's easier for me to see in my code not only what the reference was but the position of the other values as well (rather than having to look at the results for that).
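
To make this concrete, here is a minimal sketch with made-up data (b and y below are hypothetical stand-ins for the asker's variables). The first level listed becomes the reference, and the remaining coefficients appear in the order given:

```r
# Hypothetical data: numeric codes 0..4 and a random response
b <- rep(0:4, times = 4)
set.seed(1)
y <- rnorm(length(b))

# Explicit level order: 3 is listed first, so it becomes the reference
bFactor <- factor(b, levels = c(3, 4, 0, 1, 2))
lev <- levels(bFactor)
lev

# Coefficients follow the remaining levels in the specified order
coef_names <- names(coef(lm(y ~ bFactor)))
coef_names
```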

NOTE: DO NOT make it an ordered factor. A factor with a specified order and an ordered factor are not the same thing. lm() may start to think you want polynomial contrasts if you do that.
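
The difference shows up in the coefficient names. This sketch uses made-up data and assumes the default setting options(contrasts = c("contr.treatment", "contr.poly")); f_plain and f_ord are illustrative names:

```r
# Hypothetical data
b <- rep(0:4, times = 4)
set.seed(1)
y <- rnorm(length(b))

# Same level order, once as a plain factor and once as an ordered factor
f_plain <- factor(b, levels = c(3, 4, 0, 1, 2))
f_ord   <- factor(b, levels = c(3, 4, 0, 1, 2), ordered = TRUE)

# Plain factor: treatment contrasts, one coefficient per non-reference level
names_plain <- names(coef(lm(y ~ f_plain)))
names_plain

# Ordered factor: polynomial contrasts (linear, quadratic, ...) under the defaults
names_ord <- names(coef(lm(y ~ f_ord)))
names_ord
```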

#4

You can also manually tag the column with a contrasts attribute, which seems to be respected by the regression functions:

contrasts(df$factorcol) <- contr.treatment(levels(df$factorcol),
   base=which(levels(df$factorcol) == 'RefLevel'))
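
A runnable sketch of the same idea on a copy of the built-in iris data, with Species standing in for factorcol and "versicolor" for 'RefLevel' (both are substitutions made for illustration):

```r
df  <- iris                      # work on a copy
ref <- "versicolor"              # level to use as the baseline

# Attach treatment contrasts with the chosen baseline to the column
contrasts(df$Species) <- contr.treatment(levels(df$Species),
                                         base = which(levels(df$Species) == ref))

fit <- lm(Sepal.Width ~ Species, data = df)
names(coef(fit))

# The level order is unchanged; only the contrasts attribute was set
levels(df$Species)

# The intercept is now the mean of the baseline group
intercept_is_ref_mean <- isTRUE(all.equal(
  unname(coef(fit)["(Intercept)"]),
  mean(df$Sepal.Width[df$Species == ref])))
intercept_is_ref_mean
```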

#5

I know this is an old question, but I had a similar issue and found that:

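lm(x ~ y + relevel(factor(b), ref = "3"))

does exactly what you asked. Note that relevel() only works on factors, so a numeric b has to be wrapped in factor() first; if b is already a factor, relevel(b, ref = "3") is enough.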
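
A minimal sketch with made-up data (x, y and b are hypothetical stand-ins for the asker's variables). It releveles inside the formula, so no stored data is changed, and it wraps b in factor() because relevel() rejects a plain numeric vector:

```r
# Hypothetical data: b holds numeric codes 0..4, as in the question
b <- rep(0:4, each = 10)
set.seed(42)
y <- rnorm(length(b))
x <- rnorm(length(b))

# relevel() on a numeric vector fails, so convert to a factor first
numeric_fails <- inherits(try(relevel(b, ref = "3"), silent = TRUE), "try-error")
numeric_fails

# Inline releveling: level "3" is the reference for this model only
fit <- lm(x ~ y + relevel(factor(b), ref = "3"))
coef_names <- names(coef(fit))
coef_names
```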
