Modelling interactions with only a subset of factor levels in R

Time: 2022-09-15 09:31:09

Let's first look at lm. I have a continuous explanatory variable $X$ and a factor $F$ modelling seasonal aspects (8 levels in the example).

Let $\beta$ denote the slope for $X$. I want to model interactions of the slope with the factor. This is a kind of physical model, so I can assume that the interaction is significant for only 2 of the 8 levels. How can this be formulated? I would like to use an ordinary formula, because later I want to put it into a censored regression in the AER package (function tobit).
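
In symbols, the model being asked for might be written as below. This is only a sketch of the intended structure: the level-specific intercepts $\alpha$ and the slope offsets $\gamma_4$, $\gamma_5$ are notation introduced here for illustration and do not appear in the original post, and s4 and s5 are the two levels singled out later in the question.

```latex
$$
y_i = \alpha_{F_i} + \beta x_i
      + \gamma_4 \, x_i \, \mathbf{1}[F_i = \mathrm{s4}]
      + \gamma_5 \, x_i \, \mathbf{1}[F_i = \mathrm{s5}]
      + \varepsilon_i
$$
```

Under this reading, the slope is $\beta$ for six of the levels, $\beta + \gamma_4$ for s4 and $\beta + \gamma_5$ for s5.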

The data is:

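N = 50                                                   # replicates per level
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)    # seasonal factor
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)              # true intercept per level
beta = rep(c(5,5,5,8,4,5,5,5),N)                         # true slope per level (differs for s4, s5)
set.seed(100)
x = rnorm(8*N)+1
epsilon = rnorm(8*N,sd = sqrt(1/5))                      # noise
y = x*beta+fcoeff+epsilon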

A fit with all interactions gives an accurate result:

fit <- lm(y~0+x+x*f)
summary(fit)

Call:
lm(formula = y ~ 0 + x + x * f)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.41018 -0.30296  0.01818  0.32657  1.20677 

Coefficients:
       Estimate Std. Error  t value Pr(>|t|)    
x      5.039064   0.075818   66.463   <2e-16 ***
fs1   -0.945112   0.088072  -10.731   <2e-16 ***
fs2   -2.107483   0.103590  -20.344   <2e-16 ***
fs3   -2.992401   0.088164  -33.941   <2e-16 ***
fs4   -4.054411   0.094878  -42.733   <2e-16 ***
fs5   -2.730448   0.094815  -28.798   <2e-16 ***
fs6   -5.232721   0.102254  -51.174   <2e-16 ***
fs7   -9.969175   0.096307 -103.515   <2e-16 ***
fs8   -4.922782   0.092917  -52.980   <2e-16 ***
x:fs2 -0.006081   0.097748   -0.062    0.950    
x:fs3 -0.050684   0.102124   -0.496    0.620    
x:fs4  2.988702   0.103652   28.834   <2e-16 ***
x:fs5 -1.196775   0.105139  -11.383   <2e-16 ***
x:fs6  0.099112   0.103811    0.955    0.340    
x:fs7 -0.007648   0.110908   -0.069    0.945    
x:fs8 -0.107148   0.094346   -1.136    0.257    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4705 on 384 degrees of freedom
Multiple R-squared:  0.9942,    Adjusted R-squared:  0.994 
F-statistic:  4120 on 16 and 384 DF,  p-value: < 2.2e-16

How can I model the interaction with s4 and s5 only? Can I delete the other interactions from the fit for further predictions?

I tried splitting the factor into two, but then the model becomes singular:

f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
f2 = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
f[f %in% c("s4","s5")] <- "no.inter"
f2[f2 %in% c("s1","s2","s3","s6","s7","s8")] <- "rest"

fit <- lm(y~0+x+x*f2+ f)
summary(fit)

Call:
lm(formula = y ~ 0 + x + x * f2 + f)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.41018 -0.31544  0.00653  0.31615  1.20670 

Coefficients: (1 not defined because of singularities)
       Estimate Std. Error t value Pr(>|t|)    
x       5.01794    0.02756 182.106   <2e-16 ***
f2rest -5.02213    0.07381 -68.045   <2e-16 ***
f2s4   -4.05441    0.09495 -42.702   <2e-16 ***
f2s5   -2.73045    0.09488 -28.777   <2e-16 ***
fs1     4.09310    0.09480  43.177   <2e-16 ***
fs2     2.93401    0.09424  31.132   <2e-16 ***
fs3     2.00475    0.09456  21.201   <2e-16 ***
fs6    -0.07894    0.09419  -0.838    0.402    
fs7    -4.93545    0.09452 -52.213   <2e-16 ***
fs8          NA         NA      NA       NA    
x:f2s4  3.00983    0.07591  39.651   <2e-16 ***
x:f2s5 -1.17565    0.07793 -15.086   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4709 on 389 degrees of freedom
Multiple R-squared:  0.9941,    Adjusted R-squared:  0.994 
F-statistic:  5983 on 11 and 389 DF,  p-value: < 2.2e-16

2 solutions

#1


The easiest way might be to manipulate the model matrix to remove the unwanted columns:

xx <- model.matrix(y ~ 0 + x + x*f)
omit <- grep("[:]fs[^45]", colnames(xx))
xx <- xx[, -omit]
lm(y ~ 0 + xx)

Output:

Call:
lm(formula = y ~ 0 + xx)

Coefficients:
    xxx    xxfs1    xxfs2    xxfs3    xxfs4    xxfs5    xxfs6    xxfs7    xxfs8  xxx:fs4  xxx:fs5  
  5.018   -0.929   -2.088   -3.017   -4.054   -2.730   -5.101   -9.958   -5.022    3.010   -1.176 
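
Since the question asks for an ordinary formula, the same reduced model can probably also be written without touching the model matrix, by interacting x with 0/1 indicator variables for the two levels. The sketch below assumes the data from the question; the data frame `d`, the indicator columns `s4` and `s5`, and the name `fit_sub` are hypothetical names introduced here.

```r
# Recreate the data from the question (same seed as in the question).
N <- 50
f <- rep(paste0("s", 1:8), N)
fcoeff <- rep(c(-1, -2, -3, -4, -3, -5, -10, -5), N)
beta <- rep(c(5, 5, 5, 8, 4, 5, 5, 5), N)
set.seed(100)
x <- rnorm(8 * N) + 1
epsilon <- rnorm(8 * N, sd = sqrt(1 / 5))
y <- x * beta + fcoeff + epsilon

# Hypothetical helper columns: numeric 0/1 indicators for the two
# levels that are allowed to have their own slope.
d <- data.frame(y = y, x = x, f = factor(f),
                s4 = as.numeric(f == "s4"),
                s5 = as.numeric(f == "s5"))

# One intercept per level, a common slope, and slope offsets for s4 and s5 only.
fit_sub <- lm(y ~ 0 + f + x + x:s4 + x:s5, data = d)
coef(fit_sub)
```

This spans the same columns as the reduced model matrix above, so it should reproduce those estimates. Because it is a plain formula plus a data frame, it should in principle carry over to formula-based fitters such as tobit in AER, although that is not checked here.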

#2


The R aspects of this question are off topic, but the statistical aspects are on topic.

If I may summarize: You want to model an interaction between a continuous variable and a categorical one, but only at certain levels of the categorical one.

I don't think you can do this in a linear model, at least, not directly. You could, however, subset the data by level of the categorical variable and then include the interaction only in certain subsets. Another possibility is some form of regression tree, which may wind up with nodes being split into levels of the categorical variable - but I do not know of a method for forcing certain interactions into the tree.
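
On the statistical side, one way to check whether it is reasonable to keep the interaction for only some levels is to compare the restricted fit against the full-interaction fit with an F-test on the nested models. The following is a minimal sketch under the assumption that the data are generated as in the question; `d`, `s4`, `s5`, `fit_sub`, `fit_full` and `cmp` are hypothetical names introduced here.

```r
# Recreate the data from the question (same seed as in the question).
N <- 50
f <- rep(paste0("s", 1:8), N)
fcoeff <- rep(c(-1, -2, -3, -4, -3, -5, -10, -5), N)
beta <- rep(c(5, 5, 5, 8, 4, 5, 5, 5), N)
set.seed(100)
x <- rnorm(8 * N) + 1
y <- x * beta + fcoeff + rnorm(8 * N, sd = sqrt(1 / 5))

d <- data.frame(y = y, x = x, f = factor(f),
                s4 = as.numeric(f == "s4"),
                s5 = as.numeric(f == "s5"))

# Restricted model: slope offsets for s4 and s5 only.
fit_sub <- lm(y ~ 0 + f + x + x:s4 + x:s5, data = d)

# Full model: a separate slope for every level.
fit_full <- lm(y ~ 0 + f + x + x:f, data = d)

# F-test for the five interaction terms dropped in the restricted model.
cmp <- anova(fit_sub, fit_full)
print(cmp)
```

A large p-value in this comparison would suggest that the dropped interactions add little, which is the situation the question assumes on physical grounds.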
