Let's first look at lm
. I have a continuous explanatory $X$ and a factor $F$ modelling seasonal aspects (in the example 8 levels).
我们先来看看lm。我有一个连续的解释$ X $和$ F $建模季节性方面(在例子8级)。
Let $\beta$ denote the slope for $X$ then I want to model interactions of the slope with the factor. It is some kind of physical model thus I an assume that the interaction is significant only for 2 of the 8 levels. How can this be formulated? I would like to use an ordinary formula as later I would like to put it into a censored regression in the AER
package (function tobit
)
让$ \ beta $表示$ X $的斜率然后我想模拟斜率与因子的相互作用。它是某种物理模型,因此我假设交互仅对8个级别中的2个有意义。怎么制定这个?我想使用一个普通的公式,因为我想把它放到AER包中的审查回归(函数tobit)
The data is:
数据是:
N = 50
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
beta = rep(c(5,5,5,8,4,5,5,5),N)
set.seed(100)
x = rnorm(8*N)+1
epsilon = rnorm(8*N,sd = sqrt(1/5))
y = x*beta+fcoeff+epsilon
A fit with all interactions gives an accurate result
适合所有交互可提供准确的结果
fit <- lm(y~0+x+x*f)
summary(fit)
Call:
lm(formula = y ~ 0 + x + x * f)
Residuals:
Min 1Q Median 3Q Max
-1.41018 -0.30296 0.01818 0.32657 1.20677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 5.039064 0.075818 66.463 <2e-16 ***
fs1 -0.945112 0.088072 -10.731 <2e-16 ***
fs2 -2.107483 0.103590 -20.344 <2e-16 ***
fs3 -2.992401 0.088164 -33.941 <2e-16 ***
fs4 -4.054411 0.094878 -42.733 <2e-16 ***
fs5 -2.730448 0.094815 -28.798 <2e-16 ***
fs6 -5.232721 0.102254 -51.174 <2e-16 ***
fs7 -9.969175 0.096307 -103.515 <2e-16 ***
fs8 -4.922782 0.092917 -52.980 <2e-16 ***
x:fs2 -0.006081 0.097748 -0.062 0.950
x:fs3 -0.050684 0.102124 -0.496 0.620
x:fs4 2.988702 0.103652 28.834 <2e-16 ***
x:fs5 -1.196775 0.105139 -11.383 <2e-16 ***
x:fs6 0.099112 0.103811 0.955 0.340
x:fs7 -0.007648 0.110908 -0.069 0.945
x:fs8 -0.107148 0.094346 -1.136 0.257
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4705 on 384 degrees of freedom
Multiple R-squared: 0.9942, Adjusted R-squared: 0.994
F-statistic: 4120 on 16 and 384 DF, p-value: < 2.2e-16
How can I model the interaction with s4
and s5
only? Can I delete the other interactions from the fit for further predictions?
我怎样才能模拟与s4和s5的交互?我可以从适合中删除其他交互以进行进一步预测吗?
I tried to split the factors in 2 but then the model gets singular:
我试图将因子分成2但随后模型变得奇异:
f = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
fcoeff = rep(c(-1,-2,-3,-4,-3,-5,-10,-5),N)
f2 = rep(c("s1","s2","s3","s4","s5","s6","s7","s8"),N)
f[f %in% c("s4","s5")] <- "no.inter"
f2[f2 %in% c("s1","s2","s3","s6","s7","s8")] <- "rest"
fit <- lm(y~0+x+x*f2+ f)
summary(fit)
Call:
lm(formula = y ~ 0 + x + x * f2 + f)
Residuals:
Min 1Q Median 3Q Max
-1.41018 -0.31544 0.00653 0.31615 1.20670
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
x 5.01794 0.02756 182.106 <2e-16 ***
f2rest -5.02213 0.07381 -68.045 <2e-16 ***
f2s4 -4.05441 0.09495 -42.702 <2e-16 ***
f2s5 -2.73045 0.09488 -28.777 <2e-16 ***
fs1 4.09310 0.09480 43.177 <2e-16 ***
fs2 2.93401 0.09424 31.132 <2e-16 ***
fs3 2.00475 0.09456 21.201 <2e-16 ***
fs6 -0.07894 0.09419 -0.838 0.402
fs7 -4.93545 0.09452 -52.213 <2e-16 ***
fs8 NA NA NA NA
x:f2s4 3.00983 0.07591 39.651 <2e-16 ***
x:f2s5 -1.17565 0.07793 -15.086 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4709 on 389 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.994
F-statistic: 5983 on 11 and 389 DF, p-value: < 2.2e-16
2 个解决方案
#1
The easiest way might be to manipulate the model matrix to remove the unwanted columns:
最简单的方法可能是操纵模型矩阵以删除不需要的列:
xx <- model.matrix(y ~ 0 + x + x*f)
omit <- grep("[:]fs[^45]", colnames(xx))
xx <- xx[, -omit]
lm(y ~ 0 + xx)
Output:
Call:
lm(formula = y ~ 0 + xx)
Coefficients:
xxx xxfs1 xxfs2 xxfs3 xxfs4 xxfs5 xxfs6 xxfs7 xxfs8 xxx:fs4 xxx:fs5
5.018 -0.929 -2.088 -3.017 -4.054 -2.730 -5.101 -9.958 -5.022 3.010 -1.176
#2
The R aspects of this question are off topic, but the statistical aspects are on topic.
这个问题的R方面是偏离主题的,但统计方面是主题。
If I may summarize: You want to model an interaction between a continuous variable and a categorical one, but only at certain levels of the categorical one.
如果我可以总结一下:您希望模拟连续变量和分类变量之间的交互,但仅限于分类变量的某些级别。
I don't think you can do this in a linear model, at least, not directly. You could, however, subset the data by level of the categorical variable and then include the interaction only in certain subsets. Another possibility is some form of regression tree, which may wind up with nodes being split into levels of the categorical variable - but I do not know of a method for forcing certain interactions into the tree.
我认为你不能在线性模型中做到这一点,至少不是直接的。但是,您可以按分类变量的级别对数据进行子集化,然后仅在某些子集中包含交互。另一种可能性是某种形式的回归树,它可能会将节点分成分类变量的级别 - 但我不知道强制某些交互进入树的方法。
#1
The easiest way might be to manipulate the model matrix to remove the unwanted columns:
最简单的方法可能是操纵模型矩阵以删除不需要的列:
xx <- model.matrix(y ~ 0 + x + x*f)
omit <- grep("[:]fs[^45]", colnames(xx))
xx <- xx[, -omit]
lm(y ~ 0 + xx)
Output:
Call:
lm(formula = y ~ 0 + xx)
Coefficients:
xxx xxfs1 xxfs2 xxfs3 xxfs4 xxfs5 xxfs6 xxfs7 xxfs8 xxx:fs4 xxx:fs5
5.018 -0.929 -2.088 -3.017 -4.054 -2.730 -5.101 -9.958 -5.022 3.010 -1.176
#2
The R aspects of this question are off topic, but the statistical aspects are on topic.
这个问题的R方面是偏离主题的,但统计方面是主题。
If I may summarize: You want to model an interaction between a continuous variable and a categorical one, but only at certain levels of the categorical one.
如果我可以总结一下:您希望模拟连续变量和分类变量之间的交互,但仅限于分类变量的某些级别。
I don't think you can do this in a linear model, at least, not directly. You could, however, subset the data by level of the categorical variable and then include the interaction only in certain subsets. Another possibility is some form of regression tree, which may wind up with nodes being split into levels of the categorical variable - but I do not know of a method for forcing certain interactions into the tree.
我认为你不能在线性模型中做到这一点,至少不是直接的。但是,您可以按分类变量的级别对数据进行子集化,然后仅在某些子集中包含交互。另一种可能性是某种形式的回归树,它可能会将节点分成分类变量的级别 - 但我不知道强制某些交互进入树的方法。