I have run an ANOVA and a TukeyHSD test on a data frame containing anatomical regions in column 1 (`region`) and gene expression values in column 2 (`S1`). I would normally expect the p-value from the `aov` summary to be expressed as `Pr(>F)`, so I'm a little fuzzy on the result I've obtained (`< 2.2e-16`). Also, can someone help me understand the Tukey multiple comparisons of means results? I'm not totally clear on what the `diff` and `p adj` columns indicate. FYI, the results shown here are an abridged version of what I'm actually working with.
```
> aov.result = aov(S1 ~ region, data=raw.data)
> summary(aov.result)
            Df  Sum Sq Mean Sq F value    Pr(>F)
region      60  61.713 1.02856  5.9246 < 2.2e-16 ***
Residuals  655 113.712 0.17361
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(aov.result)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = S1 ~ region, data = raw.data)

$region
                 diff           lwr         upr     p adj
AB-AA    0.4118651583 -2.864195e-01 1.110149848 0.9847745
AHA-AA  -0.0468785098 -7.608569e-01 0.667099930 1.0000000
APir-AA  0.4419135565 -2.563711e-01 1.140198246 0.9502924
B-AA     0.5379787168 -1.603060e-01 1.236263406 0.5846356
```
Let's start with some reproducible data, one factor and one continuous variable:
```r
set.seed(1)
df1 <- data.frame(
  f1 = as.factor(rep(1:3, 4)),  # factor with levels 1, 2, 3; four observations each
  c1 = abs(rnorm(12)))          # continuous response
s1 <- stats::aov(df1$c1 ~ df1$f1)
summary(s1)
```
This gives output similar to yours.
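The p-value for your data appears correct. `< 2.2e-16` means it is smaller than machine epsilon (`.Machine$double.eps`, about 2.2e-16), the cut-off below which `summary()` prints a bound instead of the value itself. You can confirm this with, e.g.: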
```r
1 - stats::pf(q = 5.92, df1 = 60, df2 = 655)
## [1] 0
```
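`1 - pf(...)` returns exactly 0 only because the lower-tail probability rounds to 1 in double precision, so the tiny upper tail is lost in the subtraction. A minimal sketch that asks `pf()` for the upper tail directly (`lower.tail = FALSE`), using the F value and degrees of freedom from your summary:

```r
# Upper-tail F probability computed directly, avoiding 1 - pf() cancellation
p <- stats::pf(q = 5.9246, df1 = 60, df2 = 655, lower.tail = FALSE)

p                          # a tiny but non-zero probability
p < .Machine$double.eps    # TRUE: below the ~2.2e-16 printing cut-off

# format.pval() turns such a value into a "<"-style string
format.pval(p, eps = .Machine$double.eps)
```

The exact number of digits in the `format.pval()` string depends on its `digits` argument, so it may not match the summary table character for character.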
Now look at the output from `TukeyHSD()` (calling the generic dispatches to its `aov` method):

```r
s2 <- stats::TukeyHSD(s1)
s2
```

The relevant part is:
```
$`df1$f1`
           diff       lwr       upr     p adj
2-1 -0.06282377 -1.038236 0.9125887 0.9823655
3-1 -0.09820762 -1.073620 0.8772048 0.9575774
3-2 -0.03538385 -1.010796 0.9400286 0.9943641
```
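The first column, `diff`, is the difference between the two group means, taken as the first-named level minus the second-named one (so `2-1` is the mean of group 2 minus the mean of group 1). `lwr` and `upr` are the lower and upper ends of the 95% family-wise confidence interval for that difference, and `p adj` is its p-value adjusted for multiple comparisons. A difference is significant at the 5% family-wise level exactly when its interval excludes 0, i.e. when `p adj` < 0.05. In my example: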
```r
m1 <- mean(df1$c1[df1$f1 == 1])
m2 <- mean(df1$c1[df1$f1 == 2])
```

Now `m2 - m1` equals (up to floating-point rounding) `s2$"df1$f1"[1, 1]`, here -0.0628.
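The same check can be made for all three rows at once. A minimal sketch that rebuilds `df1`, `s1` and `s2` from above so it runs on its own; `group_means` and `manual_diff` are illustrative names, not part of any package:

```r
set.seed(1)
df1 <- data.frame(
  f1 = as.factor(rep(1:3, 4)),
  c1 = abs(rnorm(12)))
s1 <- stats::aov(df1$c1 ~ df1$f1)
s2 <- stats::TukeyHSD(s1)

# Mean of c1 within each level of f1, named "1", "2", "3"
group_means <- vapply(split(df1$c1, df1$f1), mean, numeric(1))

# Each TukeyHSD row "a-b" is mean(a) - mean(b)
manual_diff <- c(
  "2-1" = group_means[["2"]] - group_means[["1"]],
  "3-1" = group_means[["3"]] - group_means[["1"]],
  "3-2" = group_means[["3"]] - group_means[["2"]])

# Side by side: hand-computed differences vs. the diff column
cbind(manual = manual_diff, TukeyHSD = s2[["df1$f1"]][, "diff"])
```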
Both the confidence interval for each difference of means (`lwr`, `upr`) and the adjusted p-value are calculated from the studentized range (q) distribution. The mechanics can be found in the source code of the `aov` method (type `stats:::TukeyHSD.aov` at the console to print it); see also `?ptukey`. Note also that the rationale for 'correction for multiple comparisons' is controversial in certain contexts. This sort of question might be better suited to Cross Validated.
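To make those mechanics concrete, the `2-1` row can be rebuilt by hand. This is a minimal sketch that mirrors, as I read it, the one-way calculation in `stats:::TukeyHSD.aov`: the mean squared error comes from the residuals, the interval half-width is `qtukey(0.95, ...)` times the standard error of the difference, and `p adj` is the upper tail of `ptukey()` at |diff| / SE. The variable names (`group_means`, `mse`, `se`, `manual`, etc.) are my own:

```r
set.seed(1)
df1 <- data.frame(
  f1 = as.factor(rep(1:3, 4)),
  c1 = abs(rnorm(12)))
s1 <- stats::aov(df1$c1 ~ df1$f1)
s2 <- stats::TukeyHSD(s1)

# Group means and sizes, named by factor level ("1", "2", "3")
group_means <- vapply(split(df1$c1, df1$f1), mean, numeric(1))
group_n     <- vapply(split(df1$c1, df1$f1), length, integer(1))
k           <- length(group_means)  # number of groups compared
df_resid    <- s1$df.residual       # residual degrees of freedom

# Mean squared error of the one-way model
mse <- sum(stats::residuals(s1)^2) / df_resid

# Comparison "2-1": difference of means and its standard error
d  <- group_means[["2"]] - group_means[["1"]]
se <- sqrt((mse / 2) * (1 / group_n[["2"]] + 1 / group_n[["1"]]))

# 95% family-wise interval from the studentized range quantile
half_width <- stats::qtukey(0.95, nmeans = k, df = df_resid) * se

# Adjusted p-value: upper tail of the studentized range distribution
p_adj <- stats::ptukey(abs(d) / se, nmeans = k, df = df_resid,
                       lower.tail = FALSE)

manual <- c(diff = d, lwr = d - half_width, upr = d + half_width,
            p.adj = p_adj)
manual
s2[["df1$f1"]]["2-1", ]
```

The two printed vectors should agree. The `2-1` interval (about -1.04 to 0.91) contains 0, which is why its `p adj` is well above 0.05.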