介绍
方差分析的R语言实现包括以下部分:
-
数据导入
-
数据清洗
-
ANOVA计算
-
结果解析
-
ANOVA评估
参考教程Analysis_of_Variance
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
options(stringsAsFactors = F)
options(future.globals.maxSize = 1000 * 1024^2)
step1: 安装R包
install.packages(c("ggplot2", "ggpubr", "tidyverse"))
step2: 载R包
library(tidyverse) # 数据预处理R包
library(readxl) # 读取xlsx数据R包
library(ggpubr) # 画图R包
step3: 导入数据
- 随机生成数据
data <- data.frame(D = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4), rep("E", 4), rep("F", 4)),
RR = c(80,83,83,85,75,75,79,79,74,73,76,77,67,72,74,74,62,62,67,69,60,61,64,66))
- 存储数据
write.table(data, file = "data.txt", sep = "\t", quote = F, row.names = F)
xlsx::write.xlsx(data, file = "data.xlsx", row.names = F)
- txt数据格式
data <- read.table("data.txt", header = T)
- xlsx数据格式
data <- read_xlsx("data.xlsx", sheet = 1)
step4: 数据清洗
- 筛选数据:丢弃A组数据
data_drop <- data %>%
dplyr::filter(D != "A")#%>%
#dplyr::mutate(Test = "test")
head(data_drop)
- 数据平均值和其他指标
data %>%
group_by(D) %>%
summarise(N=n(),
Means=mean(RR),
SS=sum((RR - Means)^2),
SD=sd(RR),
SEM=SD/N^.5)
- 展示数据: boxplot
ggboxplot(data_drop,
x = "D",
y = "RR",
color = "D",
ylab = "RR", xlab = "D")
step5: 单因素方差分析
-
one-way ANOVAs: 使用aov函数运行单因素方差分析 (公式是:Y是检验变量,X是分组变量);
-
再使用summary函数获取单因素方差分析的结果。
# Y=RR; X=D
one.way <- aov(RR ~ D, data = data_drop)
summary(one.way)
结果解析:
-
Residuals
是模型的残差,可以理解为截距; -
Df
列显示了自变量的*度(变量中的水平数减1)和残差的*度(观察总数减1和自变量中的水平数减1); -
Sum Sq
列显示平方和(即组均值与总体均值之间的总变化)。; -
Mean Sq
列是平方和的平均值,通过将平方和除以每个参数的*度来计算; -
F value
列是F检验的检验统计量。这是每个自变量的均方除以残差的均方。F值越大,自变量引起的变化越有可能是真实的,而不是偶然的; -
Pr(>F)
列是F统计量的p值。这表明,如果组均值之间没有差异的原假设成立,那么从检验中计算出的F值发生的概率大小。 -
另一种方法:t-test仅仅适合2组比较,因此需要筛选
data_ttest <- data_drop %>%
dplyr::filter(D %in% c("B", "C")) #%>%
#dplyr::filter(RR != 77)
# data_test_filter <- filter(data_drop, D %in% c("B"))
# data_test_filter2 <- filter(data_test_filter, RR != 77)
t.test(RR ~ D, data = data_ttest)
step6: 后置检验
-
ANOVA结果仅仅揭示多个组间的差异结果,具体到哪两个组内部差异还需要做后置检验
-
后置检验通常采用
TukeyHD
函数
TukeyHSD(one.way)
-
该结果给出每个两组之间的结果;
-
diff
: 两组的均值之差; -
Lwr, upr
: 95%置信区间的下限和上限(默认值) ; -
P adj
: 多次比较调整后的P值。
step7: 检查残差分布是否符合正态分布
- ANOVA比较的是均值,需要每个分组的残差服从正态部分
plot(one.way, 2)
- 采用Shapiro-Wilk对残差进行检验
shapiro.test(x = residuals(object = one.way))
结果显示:残差不显著也即是表明残差服从正态分布,可以采用ANOVA分析方法判断RR在D分组的分布水平。
step8: 方差齐性检验
library(car)
leveneTest(RR ~ D, data = data_drop, center = mean)
bartlett.test(RR ~ D, data = data_drop)
系统信息
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Monterey 12.2.1
system x86_64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Asia/Shanghai
date 2024-03-12
rstudio 2023.09.0+463 Desert Sunflower (desktop)
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
broom 1.0.5 2023-06-09 [1] CRAN (R 4.3.0)
bslib 0.6.1 2023-11-28 [1] CRAN (R 4.3.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
car 3.1-2 2023-03-30 [1] CRAN (R 4.3.0)
carData 3.0-5 2022-01-06 [1] CRAN (R 4.3.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.3.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.3.0)
digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.0)
ggpubr * 0.6.0 2023-02-10 [1] CRAN (R 4.3.0)
ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.3.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.0)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.0)
httpuv 1.6.14 2024-01-26 [1] CRAN (R 4.3.2)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.0)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.0)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgbuild 1.4.3 2023-12-10 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.3.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.0)
readxl * 1.4.3 2023-07-06 [1] CRAN (R 4.3.0)
remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.3.0)
rJava 1.0-6 2021-12-10 [1] CRAN (R 4.3.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.0)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.0)
rstatix 0.7.2 2023-02-01 [1] CRAN (R 4.3.0)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
sass 0.4.8 2023-12-06 [1] CRAN (R 4.3.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
shiny 1.8.0 2023-11-17 [1] CRAN (R 4.3.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
usethis 2.2.2 2023-07-06 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.0)
xfun 0.41 2023-11-01 [1] CRAN (R 4.3.0)
xlsx 0.6.5 2020-11-10 [1] CRAN (R 4.3.0)
xlsxjars 0.6.1 2014-08-22 [1] CRAN (R 4.3.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.0)
[1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
参考材料
-
Analysis_of_Variance
-
R语言统计分析 04 多组间差异的单因素方差分析(ANOVA)