Table of Contents
Learning objectives
- Data classification and model selection
- Overview of generalized linear models
- The logistic regression model
- Log-linear models
- Computation for general linear models
1 Data classification and model selection
1.1 Types of variables
Response (dependent) variable
Explanatory variables
1.2 Case study
1.2.1 Fitting a Poisson log-linear model
Fit a model with the Poisson distribution family:
glm(formula, family = poisson(link = log), data, …)
d5.2 = read.table("clipboard", header = T)       # read the data
log = glm(y~x1+x2, family=poisson, data=d5.2)    # fit the log-linear model
summary(log)                                     # model summary and tests
Deviance Residuals:
1 2 3 4 5 6
-10.784 14.444 -8.468 -2.620 4.960 -3.142
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.15687 0.14196 43.371 < 2e-16 ***
x1 0.12915 0.04370 2.955 0.00312 **
x2 -1.12573 0.08262 -13.625 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 662.84 on 5 degrees of freedom
Residual deviance: 437.97 on 3 degrees of freedom
AIC: 481.96
Number of Fisher Scoring iterations: 5
Analysis
Both coefficients are highly significant, showing that income and satisfaction level have an important influence on the product.
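As a hedged follow-up sketch (it reuses the glm object named log fitted above), the expected counts and rate ratios can be read off the model:
fitted(log)      # expected counts on the response scale, same as predict(log, type = "response")
exp(coef(log))   # coefficients are on the log scale; exp() gives rate ratios per unit change in x1 and x2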
2 Generalized linear models
In a GLM the response y need no longer be continuous; its distribution is assumed to belong to the exponential family.
2.1 The generalized linear model function glm()
glm(formula, family = gaussian, data, …)
family specifies the distribution family (the default is gaussian).
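Common choices are gaussian (identity link, the default), binomial (logit), poisson (log), and Gamma (inverse). A minimal sketch of glm() with different families, using simulated data (all variable names and values here are made up purely for illustration):
set.seed(1)
d = data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y_count = rpois(50, lambda = exp(0.5 + 0.3*d$x1))        # count response
d$y_bin   = rbinom(50, size = 1, prob = plogis(0.2*d$x2))  # 0/1 response
fit.pois  = glm(y_count ~ x1 + x2, family = poisson(link = log), data = d)
fit.logit = glm(y_bin ~ x1 + x2, family = binomial, data = d)
summary(fit.pois)    # same output structure as the worked examples below
summary(fit.logit)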
2.2 Notes: the logistic model
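For a binary response y with p = P(y = 1), the logistic model links p to the linear predictor through the logit link: ln(p/(1-p)) = β0 + β1x1 + … + βkxk, equivalently p = 1/(1 + exp(-(β0 + β1x1 + … + βkxk))). In R the logit link and its inverse are the standard functions qlogis() and plogis(); a minimal sketch:
p = 0.8
qlogis(p)           # log-odds: log(p/(1 - p))
plogis(qlogis(p))   # inverse link recovers p = 0.8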
2.3 Example
d5.1 = read.table("clipboard", header = T) # read the data
logit <- glm(y~x1 + x2 + x3, family = binomial, data = d5.1) # logistic regression model
summary(logit)
Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5636 -0.9131 -0.7892 0.9637 1.6000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.597610 0.894831 0.668 0.5042
x1 -1.496084 0.704861 -2.123 0.0338 *
x2 -0.001595 0.016758 -0.095 0.9242
x3 0.315865 0.701093 0.451 0.6523
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.026 on 41 degrees of freedom
AIC: 65.026
Number of Fisher Scoring iterations: 4
# stepwise variable selection for the logistic regression model
logit.step = step(logit)
summary(logit.step)
Start: AIC=65.03
y ~ x1 + x2 + x3
Df Deviance AIC
- x2 1 57.035 63.035
- x3 1 57.232 63.232
<none> 57.026 65.026
- x1 1 61.936 67.936
Step: AIC=63.03
y ~ x1 + x3
Df Deviance AIC
- x3 1 57.241 61.241
<none> 57.035 63.035
- x1 1 61.991 65.991
Step: AIC=61.24
y ~ x1
Df Deviance AIC
<none> 57.241 61.241
- x1 1 62.183 64.183
Call:
glm(formula = y ~ x1, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4490 -0.8782 -0.8782 0.9282 1.5096
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6190 0.4688 1.320 0.1867
x1 -1.3728 0.6353 -2.161 0.0307 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.241 on 43 degrees of freedom
AIC: 61.241
Number of Fisher Scoring iterations: 4
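As a hedged usage sketch (it reuses logit.step and d5.1 from above; the 0.5 cutoff is only an illustration), the retained model can score the observations:
p_hat = predict(logit.step, type = "response")                 # predicted P(y = 1)
table(observed = d5.1$y, predicted = as.integer(p_hat > 0.5))  # confusion table at a 0.5 cutoff
exp(coef(logit.step)["x1"])                                    # odds ratio for x1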
2.4 General linear model: completely randomized design
# test whether the machines differ significantly in output
d5.3 = read.table("clipboard", header = T)
anova(lm(Y~factor(A), data = d5.3))
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 2 0.122233 0.061117 40.534 8.94e-07 ***
Residuals 15 0.022617 0.001508
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
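Since factor(A) is highly significant, a natural next step is to ask which machines differ from which. A minimal sketch using base R's aov() and TukeyHSD() on the same data (column names Y and A as in the call above):
fit.crd = aov(Y ~ factor(A), data = d5.3)   # same one-way ANOVA as anova(lm(...)) above
summary(fit.crd)
TukeyHSD(fit.crd)                           # pairwise comparisons of the machine means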
2.5 Randomized block design model
d5.4 = read.table("clipboard", header = T);d5.4 # read and display the data
anova(lm(Y~factor(A)+factor(B),data = d5.4))
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 3 15759 5253 0.4306 0.7387
factor(B) 2 22385 11192 0.9174 0.4491
Residuals 6 73198 12200
The results show that neither the fuel (factor A) nor the propellant (factor B) has a significant effect on the rocket's range.
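To see why neither factor reaches significance, a profile plot of the cell means can help; a minimal sketch assuming d5.4 has the columns Y, A, and B used in the call above:
with(d5.4, interaction.plot(x.factor = factor(A),      # fuel
                            trace.factor = factor(B),  # propellant
                            response = Y))             # roughly flat, parallel lines match the non-significant F tests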
2.6 Basic assumptions of the multiple linear regression model
Questions:
- What are the basic assumptions of the multiple linear regression model?
- Why is the model required to satisfy these basic assumptions?
- What happens to the regression model when these assumptions are violated?
Answers (in my own words):
1. The basic assumptions are:
- The explanatory variables are deterministic rather than random, and they are mutually uncorrelated (no multicollinearity).
- The random error term has zero mean and constant variance.
- The random errors are not serially correlated.
- The random errors are uncorrelated with the explanatory variables.
- The random errors follow a normal distribution with zero mean and constant variance.
2. Only when these conditions hold do the least-squares estimates and the usual tests behave as intended, so the fitted multiple linear regression model is well founded.
3. If the assumptions are violated, the model has no sound basis: the estimates and the usual t and F tests can be misleading, the fit cannot be trusted, and building the model loses its point. The diagnostic checks sketched below can help detect such violations.
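These assumptions can be checked empirically. Below is a hedged sketch of standard diagnostics; the data are simulated only so the code runs, and vif(), bptest(), and dwtest() come from the car and lmtest packages (assumed installed).
set.seed(1)
d = data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
d$y = 1 + 2*d$x1 - d$x2 + rnorm(60)    # simulated response, illustration only
fit = lm(y ~ x1 + x2 + x3, data = d)
par(mfrow = c(2, 2)); plot(fit)        # residual, Q-Q, scale-location, leverage plots
car::vif(fit)                          # variance inflation factors -> multicollinearity
lmtest::bptest(fit)                    # Breusch-Pagan test -> heteroscedasticity
lmtest::dwtest(fit)                    # Durbin-Watson test -> serial correlation
shapiro.test(resid(fit))               # normality of residuals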