Table of Contents
Learning objectives
- Data classification and model selection
- Overview of generalized linear models
- The logistic regression model
- Log-linear models
- Computation for general linear models
1 Data classification and model selection
1.1 Types of variables
Response (dependent) variable
Explanatory variables
1.2 Case study
1.2.1 Fitting a Poisson log-linear model
Fit a model with the Poisson distribution family:
glm(formula, family = poisson(link = log), data, …)
d5.2 = read.table("clipboard", header = T)       # read the data
log = glm(y~x1+x2, family=poisson, data=d5.2)    # fit the log-linear model
summary(log)                                     # model summary and tests
Deviance Residuals:
1 2 3 4 5 6
-10.784 14.444 -8.468 -2.620 4.960 -3.142
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.15687 0.14196 43.371 < 2e-16 ***
x1 0.12915 0.04370 2.955 0.00312 **
x2 -1.12573 0.08262 -13.625 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 662.84 on 5 degrees of freedom
Residual deviance: 437.97 on 3 degrees of freedom
AIC: 481.96
Number of Fisher Scoring iterations: 5
Analysis
Both coefficients are highly significant, showing that income and satisfaction level have an important influence on the product.
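As a hedged follow-up sketch (it reuses the glm object named log fitted above), the expected counts and rate ratios can be read off the model:
fitted(log)      # expected counts on the response scale, same as predict(log, type = "response")
exp(coef(log))   # coefficients are on the log scale; exp() gives rate ratios per unit change in x1 and x2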
2 Generalized linear models
In a GLM the response y need no longer be continuous; its distribution is assumed to belong to the exponential family.
2.1 The generalized linear model function glm()
glm(formula, family = gaussian, data, …)
family specifies the distribution family (the default is gaussian).
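Common choices are gaussian (identity link, the default), binomial (logit), poisson (log), and Gamma (inverse). A minimal sketch of glm() with different families, using simulated data (all variable names and values here are made up purely for illustration):
set.seed(1)
d = data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y_count = rpois(50, lambda = exp(0.5 + 0.3*d$x1))        # count response
d$y_bin   = rbinom(50, size = 1, prob = plogis(0.2*d$x2))  # 0/1 response
fit.pois  = glm(y_count ~ x1 + x2, family = poisson(link = log), data = d)
fit.logit = glm(y_bin ~ x1 + x2, family = binomial, data = d)
summary(fit.pois)    # same output structure as the worked examples below
summary(fit.logit)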
2.2 Notes: the logistic model
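For a binary response y with p = P(y = 1), the logistic model links p to the linear predictor through the logit link: ln(p/(1-p)) = β0 + β1x1 + … + βkxk, equivalently p = 1/(1 + exp(-(β0 + β1x1 + … + βkxk))). In R the logit link and its inverse are the standard functions qlogis() and plogis(); a minimal sketch:
p = 0.8
qlogis(p)           # log-odds: log(p/(1 - p))
plogis(qlogis(p))   # inverse link recovers p = 0.8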
2.3 Example
d5.1 = read.table("clipboard", header = T) # read the data
logit <- glm(y~x1 + x2 + x3, family = binomial, data = d5.1) # logistic regression model
summary(logit)
Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5636 -0.9131 -0.7892 0.9637 1.6000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.597610 0.894831 0.668 0.5042
x1 -1.496084 0.704861 -2.123 0.0338 *
x2 -0.001595 0.016758 -0.095 0.9242
x3 0.315865 0.701093 0.451 0.6523
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.026 on 41 degrees of freedom
AIC: 65.026
Number of Fisher Scoring iterations: 4
# stepwise variable selection for the logistic regression model
logit.step = step(logit)
summary(logit.step)
Start: AIC=65.03
y ~ x1 + x2 + x3
Df Deviance AIC
- x2 1 57.035 63.035
- x3 1 57.232 63.232
<none> 57.026 65.026
- x1 1 61.936 67.936
Step: AIC=63.03
y ~ x1 + x3
Df Deviance AIC
- x3 1 57.241 61.241
<none> 57.035 63.035
- x1 1 61.991 65.991
Step: AIC=61.24
y ~ x1
Df Deviance AIC
<none> 57.241 61.241
- x1 1 62.183 64.183
Call:
glm(formula = y ~ x1, family = binomial, data = d5.1)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4490 -0.8782 -0.8782 0.9282 1.5096
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6190 0.4688 1.320 0.1867
x1 -1.3728 0.6353 -2.161 0.0307 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 62.183 on 44 degrees of freedom
Residual deviance: 57.241 on 43 degrees of freedom
AIC: 61.241
Number of Fisher Scoring iterations: 4
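As a hedged usage sketch (it reuses logit.step and d5.1 from above; the 0.5 cutoff is only an illustration), the retained model can score the observations:
p_hat = predict(logit.step, type = "response")                 # predicted P(y = 1)
table(observed = d5.1$y, predicted = as.integer(p_hat > 0.5))  # confusion table at a 0.5 cutoff
exp(coef(logit.step)["x1"])                                    # odds ratio for x1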
2.4 General linear model: completely randomized design
# test whether the machines differ significantly in output
d5.3 = read.table("clipboard", header = T)
anova(lm(Y~factor(A), data = d5.3))
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 2 0.122233 0.061117 40.534 8.94e-07 ***
Residuals 15 0.022617 0.001508
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
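Since factor(A) is highly significant, a natural next step is to ask which machines differ from which. A minimal sketch using base R's aov() and TukeyHSD() on the same data (column names Y and A as in the call above):
fit.crd = aov(Y ~ factor(A), data = d5.3)   # same one-way ANOVA as anova(lm(...)) above
summary(fit.crd)
TukeyHSD(fit.crd)                           # pairwise comparisons of the machine means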
2.5 Randomized block design model
d5.4 = read.table("clipboard", header = T);d5.4 # read and display the data
anova(lm(Y~factor(A)+factor(B),data = d5.4))
Analysis of Variance Table
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
factor(A) 3 15759 5253 0.4306 0.7387
factor(B) 2 22385 11192 0.9174 0.4491
Residuals 6 73198 12200
The results show that neither the fuel (factor A) nor the propellant (factor B) has a significant effect on the rocket's range.
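To see why neither factor reaches significance, a profile plot of the cell means can help; a minimal sketch assuming d5.4 has the columns Y, A, and B used in the call above:
with(d5.4, interaction.plot(x.factor = factor(A),      # fuel
                            trace.factor = factor(B),  # propellant
                            response = Y))             # roughly flat, parallel lines match the non-significant F tests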
2.6 Basic assumptions of the multiple linear regression model
Questions:
- What are the basic assumptions of the multiple linear regression model?
- Why is the model required to satisfy these basic assumptions?
- What happens to the regression model when these assumptions are violated?
Answers (in my own words):
1. The basic assumptions are:
- The explanatory variables are deterministic rather than random, and they are mutually uncorrelated (no multicollinearity).
- The random error term has zero mean and constant variance.
- The random errors are not serially correlated.
- The random errors are uncorrelated with the explanatory variables.
- The random errors follow a normal distribution with zero mean and constant variance.
2. Only when these conditions hold do the least-squares estimates and the usual tests behave as intended, so the fitted multiple linear regression model is well founded.
3. If the assumptions are violated, the model has no sound basis: the estimates and the usual t and F tests can be misleading, the fit cannot be trusted, and building the model loses its point. The diagnostic checks sketched below can help detect such violations.
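These assumptions can be checked empirically. Below is a hedged sketch of standard diagnostics; the data are simulated only so the code runs, and vif(), bptest(), and dwtest() come from the car and lmtest packages (assumed installed).
set.seed(1)
d = data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
d$y = 1 + 2*d$x1 - d$x2 + rnorm(60)    # simulated response, illustration only
fit = lm(y ~ x1 + x2 + x3, data = d)
par(mfrow = c(2, 2)); plot(fit)        # residual, Q-Q, scale-location, leverage plots
car::vif(fit)                          # variance inflation factors -> multicollinearity
lmtest::bptest(fit)                    # Breusch-Pagan test -> heteroscedasticity
lmtest::dwtest(fit)                    # Durbin-Watson test -> serial correlation
shapiro.test(resid(fit))               # normality of residuals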