[R Statistics] Principal Component Analysis 2: Principal Component Regression

Time: 2021-01-03 17:01:39

Exercise:

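A survey was conducted on the sales volume Y of a consumer product in a certain region. Y is related to the following four variables: x1, residents' disposable income; x2, the average price index of this category of consumer goods; x3, the amount of this product already held by society; x4, the average price index of other consumer goods. The historical data are shown in the table below. Use principal component regression to build a regression equation of the sales volume Y on the four variables x1, x2, x3 and x4.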

Data file data.txt:

	x1	x2	x3	x4	y
1	82.9	92	17.1	94	8.4
2	88.0	93	21.3	96	9.6
3	99.9	96	25.1	97	10.4
4	105.3	94	29.0	97	11.4
5	117.7	100	34.0	100	12.2
6	131.0	101	40.0	101	14.2
7	148.2	105	44.0	104	15.8
8	161.8	112	49.0	109	17.9
9	174.2	112	51.0	111	19.6
10	184.7	112	53.0	111	20.8

Script


conomy <- read.table("data.txt");

#### Fit an ordinary linear regression
lm.sol<-lm(y~x1+x2+x3, data=conomy);
summary(lm.sol);
# Call:
# lm(formula = y ~ x1 + x2 + x3, data = conomy)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.44365 -0.20719  0.04925  0.18879  0.47673 

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept)  0.23574    5.39534   0.044  0.96657   
# x1           0.14167    0.02587   5.477  0.00155 **
# x2          -0.02763    0.07265  -0.380  0.71685   
# x3          -0.04743    0.05903  -0.803  0.45235   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 0.349 on 6 degrees of freedom
# Multiple R-squared:  0.9957,    Adjusted R-squared:  0.9935 
# F-statistic: 462.5 on 3 and 6 DF,  p-value: 1.744e-07


#### Principal component analysis (on the correlation matrix)
conomy.pr<-princomp(~x1+x2+x3, data=conomy, cor=TRUE);
summary(conomy.pr, loadings=TRUE);
# Importance of components:
#                          Comp.1     Comp.2      Comp.3
# Standard deviation     1.720206 0.17628306 0.099081994
# Proportion of Variance 0.986369 0.01035857 0.003272414
# Cumulative Proportion  0.986369 0.99672759 1.000000000

# Loadings:
#    Comp.1 Comp.2 Comp.3
# x1  0.579  0.180  0.795
# x2  0.576 -0.781 -0.243
# x3  0.577  0.598 -0.556

#### Compute the sample principal component scores, then regress y on them
pre<-predict(conomy.pr);
conomy$z1<-pre[,1];
conomy$z2<-pre[,2];
lm.sol<-lm(y~z1+z2, data=conomy);
summary(lm.sol);
# Call:
# lm(formula = y ~ z1 + z2, data = conomy)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.79867 -0.45194  0.06536  0.36712  0.83831 

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  14.0300     0.1897  73.972 2.17e-11 ***
# z1            2.3763     0.1103  21.552 1.17e-07 ***
# z2            0.6977     1.0759   0.648    0.537    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 0.5998 on 7 degrees of freedom
# Multiple R-squared:  0.9852,    Adjusted R-squared:  0.9809 
# F-statistic: 232.5 on 2 and 7 DF,  p-value: 3.975e-07

#### Transform back to get the relationship in the original variables
beta<-coef(lm.sol); A<-loadings(conomy.pr);
x.bar<-conomy.pr$center; x.sd<-conomy.pr$scale;
coef<-(beta[2]*A[,1]+ beta[3]*A[,2])/x.sd;
beta0 <- beta[1]- sum(x.bar * coef);
c(beta0, coef);
# (Intercept)          x1          x2          x3 
# -7.75109994  0.04347167  0.10678004  0.14573976 

### Conclusion (fitted on x1, x2 and x3; x4 is not used in this script): y = -7.75109994 + 0.04347167*x1 + 0.10678004*x2 + 0.14573976*x3
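
The back-transformation in the last step of the script can be written out as a short derivation. The symbols below are notation chosen for this note and are not taken from the textbook: a_{jk} is the loading of x_j on component k (`A` in the script), \bar{x}_j and s_j are the center and scale reported by `princomp` (`x.bar` and `x.sd`), and \hat\beta_0, \hat\beta_1, \hat\beta_2 are the coefficients of the regression of y on z1 and z2 (`beta`).

```latex
% Component scores are linear in the standardized predictors
z_k = \sum_{j=1}^{3} a_{jk}\,\frac{x_j - \bar{x}_j}{s_j}, \qquad k = 1, 2

% Substitute into the regression on the scores and collect terms in x_j
\hat{y} = \hat\beta_0 + \hat\beta_1 z_1 + \hat\beta_2 z_2
        = \Bigl(\hat\beta_0 - \sum_{j=1}^{3} \bar{x}_j\, c_j\Bigr) + \sum_{j=1}^{3} c_j\, x_j,
\qquad
c_j = \frac{\hat\beta_1 a_{j1} + \hat\beta_2 a_{j2}}{s_j}
```

Here c_j corresponds to `coef` in the script and the bracketed term to `beta0`.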

 

The source code and the exercise in this post are both from the textbook 《统计建模与R软件》 (ISBN: 9787302143666, author: 薛毅).
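
The exercise statement asks for all four predictors, so the same steps can be sketched with x4 included. The block below is an illustration only and is not code from the textbook: `pcr_fit` is a hypothetical helper name, `sales` is just the table from data.txt typed inline, and `k = 2` (the number of components kept) is an assumption that should be checked against the cumulative proportion of variance.

```r
# Illustrative sketch, not code from the textbook.
# Data from data.txt, entered inline so the block runs on its own.
sales <- data.frame(
  x1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  x2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  x3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  x4 = c(94, 96, 97, 97, 100, 101, 104, 109, 111, 111),
  y  = c(8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.6, 20.8)
)

# pcr_fit() is a hypothetical helper. It regresses data$y on the first k
# principal components of the standardized predictors, then rewrites the
# fitted equation in terms of the original variables.
pcr_fit <- function(data, vars, k) {
  pr <- princomp(data[, vars], cor = TRUE)            # PCA on the correlation matrix
  scores <- predict(pr)[, seq_len(k), drop = FALSE]   # component scores z1..zk
  fit <- lm(data$y ~ scores)                          # regression on the scores
  beta <- coef(fit)
  A <- unclass(loadings(pr))[, seq_len(k), drop = FALSE]
  slope <- as.vector(A %*% beta[-1]) / pr$scale       # undo the standardization
  names(slope) <- vars
  intercept <- unname(beta[1]) - sum(pr$center * slope)
  list(pca = pr, fit = fit,
       coefficients = c("(Intercept)" = intercept, slope))
}

vars <- c("x1", "x2", "x3", "x4")
res <- pcr_fit(sales, vars, k = 2)   # k = 2 is an assumption, not a recommendation
summary(res$pca)                     # inspect the cumulative proportion before settling on k
print(res$coefficients)

# Sanity check: the equation in the original variables reproduces the
# fitted values of the regression on the component scores.
pred <- res$coefficients[1] +
  as.vector(as.matrix(sales[, vars]) %*% res$coefficients[-1])
stopifnot(isTRUE(all.equal(as.vector(pred), as.vector(fitted(res$fit)))))
```

Calling `pcr_fit(sales, c("x1", "x2", "x3"), k = 2)` follows the same procedure as the script above on the three predictors it uses.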