Week 14 Homework

Time: 2021-02-02 21:58:56

The exercises are taken from https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb

Part 1

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
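The three tasks are connected. For simple least squares, the fitted coefficients can be written in terms of the means $\bar x, \bar y$, the sample standard deviations $s_x, s_y$ and the correlation coefficient $r$:

$$\hat\beta_1 = r\,\frac{s_y}{s_x}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

Plugging in the dataset I values reported in the results ($\bar x = 9$, $\bar y = 7.500909$, $s_x^2 = 11$, $s_y^2 = 4.127269$, $r = 0.816421$) gives the same coefficients as the first statsmodels table:

$$\hat\beta_1 = 0.816421\sqrt{4.127269/11} \approx 0.50009, \qquad \hat\beta_0 \approx 7.500909 - 9 \times 0.50009 \approx 3.0001$$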

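    The pandas library provides mean() for the mean, var() for the variance and corr() for the correlation coefficient. For the linear regression I used OLS from the statsmodels library and then printed summary() to get the fitted coefficients and related statistics. (This approach follows https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/3.%20Statsmodels.ipynb)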
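The following sketch shows what these library calls compute. It is a plain-Python version using only the standard library, with no pandas or statsmodels. It uses the dataset I values of Anscombe's quartet, typed in by hand rather than read from the CSV. The helper names (mean, sample_var, sample_cov, pearson_r, ols_line) are hypothetical and are not part of any library. The sample variance uses ddof=1, which matches the pandas default for var(). The correlation is Pearson's, which is the default method of corr(). If the typed-in values are right, the printed numbers should agree with the dataset I rows of the results.

```python
# Anscombe's quartet, dataset I (values typed in by hand, not read from the CSV)
x_vals = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_vals = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def mean(values):
    return sum(values) / len(values)


def sample_var(values):
    # Divide by n - 1 (ddof=1), like pandas' var() by default
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sample_cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    # Pearson correlation, the default method of pandas' corr()
    return sample_cov(xs, ys) / (sample_var(xs) * sample_var(ys)) ** 0.5


def ols_line(xs, ys):
    # Closed-form least squares for y = b0 + b1 * x
    b1 = sample_cov(xs, ys) / sample_var(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


b0, b1 = ols_line(x_vals, y_vals)
print(f"mean x = {mean(x_vals):.6f}, mean y = {mean(y_vals):.6f}")
print(f"var x = {sample_var(x_vals):.6f}, var y = {sample_var(y_vals):.6f}")
print(f"r = {pearson_r(x_vals, y_vals):.6f}")
print(f"y = {b0:.4f} + {b1:.4f} x")
```

statsmodels' OLS solves the same least-squares problem through a matrix formulation, so that it also handles several predictors. With one predictor it reduces to the closed form above.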

The code is as follows:
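import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

anascombe = pd.read_csv('https://raw.githubusercontent.com/schmit/cme193-ipython-notebooks-lecture/master/data/anscombe.csv')

# Mean of x and y within each dataset
print("Mean:")
print(anascombe.groupby(['dataset']).mean())

# Sample variance (ddof=1, the pandas default) within each dataset
print("\nVariance:")
print(anascombe.groupby(['dataset']).var())

# Pearson correlation matrix of x and y within each dataset
print("\nCorrelation coefficients:")
print(anascombe.groupby(['dataset']).corr())

# Fit y = b0 + b1*x separately for each dataset (rows selected by label, not by position)
print("\nLinear regression:")
for name, group in anascombe.groupby('dataset'):
    X = sm.add_constant(np.array(group.x))  # prepend a column of ones for the intercept
    Y = np.array(group.y)
    res = sm.OLS(Y, X).fit()
    print(res.summary())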
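The code relies on groupby(['dataset']). This splits the rows by their dataset label and then applies mean(), var() or corr() to each group separately. The sketch below is a minimal plain-Python version of that split-apply step. group_means is a hypothetical helper, not a pandas function. The four rows are the first two rows of datasets I and II of Anscombe's quartet.

```python
from collections import defaultdict

# (dataset, x, y) rows: the first two rows of datasets I and II
rows = [("I", 10, 8.04), ("I", 8, 6.95), ("II", 10, 9.14), ("II", 8, 8.14)]


def group_means(rows):
    """Split rows by label, then average x and y within each group."""
    groups = defaultdict(list)
    for label, x, y in rows:
        groups[label].append((x, y))
    return {
        label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for label, pts in groups.items()
    }


print(group_means(rows))
```

Rows are matched by label, so the result does not depend on where each dataset sits in the file. Selecting fixed row positions does not have this property.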
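Results
(Note: the regression tables below were generated by taking rows anascombe[i:i+11] for i = 0 to 3. These windows overlap, so only the first table (rows 0 to 10) is exactly dataset I. The other three tables mix rows from adjacent datasets. When each dataset is fitted separately, all four give approximately y = 3.00 + 0.500x with R² ≈ 0.67, which is the point of Anscombe's quartet.)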
 
Mean:
           x         y
dataset               
I        9.0  7.500909
II       9.0  7.500909
III      9.0  7.500000
IV       9.0  7.500909

Variance:
            x         y
dataset                
I        11.0  4.127269
II       11.0  4.127629
III      11.0  4.122620
IV       11.0  4.123249

Correlation coefficients:
                  x         y
dataset                      
I       x  1.000000  0.816421
        y  0.816421  1.000000
II      x  1.000000  0.816237
        y  0.816237  1.000000
III     x  1.000000  0.816287
        y  0.816287  1.000000
IV      x  1.000000  0.816521
        y  0.816521  1.000000

Linear regression:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.99
Date:                Sun, 10 Jun 2018   Prob (F-statistic):            0.00217
Time:                        22:36:49   Log-Likelihood:                -16.841
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.48
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.0001      1.125      2.667      0.026       0.456       5.544
x1             0.5001      0.118      4.241      0.002       0.233       0.767
==============================================================================
Omnibus:                        0.082   Durbin-Watson:                   3.212
Prob(Omnibus):                  0.960   Jarque-Bera (JB):                0.289
Skew:                          -0.122   Prob(JB):                        0.865
Kurtosis:                       2.244   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.657
Model:                            OLS   Adj. R-squared:                  0.619
Method:                 Least Squares   F-statistic:                     17.24
Date:                Sun, 10 Jun 2018   Prob (F-statistic):            0.00247
Time:                        22:36:49   Log-Likelihood:                -17.291
No. Observations:                  11   AIC:                             38.58
Df Residuals:                       9   BIC:                             39.38
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.0101      1.172      2.569      0.030       0.359       5.661
x1             0.5101      0.123      4.153      0.002       0.232       0.788
==============================================================================
Omnibus:                        0.562   Durbin-Watson:                   3.011
Prob(Omnibus):                  0.755   Jarque-Bera (JB):                0.578
Skew:                          -0.304   Prob(JB):                        0.749
Kurtosis:                       2.057   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.633
Model:                            OLS   Adj. R-squared:                  0.593
Method:                 Least Squares   F-statistic:                     15.54
Date:                Sun, 10 Jun 2018   Prob (F-statistic):            0.00339
Time:                        22:36:49   Log-Likelihood:                -17.627
No. Observations:                  11   AIC:                             39.25
Df Residuals:                       9   BIC:                             40.05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.2156      1.208      2.662      0.026       0.483       5.948
x1             0.4993      0.127      3.943      0.003       0.213       0.786
==============================================================================
Omnibus:                        1.155   Durbin-Watson:                   2.623
Prob(Omnibus):                  0.561   Jarque-Bera (JB):                0.881
Skew:                          -0.467   Prob(JB):                        0.644
Kurtosis:                       1.975   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.729
Model:                            OLS   Adj. R-squared:                  0.699
Method:                 Least Squares   F-statistic:                     24.24
Date:                Sun, 10 Jun 2018   Prob (F-statistic):           0.000820
Time:                        22:36:49   Log-Likelihood:                -16.074
No. Observations:                  11   AIC:                             36.15
Df Residuals:                       9   BIC:                             36.94
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9415      1.049      2.804      0.021       0.568       5.314
x1             0.5415      0.110      4.924      0.001       0.293       0.790
==============================================================================
Omnibus:                        1.370   Durbin-Watson:                   2.795
Prob(Omnibus):                  0.504   Jarque-Bera (JB):                0.835
Skew:                          -0.307   Prob(JB):                        0.659
Kurtosis:                       1.798   Cond. No.                         29.1
==============================================================================
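The first table (dataset I) can be checked against the earlier outputs. With a single predictor, the OLS $R^2$ equals the squared correlation coefficient, and the overall F-statistic equals the square of the slope's t-statistic:

$$R^2 = r^2 = 0.816421^2 \approx 0.6665, \qquad F = t_{\beta_1}^2 = 4.241^2 \approx 17.99$$

These values agree with the reported R-squared of 0.667 and F-statistic of 17.99.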

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

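    I introduced seaborn in an earlier exercise. It is a higher-level wrapper around matplotlib. Its functions are simpler to call than the matplotlib equivalents, and its plots look better by default.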

The code is as follows:
pic = sns.FacetGrid(anascombe, col='dataset')
pic = pic.map(plt.scatter, 'x', 'y')
Result:
[Figure: scatter plots of y against x, one panel per dataset (I, II, III, IV)]
