I'm trying to use rpy2 to do a logistic regression. I managed to run it, but I don't know how to extract the coefficients and p-values from the result. I don't want to print the values on the screen, but to write a function so I can use them independently.
import rpy2.robjects as ro
# R functions exposed as Python callables
read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']
# load the data set into an R data.frame
mydata = read("http://www.ats.ucla.edu/stat/data/binary.csv")
#cabecalho = head(mydata)
# fit the logistic regression with R's glm()
formula = 'admit ~ gre + gpa + rank'
mylogit = ro.r.glm(formula=ro.r(formula), data=mydata, family=ro.r('binomial(link="logit")'))
#What NEXT?
2 Answers
#1
2
I don't know how to get the p-values, but the other values can be accessed like this:
In [24]:
#what is stored in mylogit?
mylogit.names
Out[24]:
<StrVector - Python:0x10a01a0e0 / R:0x10353ab20>
['coef..., 'resi..., 'fitt..., ..., 'meth..., 'cont..., 'xlev...]
In [25]:
#looks like the first item is the coefficients
mylogit.names[0]
Out[25]:
'coefficients'
In [26]:
#OK, let's get the coefficients.
mylogit[0]
Out[26]:
<FloatVector - Python:0x10a01a5f0 / R:0x1028bcc80>
[-3.449548, 0.002294, 0.777014, -0.560031]
In [27]:
#careful: print shows R's 1-based indices, while Python indexing starts at 0. There are no p-values in this list
print(mylogit.names)
[1] "coefficients" "residuals" "fitted.values"
[4] "effects" "R" "rank"
[7] "qr" "family" "linear.predictors"
[10] "deviance" "aic" "null.deviance"
[13] "iter" "weights" "prior.weights"
[16] "df.residual" "df.null" "y"
[19] "converged" "boundary" "model"
[22] "call" "formula" "terms"
[25] "data" "offset" "control"
[28] "method" "contrasts" "xlevels"
Edit
The p-values for each term:
In [55]:
#p values:
list(summary(mylogit)[-6])[-4:]
Out[55]:
[0.0023265825120094407,
0.03564051883525258,
0.017659683902155117,
1.0581094283250368e-05]
And:
In [56]:
#coefficients
list(summary(mylogit)[-6])[:4]
Out[56]:
[-3.449548397668471,
0.0022939595044433334,
0.7770135737198545,
-0.5600313868499897]
In [57]:
#S.E.
list(summary(mylogit)[-6])[4:8]
Out[57]:
[1.1328460085495897,
0.001091839095422917,
0.327483878497867,
0.12713698917130048]
In [58]:
#Z value
list(summary(mylogit)[-6])[8:12]
Out[58]:
[-3.0450285137032984,
2.1010050968680347,
2.3726773277632214,
-4.4049445444662885]
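The slices [:4], [4:8], [8:12] and [-4:] work because R stores a matrix column by column, so flattening the 4x4 coefficient table gives all estimates first, then all standard errors, and so on. Here is a minimal sketch of a helper that rebuilds named columns from such a flat list. The function name split_coef_matrix is made up for illustration, and the term and column names are the ones R normally prints for this model; they are typed in by hand here, not read from the rpy2 object.

```python
def split_coef_matrix(flat_values, term_names, column_names):
    """Rebuild a column-major flattened matrix as {column: {term: value}}."""
    n_rows = len(term_names)
    if len(flat_values) != n_rows * len(column_names):
        raise ValueError("flat_values does not match the given names")
    table = {}
    for j, column in enumerate(column_names):
        # column j occupies one contiguous chunk of n_rows values
        chunk = flat_values[j * n_rows:(j + 1) * n_rows]
        table[column] = dict(zip(term_names, chunk))
    return table


# Values copied from the outputs above, in the order R flattens them
flat = [
    -3.449548397668471, 0.0022939595044433334, 0.7770135737198545, -0.5600313868499897,
    1.1328460085495897, 0.001091839095422917, 0.327483878497867, 0.12713698917130048,
    -3.0450285137032984, 2.1010050968680347, 2.3726773277632214, -4.4049445444662885,
    0.0023265825120094407, 0.03564051883525258, 0.017659683902155117, 1.0581094283250368e-05,
]
terms = ["(Intercept)", "gre", "gpa", "rank"]                   # assumed row names
columns = ["Estimate", "Std. Error", "z value", "Pr(>|z|)"]     # assumed column names

table = split_coef_matrix(flat, terms, columns)
p_gpa = table["Pr(>|z|)"]["gpa"]
```

With rpy2 the flat list would be list(summary(mylogit)[-6]) as above, so the values can be looked up by name instead of by slice position.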
Or more generally:
In [60]:
import numpy as np
In [62]:
COEF=np.array(summary(mylogit)[-6]) #it has a shape of (number_of_terms, 4)
In [63]:
COEF[:, -1] #p-value
Out[63]:
array([ 2.32658251e-03, 3.56405188e-02, 1.76596839e-02,
1.05810943e-05])
In [66]:
COEF[:, 0] #coefficients
Out[66]:
array([ -3.44954840e+00, 2.29395950e-03, 7.77013574e-01,
-5.60031387e-01])
In [68]:
COEF[:, 1] #S.E.
Out[68]:
array([ 1.13284601e+00, 1.09183910e-03, 3.27483878e-01,
1.27136989e-01])
In [69]:
COEF[:, 2] #Z
Out[69]:
array([-3.04502851, 2.1010051 , 2.37267733, -4.40494454])
You can also use summary(mylogit).rx2('coefficients') (or rx), if you know that coefficients is in the summary vector.
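As a sanity check on whatever you extract: for a binomial glm, the z value should be the estimate divided by its standard error, and the p-value the two-sided tail of a standard normal at that z. The sketch below recomputes both with the standard library only. The function name wald_z_and_p is made up for illustration, and the inputs are copied from the outputs in this answer.

```python
import math


def wald_z_and_p(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # 2 * P(Z > |z|) for a standard normal equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Intercept row: estimate and standard error copied from the outputs above
z_intercept, p_intercept = wald_z_and_p(-3.449548397668471, 1.1328460085495897)

# rank row
z_rank, p_rank = wald_z_and_p(-0.5600313868499897, 0.12713698917130048)
```

If the recomputed numbers do not agree with the extracted columns, the slices or column indices are probably pointing at the wrong part of the table.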
#2
2
This isn't quite an answer to what you asked, but if your question is more generally "how to move a logistic regression to Python", why not use statsmodels?
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
model = smf.glm('admit ~ gre + gpa + rank', df, family=sm.families.Binomial()).fit()
print(model.summary())
This prints:
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                  admit   No. Observations:                  400
Model:                            GLM   Df Residuals:                      396
Model Family:                Binomial   Df Model:                            3
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -229.72
Date:                Sat, 29 Mar 2014   Deviance:                       459.44
Time:                        11:56:19   Pearson chi2:                     399.
No. Iterations:                     5
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -3.4495      1.133     -3.045      0.002        -5.670    -1.229
gre            0.0023      0.001      2.101      0.036         0.000     0.004
gpa            0.7770      0.327      2.373      0.018         0.135     1.419
rank          -0.5600      0.127     -4.405      0.000        -0.809    -0.311
==============================================================================
While there are still some statistical procedures that only have a good implementation in R, for straightforward things like linear models, it's probably a lot easier to use statsmodels than to fight with RPy2, since all of the introspection, built-in documentation, tab completion (in IPython), etc. will work directly on statsmodels objects.