Why do scipy.stats.linregress and numpy.polyfit(deg=1) give different results for linear regression?

Time: 2021-11-21 21:22:38

I have a scatter plot to which I want to fit a straight line of best fit. The raw data is below:

x = [2,5,10,20,30,50]
y = [0.0013,0.0018,0.0067,0.0081,0.009,0.013]

When I use

numpy.polyfit(x,y,deg=1) 

and

scipy.stats.linregress(x,y)

I get different values for the slope and intercept. Why is this? I thought that maybe they are using slightly different algorithms, but the scipy version gives me a line that overestimates all of my data.

Is the scipy function only for a specific application? Is there a way to reconcile this difference?

I would like to know when to use each function and where each one applies.

Thank you.

EDIT: Results in the form (slope, intercept):

scipy: 0.000257290802691 0.00826916605228

numpy: 0.0002322   0.00212209

EDIT: The mistake was in a line of my code that was changing the result for scipy. The two functions do in fact give the same results to the level of accuracy I need.
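
The final edit can be checked directly against the six data points posted above. This is a minimal sketch, not the asker's original script; the variable names (`np_slope`, `res`, `same`) are chosen here for illustration, and the tolerance is an arbitrary tight value.

```python
import numpy as np
from scipy import stats

# Data exactly as posted in the question.
x = [2, 5, 10, 20, 30, 50]
y = [0.0013, 0.0018, 0.0067, 0.0081, 0.009, 0.013]

# polyfit returns coefficients from highest degree down: [slope, intercept].
np_slope, np_intercept = np.polyfit(x, y, deg=1)

# linregress returns a result object with named fields.
res = stats.linregress(x, y)

print("numpy:", np_slope, np_intercept)
print("scipy:", res.slope, res.intercept)

# Compare the two fits with a tight relative tolerance.
same = np.allclose([np_slope, np_intercept],
                   [res.slope, res.intercept],
                   rtol=1e-9, atol=0.0)
print("same fit:", same)
```

On this input both calls should reproduce the numpy figures quoted in the first edit (about 0.0002322 and 0.00212209). The scipy figures quoted there match what both functions return for the longer nine-point data set used in the answer, which is consistent with the two calls having been given different inputs.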

1 solution

#1


I don't know what the issue with your code is. Here is what I get when I run your exact snippet:

In [1]: import numpy
   ...: import scipy.stats
   ...: x = [2,5,10,20,30,50,100,200,300]
   ...: y = [0.0013,0.0018,0.0067,0.0081,0.009,0.013,0.077,0.085,0.057]
   ...: print(numpy.polyfit(x,y,deg=1))
   ...: print(scipy.stats.linregress(x,y))
   ...:
[0.00025729 0.00826917]
LinregressResult(slope=0.0002572908026909962, intercept=0.00826916605228397, rvalue=0.7851975581052358, pvalue=0.012170749250986976, stderr=7.669358704600765e-05)

As you can see, I get:

       slope              intercept
numpy  0.00025729         0.00826917
scipy  0.0002572908026... 0.0082691660...

These are identical apart from rounding. Here are my library versions:

In [14]: numpy.__version__
Out[14]: '1.14.2'

In [15]: scipy.__version__
Out[15]: '1.0.1'

See if updating your libraries has any effect; otherwise, update your example code so that it reproduces the same issue as your actual code.
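
On whether the two functions use different algorithms: as far as I know, both solve the same ordinary least-squares problem. numpy.polyfit does it through a general least-squares solve, while scipy.stats.linregress uses the closed-form formulas for a straight line, so they should differ only at the level of floating-point rounding. A minimal sketch, assuming the textbook formulas slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x) (the names `dx`, `dy`, `slope`, `intercept` are mine), that compares both against a hand-computed fit:

```python
import numpy as np
from scipy import stats

# Nine-point data from the snippet in this answer.
x = np.array([2, 5, 10, 20, 30, 50, 100, 200, 300], dtype=float)
y = np.array([0.0013, 0.0018, 0.0067, 0.0081, 0.009,
              0.013, 0.077, 0.085, 0.057])

# Textbook closed form for a straight-line least-squares fit.
dx = x - x.mean()
dy = y - y.mean()
slope = np.sum(dx * dy) / np.sum(dx * dx)
intercept = y.mean() - slope * x.mean()

np_slope, np_intercept = np.polyfit(x, y, deg=1)
res = stats.linregress(x, y)

print("closed form:", slope, intercept)
print("numpy      :", np_slope, np_intercept)
print("scipy      :", res.slope, res.intercept)

# linregress additionally reports correlation and significance statistics.
print("r, p, stderr:", res.rvalue, res.pvalue, res.stderr)
```

As a rough guide, linregress is convenient when you also want rvalue, pvalue and stderr, while polyfit is enough when you only need the coefficients or may want a higher degree later.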
