What is the error of numpy.polyfit?

Time: 2021-11-21 21:23:02

I want to use numpy.polyfit for physical calculations, so I need the magnitude of the error.


2 Answers

#1 (21 votes)

If you specify full=True in your call to polyfit, it will include extra information:


>>> x = np.arange(100)
>>> y = x**2 + 3*x + 5 + np.random.rand(100)
>>> np.polyfit(x, y, 2)
array([ 0.99995888,  3.00221219,  5.56776641])
>>> np.polyfit(x, y, 2, full=True)
(array([ 0.99995888,  3.00221219,  5.56776641]), # coefficients
 array([ 7.19260721]), # residuals
 3, # rank
 array([ 11.87708199,   3.5299267 ,   0.52876389]), # singular values
 2.2204460492503131e-14) # conditioning threshold

The residual value returned is the sum of the squares of the fit errors; I'm not sure if this is what you are after:


>>> np.sum((np.polyval(np.polyfit(x, y, 2), x) - y)**2)
7.1926072073491056
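If what you are after is a typical per-point error magnitude rather than the total, one option (my own addition, not part of the original answer) is to turn that sum of squares into a root-mean-square deviation:

>>> coeffs, res, rank, sv, rcond = np.polyfit(x, y, 2, full=True)
>>> np.sqrt(res[0] / len(x))  # sqrt(residual sum of squares / N)
0.26819...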

In version 1.7 there is also a cov keyword that will return the covariance matrix for your coefficients, which you could use to calculate the uncertainty of the fit coefficients themselves.

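A minimal sketch of that, assuming numpy >= 1.7 and the same x and y as above; the square roots of the diagonal of the covariance matrix give the 1-sigma uncertainties of the coefficients:

>>> p, C = np.polyfit(x, y, 2, cov=True)  # C is the covariance matrix of p
>>> perr = np.sqrt(np.diag(C))            # 1-sigma uncertainty of each coefficient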

#2 (16 votes)

As you can see in the documentation:


Returns
-------
p : ndarray, shape (M,) or (M, K)
    Polynomial coefficients, highest power first.
    If `y` was 2-D, the coefficients for `k`-th data set are in ``p[:,k]``.

residuals, rank, singular_values, rcond : present only if `full` = True
    Residuals of the least-squares fit, the effective rank of the scaled
    Vandermonde coefficient matrix, its singular values, and the specified
    value of `rcond`. For more details, see `linalg.lstsq`.

Which means that you can do a fit and get the residuals as:


import numpy as np

deg = 2                                  # degree of the fitting polynomial
x = np.arange(10)
y = x**2 - 3*x + np.random.random(10)

p, res, _, _, _ = np.polyfit(x, y, deg, full=True)

Then p holds your fit parameters and res the residual, as described above. The _'s are there because you don't need the last three return values, so you can bind them to the throwaway variable _, which you won't use. This is a convention and is not required.



@Jaime's answer explains what the residual means. Another thing you can do is look at those squared deviations as a function of x (their sum is res). This is particularly helpful for spotting a trend that the fit didn't capture. res can be large because of statistical noise or because of a systematically poor fit, for example:


import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100)
y = 1000*np.sqrt(x) + x**2 - 10*x + 500*np.random.random(100) - 250

p = np.polyfit(x, y, 2)      # insufficient degree to capture the sqrt term

yfit = np.polyval(p, x)

plt.figure()
plt.plot(x, y, label='data')
plt.plot(x, yfit, label='fit')
plt.plot(x, yfit - y, label='var')
plt.legend()
plt.show()

So in the figure, note the bad fit near x = 0:
[figure: data, fit, and yfit - y plotted together; the fit is visibly bad near x = 0]
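As a small extra check (my addition, not part of the original answer), you can confirm that those per-point squared deviations really do sum to the residual reported by full=True, using the same x, y and yfit as above:

sq_dev = (yfit - y)**2                          # per-point squared deviation
_, res, _, _, _ = np.polyfit(x, y, 2, full=True)
print(np.allclose(sq_dev.sum(), res[0]))        # True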
