numpy。多重拟合给出了有用的拟合,但有无限的协方差矩阵。

时间:2021-11-21 21:23:08

I am trying to fit a polynomial to a set of data. Sometimes it may happen that the covariance matrix returned by numpy.ployfit only consists of inf, although the fit seems to be useful. There are no numpy.inf or 'numpy.nan' in the data!

我正试图将一个多项式拟合到一组数据中。有时可能会发生协方差矩阵由numpy返回。ployfit只由inf组成,尽管fit看起来很有用。没有numpy。正或“numpy。南”的数据!

Example:

例子:

import numpy as np
# sample data, does not contain really x**2-like behaviour, 
# but that should be visible in the fit results
x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024,   0.97341953,  0.97724978,  0.98215678,  0.9876293]

fit, cov = np.polyfit(x, y, 2, cov=True)

print 'fit: ', fit
print 'cov: ', cov

Result:

结果:

fit: [  1.67867158e-06   5.69199547e-04   8.85146009e-01]
cov: [[ inf  inf  inf]
      [ inf  inf  inf]
      [ inf  inf  inf]]

np.cov(x,y) gives

np.cov(x,y)

[[  6.25000000e+01  -6.07388099e-02]
 [ -6.07388099e-02   5.92268942e-05]]

So np.cov is not the same as the covariance returned from np.polyfit. Has anybody an idea what's going on?

所以np。cov与从np.polyfit返回的协方差不一样。有人知道发生了什么吗?

EDIT: I now got the point that numpy.cov is not what I want. I need the variances of the polynom coefficients, but I dont get them if (len(x) - order - 2.0) == 0. Is there another way to get the variances of the fit polynom coefficients?

编辑:我现在的意思是,numpy。cov不是我想要的。我需要的是多名系数的方差,但如果(len(x) - order - 2.0) == 0,我就不会得到它们。是否有另一种方法来获得适合的多子系数的方差?

2 个解决方案

#1


3  

As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which results in a zero divide for this input. The reasoning behind this correction is similar to that behind Bessel's Correction. This is really a sign that there are too few datapoints to estimate covariance in a well-defined way.

正如rustil的回答所言,这是由对协方差方程的分母的偏置修正所引起的,这导致了这个输入的零分。这一修正背后的原因与贝塞尔的修正如出一辙。这实际上是一个迹象,表明有太少的数据点以一种定义良好的方式来估计协方差。

How to skirt this problem? Well, this version of polyfit accepts weights. You could add another datapoint but weight it at epsilon. This is equivalent to reducing the 2.0 in this formula to a 1.0.

如何避开这个问题?这个版本的polyfit接受重量。你可以添加另一个数据点,但要在。这相当于将这个公式中的2.0降低到1.0。

x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024,   0.97341953,  0.97724978,  0.98215678,  0.9876293]

x_extra = x + x[-1:]
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]

fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)

print fit == fit_extra
print cov_extra

The output. Note that the fit values are identical:

输出。请注意,fit值是相同的:

>>> print fit == fit_extra
[ True  True  True]
>>> print cov_extra
[[  8.84481850e-11   8.11954338e-08   1.86299297e-05]
 [  8.11954338e-08   7.45405039e-05   1.71036963e-02]
 [  1.86299297e-05   1.71036963e-02   3.92469307e+00]]

I am very uncertain that this will be especially meaningful, but it's a way to work around the problem. It's a bit of a kludge though. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)

我很不确定这是否会特别有意义,但这是解决这个问题的方法。不过它有点蹩脚。对于更健壮的东西,您可以修改polyfit来接受它自己的ddof参数,这可能代替了cov当前接受的布尔值。(我只是提出了一个建议。)

A quick final note about the calculation of cov: If you look at the wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X, W), X)), which has a corresponding line in the numpy code -- at least roughly speaking. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some scaling (which I understand; it's part of a strategy to minimize numerical error) and multiplies the result by the norm of the residuals (which I don't understand; I can only guess that it's part of another version of the covariance formula).

关于cov计算的一个快速的最后说明:如果你看一下*上关于最小二乘回归的页面,你会发现系数的协方差的简化公式是inv(点(点(X, W), X)),它在numpy代码中有一个对应的行——至少粗略地说。在这种情况下,X是Vandermonde矩阵,权重已经相乘了。numpy代码也进行了一些扩展(我理解;这是一个最小化数值误差的策略的一部分,并将结果乘以残差的平均值(我不理解;我只能猜测它是协方差公式的另一个版本的一部分。

#2


2  


the difference should be in the degree of freedom. In the polyfit method it already takes into account that your degree is 2, thus causing:

不同之处在于*的程度。在polyfit方法中,它已经考虑到你的学位是2,从而导致:

RuntimeWarning: divide by zero encountered in true_divide
  fac = resids / (len(x) - order - 2.0)

you can pass your np.cov a ddof= keyword (ddof = delta degrees of freedom) and you'll run into the same problem

你可以通过np。cov a ddof=关键字(ddof =*度),你会遇到同样的问题。

#1


3  

As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which results in a zero divide for this input. The reasoning behind this correction is similar to that behind Bessel's Correction. This is really a sign that there are too few datapoints to estimate covariance in a well-defined way.

正如rustil的回答所言,这是由对协方差方程的分母的偏置修正所引起的,这导致了这个输入的零分。这一修正背后的原因与贝塞尔的修正如出一辙。这实际上是一个迹象,表明有太少的数据点以一种定义良好的方式来估计协方差。

How to skirt this problem? Well, this version of polyfit accepts weights. You could add another datapoint but weight it at epsilon. This is equivalent to reducing the 2.0 in this formula to a 1.0.

如何避开这个问题?这个版本的polyfit接受重量。你可以添加另一个数据点,但要在。这相当于将这个公式中的2.0降低到1.0。

x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024,   0.97341953,  0.97724978,  0.98215678,  0.9876293]

x_extra = x + x[-1:]
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]

fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)

print fit == fit_extra
print cov_extra

The output. Note that the fit values are identical:

输出。请注意,fit值是相同的:

>>> print fit == fit_extra
[ True  True  True]
>>> print cov_extra
[[  8.84481850e-11   8.11954338e-08   1.86299297e-05]
 [  8.11954338e-08   7.45405039e-05   1.71036963e-02]
 [  1.86299297e-05   1.71036963e-02   3.92469307e+00]]

I am very uncertain that this will be especially meaningful, but it's a way to work around the problem. It's a bit of a kludge though. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)

我很不确定这是否会特别有意义,但这是解决这个问题的方法。不过它有点蹩脚。对于更健壮的东西,您可以修改polyfit来接受它自己的ddof参数,这可能代替了cov当前接受的布尔值。(我只是提出了一个建议。)

A quick final note about the calculation of cov: If you look at the wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X, W), X)), which has a corresponding line in the numpy code -- at least roughly speaking. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some scaling (which I understand; it's part of a strategy to minimize numerical error) and multiplies the result by the norm of the residuals (which I don't understand; I can only guess that it's part of another version of the covariance formula).

关于cov计算的一个快速的最后说明:如果你看一下*上关于最小二乘回归的页面,你会发现系数的协方差的简化公式是inv(点(点(X, W), X)),它在numpy代码中有一个对应的行——至少粗略地说。在这种情况下,X是Vandermonde矩阵,权重已经相乘了。numpy代码也进行了一些扩展(我理解;这是一个最小化数值误差的策略的一部分,并将结果乘以残差的平均值(我不理解;我只能猜测它是协方差公式的另一个版本的一部分。

#2


2  


the difference should be in the degree of freedom. In the polyfit method it already takes into account that your degree is 2, thus causing:

不同之处在于*的程度。在polyfit方法中,它已经考虑到你的学位是2,从而导致:

RuntimeWarning: divide by zero encountered in true_divide
  fac = resids / (len(x) - order - 2.0)

you can pass your np.cov a ddof= keyword (ddof = delta degrees of freedom) and you'll run into the same problem

你可以通过np。cov a ddof=关键字(ddof =*度),你会遇到同样的问题。