Different standard deviations from Wolfram and numpy for the same input

Time: 2022-10-27 21:22:43

I am currently reimplementing in Python an algorithm originally written in Java. One step is to calculate the standard deviation of a list of values. The original implementation uses DescriptiveStatistics.getStandardDeviation from the Apache Math 1.1 library for this; I use numpy 1.5's standard deviation. The problem is that they give (very) different results for the same input. The sample I have is this:

[0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]

I get the following results:

numpy           : 0.10932134388775223
Apache Math 1.1 : 0.12620366805397404
Wolfram Alpha   : 0.12620366805397404

I checked with Wolfram Alpha to get a third opinion. I do not think that such a difference can be explained by precision alone. Does anyone have any idea why this is happening, and what I could do about it?

Edit: Calculating it manually in Python gives the same result:

>>> from math import sqrt
>>> v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
>>> mu = sum(v) / 4
>>> sqrt(sum([(x - mu)**2 for x in v]) / 4)
0.10932134388775223

Also, regarding the suggestion that I am not using it correctly:

>>> from numpy import std
>>> std([0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842])
0.10932134388775223

1 Answer

#1

Apache and Wolfram divide by N-1 rather than N. This is a degrees-of-freedom adjustment, made because the mean μ is itself estimated from the data. Dividing by N-1 gives an unbiased estimate of the population variance. You can change NumPy's behavior using the ddof option.
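
For instance, with the sample from the question, a minimal sketch of that option (ddof is a documented parameter of numpy.std) looks like this:

import numpy as np

v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
print(np.std(v))          # default ddof=0: divides by N (the value the question observed)
print(np.std(v, ddof=1))  # divides by N - 1, the convention Apache Math and Wolfram Alpha use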

This is described in the NumPy documentation:

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
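
For comparison, here is a sketch of the same correction applied to the manual computation from the question's edit; it simply swaps the divisor from N to N - 1:

from math import sqrt

v = [0.113967640255, 0.223095775796, 0.283134228235, 0.416793887842]
mu = sum(v) / len(v)                                        # sample mean
print(sqrt(sum((x - mu) ** 2 for x in v) / len(v)))         # N divisor: matches numpy's default
print(sqrt(sum((x - mu) ** 2 for x in v) / (len(v) - 1)))   # N - 1 divisor: the Apache/Wolfram convention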
