Why does numpy std() give a different result to matlab std()?

I'm trying to convert MATLAB code to NumPy and found that NumPy's std function gives a different result.

In MATLAB:

std([1,3,4,6])
ans =  2.0817

In NumPy:

np.std([1,3,4,6])
1.8027756377319946

Is this normal? And how should I handle this?

3 Answers

#1

The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:

>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326

To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.

Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
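
To make the divisor concrete, here is a minimal sketch that computes the standard deviation by hand and divides by N - ddof. The helper name std_with_ddof is made up for illustration and is not part of NumPy; it only mirrors what the ddof parameter is described as doing above.

```python
import numpy as np

def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # Sum of squared deviations from the sample mean
    squared_deviations = (x - x.mean()) ** 2
    # Divide by N - ddof instead of N, then take the square root
    return np.sqrt(squared_deviations.sum() / (n - ddof))

data = [1, 3, 4, 6]
print(std_with_ddof(data, ddof=0))  # divides by N = 4, matching np.std(data)
print(std_with_ddof(data, ddof=1))  # divides by N - 1 = 3, matching MATLAB's std(data)
```

For the data in the question, the sum of squared deviations is 13, so the two results are sqrt(13/4) and sqrt(13/3), which are the NumPy and MATLAB values shown above.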

The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some (but probably not all) of the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
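
A quick simulation can illustrate the bias. This is only a sketch under assumptions that are not in the question: the data are drawn from a normal distribution with true variance 1, each sample has N = 4 values, and the seed and number of trials are arbitrary.

```python
import numpy as np

# Assumptions for illustration: normal data, true sigma = 1, N = 4 per sample
rng = np.random.default_rng(0)
true_sigma = 1.0
samples = rng.normal(loc=0.0, scale=true_sigma, size=(200_000, 4))

# Average each estimator over many independent samples (one sample per row)
mean_var_ddof0 = samples.var(axis=1, ddof=0).mean()  # divide by N
mean_var_ddof1 = samples.var(axis=1, ddof=1).mean()  # divide by N - 1
mean_std_ddof1 = samples.std(axis=1, ddof=1).mean()  # square root of the N - 1 variance

print("mean variance, ddof=0:", mean_var_ddof0)
print("mean variance, ddof=1:", mean_var_ddof1)
print("mean std,      ddof=1:", mean_std_ddof1)
```

Under these assumptions, the ddof=0 variance should average close to (N-1)/N = 0.75 of the true variance, the ddof=1 variance should average close to 1, and the ddof=1 standard deviation should still average somewhat below the true value of 1, which is the remaining bias mentioned above.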

The nice answer by @hbaderts gives further mathematical details.

#2

The standard deviation is the square root of the variance. The variance of a random variable X is defined as