Why does numpy std() give a different result to matlab std()?

Date: 2022-06-09 12:09:36

I tried to convert MATLAB code to NumPy and found that NumPy's std function gives a different result.

In MATLAB:

std([1,3,4,6])
ans =  2.0817

In NumPy:

np.std([1,3,4,6])
1.8027756377319946

Is this normal? And how should I handle this?

3 answers

#1


121  

The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:

>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326

To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N tends to underestimate the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N, usually N-1. The ddof parameter lets us reduce the divisor by the amount we specify.

Unless told otherwise, NumPy calculates the biased estimator of the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution rather than a subset of values randomly picked from a larger distribution. If the ddof parameter is given, NumPy divides by N - ddof instead.
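
To make the divisor explicit, here is a minimal sketch that computes the standard deviation by hand with divisor N - ddof and compares it with np.std on the question's data. The helper name `std_with_ddof` is hypothetical and exists only for illustration.

```python
import numpy as np

def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    x = np.asarray(values, dtype=float)
    n = x.size
    squared_dev = (x - x.mean()) ** 2          # squared deviations from the sample mean
    return np.sqrt(squared_dev.sum() / (n - ddof))

data = [1, 3, 4, 6]
for ddof in (0, 1):
    manual = std_with_ddof(data, ddof)
    builtin = np.std(data, ddof=ddof)
    print(ddof, manual, builtin)               # the two columns should agree
```

For this data the mean is 3.5 and the squared deviations sum to 13, so ddof=0 gives sqrt(13/4) ≈ 1.8028 (NumPy's default) and ddof=1 gives sqrt(13/3) ≈ 2.0817 (MATLAB's default).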

By default, MATLAB's std corrects the bias in the sample variance by dividing by N-1. This makes the variance estimate unbiased, although its square root is still a slightly biased estimate of the standard deviation. This is likely what you want if you are applying the function to a random sample from a larger distribution.
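
A quick simulation can illustrate both points. This sketch assumes normally distributed data with true variance 1 and draws many samples of size 4; the sample size, number of trials and seed are arbitrary choices, not anything MATLAB or NumPy prescribes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)            # fixed seed so the run is reproducible
n_trials, n = 200_000, 4                       # many small samples of size 4
samples = rng.standard_normal((n_trials, n))   # true variance = true std = 1

var_biased = samples.var(axis=1, ddof=0).mean()     # divide by n
var_unbiased = samples.var(axis=1, ddof=1).mean()   # divide by n - 1
mean_std_ddof1 = samples.std(axis=1, ddof=1).mean() # square root of the corrected variance

print(f"mean variance, ddof=0: {var_biased:.3f}")    # theory: (n-1)/n = 0.75
print(f"mean variance, ddof=1: {var_unbiased:.3f}")  # theory: 1.0
print(f"mean std,      ddof=1: {mean_std_ddof1:.3f}")  # still below 1.0
```

Averaged over many samples, dividing by N underestimates the variance by roughly the factor (N-1)/N, dividing by N-1 removes that bias, and the square root of the corrected variance still comes out below the true standard deviation (for normal data with N = 4 its expected value is about 0.92).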

The nice answer by @hbaderts gives further mathematical details.

#2


54  

The standard deviation is the square root of the variance. The variance of a random variable X is defined as

$$\operatorname{Var}(X) = \sigma^2 = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right]$$

An estimator for the variance would therefore be

$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denotes the sample mean. For randomly selected samples $x_i$, it can be shown that the expected value of this estimator is not the true variance $\sigma^2$, but

$$\frac{n-1}{n}\,\sigma^2$$
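
For i.i.d. samples $x_1, \dots, x_n$ with mean $\mu$ and variance $\sigma^2$, the factor $(n-1)/n$ can be checked with a short standard derivation (sketched here, not part of the original answer), where $s_n^2$ is the divide-by-$n$ estimator above:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2,\\
\mathbb{E}\left[x_i^2\right] &= \sigma^2+\mu^2, \qquad \mathbb{E}\left[\bar{x}^2\right] = \frac{\sigma^2}{n}+\mu^2,\\
\mathbb{E}\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] &= n\left(\sigma^2+\mu^2\right) - n\left(\frac{\sigma^2}{n}+\mu^2\right) = (n-1)\,\sigma^2,\\
\mathbb{E}\left[s_n^2\right] &= \frac{1}{n}\,(n-1)\,\sigma^2 = \frac{n-1}{n}\,\sigma^2.
\end{aligned}
$$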

If you randomly select samples and estimate the sample mean and variance, you therefore need a corrected (unbiased) estimator

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

whose expected value is exactly $\sigma^2$. Using $n-1$ instead of $n$ in the denominator is known as Bessel's correction.
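
Since Bessel's correction only changes the divisor, the two estimates differ by the constant factor $n/(n-1)$. This sketch checks that relation on the question's data; the variable names are arbitrary.

```python
import numpy as np

x = np.array([1, 3, 4, 6], dtype=float)
n = x.size

sum_sq = ((x - x.mean()) ** 2).sum()   # sum of squared deviations (13.0 here)
s_n2 = sum_sq / n                      # biased estimator, same as np.var(x)
s2 = sum_sq / (n - 1)                  # Bessel-corrected, same as np.var(x, ddof=1)

# Bessel's correction is a constant rescaling of the biased estimate:
print(np.isclose(s2, n / (n - 1) * s_n2))                                # True
# ...so a NumPy default std can be converted to MATLAB's default afterwards:
print(np.isclose(np.std(x) * np.sqrt(n / (n - 1)), np.std(x, ddof=1)))   # True
```

In practice passing ddof=1 directly is simpler; the rescaling is mainly useful when only the ddof=0 result is available.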

By default, MATLAB's std uses the unbiased variance estimator with the correction term n-1. NumPy, however (as @ajcr explained), uses the biased estimator with no correction term by default. The parameter ddof lets you set any correction term n-ddof; setting it to 1 gives the same result as MATLAB.

Similarly, MATLAB accepts an optional second argument w, which specifies the weighting scheme. The default, w=0, divides by n-1 (unbiased estimator), while w=1 divides by n (biased estimator).
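
When porting MATLAB code, a small wrapper can translate w into ddof. The sketch below is a hypothetical helper (`matlab_std` is not a NumPy function). It supports only w = 0 or w = 1 (MATLAB also accepts a weight vector), and it assumes MATLAB's default of reducing along the first non-singleton dimension, whereas np.std flattens the array when no axis is given. The single-observation rule (normalize by N when N = 1) follows the MATLAB std documentation as understood here; check it against your MATLAB version.

```python
import numpy as np

def matlab_std(x, w=0):
    """Hypothetical helper that mimics MATLAB's std(x, w) for w = 0 or 1."""
    if w not in (0, 1):
        raise ValueError("this sketch only supports w = 0 or w = 1")
    a = np.atleast_1d(np.asarray(x, dtype=float))
    # First dimension whose size is not 1 (MATLAB's default for std).
    axis = next((i for i, size in enumerate(a.shape) if size != 1), 0)
    n = a.shape[axis]
    # w = 0 -> divide by n - 1 (ddof=1); w = 1 -> divide by n (ddof=0).
    # With a single observation, divide by n to avoid a zero divisor.
    ddof = 1 if (w == 0 and n > 1) else 0
    return np.std(a, axis=axis, ddof=ddof)

print(matlab_std([1, 3, 4, 6]))        # MATLAB's default std result
print(matlab_std([1, 3, 4, 6], w=1))   # NumPy's default np.std result
print(matlab_std([[1, 2], [3, 4]]))    # one value per column, as in MATLAB
```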

#3


1  

For people who aren't great with statistics, a simplistic guide is:

  • Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.

  • Ensure ddof=0 if you're calculating np.std() for the full population.

ddof=1 is used for samples to counteract the bias that arises when the variance is estimated from a sample rather than from the whole population.
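
A minimal illustration of the two rules, using a made-up population (the integers 1 to 100) and an arbitrary seed:

```python
import numpy as np

# Full population: every value we care about is present, so use ddof=0.
population = np.arange(1, 101)                  # the integers 1..100
print(np.std(population, ddof=0))               # population standard deviation

# A random sample drawn from that population: use ddof=1.
rng = np.random.default_rng(seed=1)
sample = rng.choice(population, size=10, replace=False)
print(np.std(sample, ddof=1))                   # sample estimate of the population std
```

Which rule applies depends on what the numbers represent, not on the numbers themselves: the same ten values would call for ddof=0 if they were the entire group you care about.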
