Difference between Hmisc wtd.var and SAS proc means when generating a weighted variance

Time: 2022-03-31 21:30:37

I'm getting different results from R and SAS when I try to calculate a weighted variance. Does anyone know what might be causing this difference?

I create vectors of weights and values and I then calculate the weighted variance using the Hmisc library wtd.var function:

library(Hmisc)
wt <- c(5,  5,  4,  1)
x <- c(3.7,3.3,3.5,2.8)
wtd.var(x,weights=wt)

I get an answer of:

[1] 0.0612381

But if I try to reproduce these results in SAS I get a quite different result:

data test;
  input wt x;
cards;
5 3.7
5 3.3
4 3.5
1 2.8
;
run;
proc means data=test var;
var x;
weight wt;
run;

This results in an answer of:

0.2857778

1 solution

#1


You probably have a difference in how the variance is calculated. SAS gives you an option, VARDEF, which may help here.

proc means data=test var vardef=WDF;
var x;
weight wt;
run;

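On your dataset, that gives the same variance as R (0.0612381). The default, VARDEF=DF, divides by n - 1 = 3, while VARDEF=WDF divides by the sum of the weights minus one, 15 - 1 = 14. Both are 'right', depending on how you choose to calculate the weighted variance. (At my shop we calculate it a third way, of course...)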

Complete text from PROC MEANS documentation:

VARDEF=divisor specifies the divisor to use in the calculation of the variance and standard deviation. The following table shows the possible values for divisor and associated divisors.

Possible Values for VARDEF=
Value            Divisor                     Formula for Divisor
DF               degrees of freedom          n - 1
N                number of observations      n
WDF              sum of weights minus one    sum(Wi) - 1
WEIGHT | WGT     sum of weights              sum(Wi)

The procedure computes the variance as CSS/Divisor, where CSS is the corrected sums of squares and equals Sum((Xi-Xbar)^2). When you weight the analysis variables, CSS equals sum(Wi*(Xi-Xwbar)^2), where Xwbar is the weighted mean.
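
As a rough check, the CSS/Divisor formula can be reproduced by hand in base R. This is only a sketch using the vectors from the question; the names n, sw, xwbar, css and vars are my own and do not come from SAS or Hmisc.

```r
# Data from the question: values and their weights.
wt <- c(5, 5, 4, 1)
x  <- c(3.7, 3.3, 3.5, 2.8)

n     <- length(x)                 # number of observations
sw    <- sum(wt)                   # sum of weights
xwbar <- sum(wt * x) / sw          # weighted mean
css   <- sum(wt * (x - xwbar)^2)   # weighted corrected sum of squares

# One variance per VARDEF= divisor listed in the table.
vars <- c(DF  = css / (n - 1),
          N   = css / n,
          WDF = css / (sw - 1),
          WGT = css / sw)
print(round(vars, 7))
```

The DF entry should match the 0.2857778 that SAS reports by default, and the WDF entry the 0.0612381 returned by wtd.var.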

Default: DF

Requirement: To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=.

Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of Sigma^2, where the variance of the ith observation is Sigma^2/wi and wi is the weight for the ith observation. This method yields an estimate of the variance of an observation with unit weight.

Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of Sigma^2/wbar, where wbar is the average weight. This method yields an asymptotic estimate of the variance of an observation with average weight.
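
One way to see where the 1/wbar factor comes from: the sum of the weights is n times the average weight, so the WGT result is just the N result divided by wbar. The subscripted s^2 symbols below are my own shorthand for the variance under each VARDEF= setting, not notation from the SAS documentation.

```latex
\[
s^2_{\mathrm{WGT}}
  = \frac{\mathrm{CSS}}{\sum_i w_i}
  = \frac{\mathrm{CSS}}{n\,\bar{w}}
  = \frac{s^2_{\mathrm{N}}}{\bar{w}},
\qquad
\bar{w} = \frac{1}{n}\sum_i w_i .
\]
% With the data from the question: n = 4, \bar{w} = 15/4 = 3.75,
% s^2_N \approx 0.2143333, so s^2_WGT \approx 0.2143333 / 3.75 \approx 0.0571556.
```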
