I have a problem in Stata. I have a survey dataset with sampling weights and stratification. Calculating population totals is easy: first set up the survey design (sampling weights and strata) with svyset, then use the prefix svy: total. Unfortunately, histogram cannot be used to plot density functions for such data, because it ignores the survey design. Is there an easy way to plot density and distribution functions of population totals with survey data?
1 Answer
The histogram, kdensity, and cumul commands all accept frequency weights, which must be integers. The problem with sampling weights is that they can be non-integer. However, you can create frequency weights that are multiples of the sampling (probability) weights and agree with them to any desired number of decimal places. (The trick is due to Austin Nichols.) This will permit you to use the above commands. The only option that will not work directly is the frequency option of histogram, because the counts are inflated by the multiplier. The other statistics are weighted means, so they are invariant to multiplying the original weights by a constant.
In this example, I use the nhanes2b data set. It turns out that all the weights are integers, so the conversion trick isn't needed.
webuse nhanes2b, clear
/* Create frequency weight which
agrees with original weight to any degree of accuracy */
local k = 2
/* Match sampling weights to k = 2 decimal places:
e.g. sampling weight = 3.212 -> fwt =321 */
/* Store as long so large integer weights are not rounded by float storage */
gen long fwt = round(10^(`k')*finalwgt,1)
/* 1. Histogram */
histogram height [fw = fwt], percent width(10)
/* 2. Probability Density */
kdensity height [fw = fwt], bwidth(2)
/* 3. CDF */
cumul height [fw = fwt], gen(cum)
sort height cum
scatter cum height, c(J) ms(i)
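Because the nhanes2b weights are already integers, the example above does not really exercise the rounding step. The following is a minimal sketch of a check with non-integer weights. The variable pw (finalwgt divided by 1000) is a hypothetical fractional weight invented only for this illustration, and fwt2 is its scaled integer counterpart. The sketch compares the weighted mean of height under both weights; the difference should shrink as k is increased.

```stata
* Work on a temporary copy so the data and fwt in memory are left untouched
preserve
webuse nhanes2b, clear

* Hypothetical non-integer sampling weight (an assumption for illustration only)
generate double pw = finalwgt/1000

* Scaled integer frequency weight matching pw to k decimal places
local k = 2
generate long fwt2 = round(10^(`k')*pw, 1)

* Weighted mean using the fractional weights as analytic weights
summarize height [aw = pw]
scalar m_pw = r(mean)

* Weighted mean using the integer frequency weights
summarize height [fw = fwt2]
scalar m_fw = r(mean)

* The two means should differ only by rounding error in fwt2
display "difference in weighted means: " m_pw - m_fw
restore
```

If the difference is larger than you can tolerate, raise k and rerun; the price is only larger integer weights.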
Edit: histogram with frequency option:
This uses the undocumented command twoway__histogram_gen
(http://www.stata-journal.com/sjpdf.html?articlenum=gr0014).
twoway__histogram_gen height [fw = fwt], ///
freq width(10) gen(h x, replace)
replace h = round( h/10^`k')/10^6
format h %5.0f
twoway bar h x, ///
bstyle(histogram) barwidth(10) ///
ytitle("Frequency (Millions)")