How to check whether there are missing observations

Time: 2023-01-24 22:52:02

I know a way of finding and identifying missing values for a particular variable.

For the variable avedmajor, I could do

tab avedmajor, m

Then,

gen avedmajormissing=0

replace avedmajormissing=1 if avedmajor==.

But how can I check whether my dataset has missing values in any of its variables without going through each one of them?

Thanks.

4 solutions

#1


3  

One command is:

misstable summarize

But see also:

help missing##useful

and more generally:

help missing
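
As a minimal sketch of how this might look in practice, assuming the auto dataset that ships with Stata (the loop is an illustration added here, not part of the answer above): misstable summarize reports on numeric variables, while the missing() function also treats empty strings and the extended missing values .a to .z as missing, so a loop over all variables covers both types.

sysuse auto, clear

* Built-in report of missing-value counts for numeric variables
misstable summarize

* Count missing values in every variable, strings included
foreach v of varlist _all {
    quietly count if missing(`v')
    if r(N) > 0 {
        display as text "`v'" _col(20) as result r(N)
    }
}

The loop prints a line only for variables that have at least one missing value.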

#2


2  

I'd add the mdesc command to the proposed solutions. According to its description, mdesc:

Produces a table with the number of missing values, total number of cases, and percent missing for each variable in varlist. mdesc works with both numeric and character variables.

So its advantage over the misstable solution is that it handles both numeric and string variables in one go.

sysuse auto
mdesc

This gives a nice overview of the missing values:

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
           make |           0             74           0.00
          price |           0             74           0.00
            mpg |           0             74           0.00
          rep78 |           5             74           6.76
       headroom |           0             74           0.00
          trunk |           0             74           0.00
         weight |           0             74           0.00
         length |           0             74           0.00
           turn |           0             74           0.00
   displacement |           0             74           0.00
     gear_ratio |           0             74           0.00
        foreign |           0             74           0.00
----------------+-----------------------------------------------

#3


1  

Various commands help. See e.g. codebook. For one user-written command, install nmissing.

. search nmissing, historical

Search of official help files, FAQs, Examples, SJs, and STBs

FAQ     . . . . . .  Can I quickly see how many missing values a variable has?
    . . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
    7/08    http://www.ats.ucla.edu/stat/stata/faq/nmissing.htm

Example . . . . . . . . . . . . . . . . . . . . Useful non-UCLA Stata programs
    . . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
    7/08    http://www.ats.ucla.edu/stat/ado/world/

SJ-5-4  dm67_3  . . . . . . . . . .  Software update for nmissing and npresent
    (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
    Q4/05   SJ 5(4):607
    now produces saved results

SJ-3-4  sg67_2  . . . . . . . . . .  Software update for nmissing and npresent
    (help nmissing, npresent if installed)  . . . . . . . . . .  N. J. Cox
    Q4/03   SJ 3(4):449
    updated to include support for by, options for checking
    string values that contain spaces or periods, documentation
    of extended missing values .a to .z, and improved output

STB-60  dm67.1  . . . .  Enhancements to numbers of missing and present values
    (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
    3/01    pp.2--3; STB Reprints Vol 10, pp.7--9
    updated with option for reporting on observations

STB-49  dm67  . . . . . . . . . . . . .  Numbers of missing and present values
    (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
    5/99    pp.7--8; STB Reprints Vol 9, pp.26--27
    commands to list the numbers of missing values and nonmissing
    values in each variable in varlist

Here is an example:

. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. nmissing

age                                 24
msp                                 16
nev_mar                             16
grade                                2
not_smsa                             8
c_city                               8
south                                8
ind_code                           341
occ_code                           121
union                             9296
wks_ue                            5704
tenure                             433
hours                               67
wks_work                           703
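
For the codebook route mentioned at the start of this answer, a minimal sketch might be the following. It assumes the auto dataset that ships with Stata and is an added illustration, not part of the original answer.

sysuse auto, clear

* One line per variable; Obs is the number of nonmissing observations
codebook, compact

* Detailed report for a single variable, including its count of missing values
codebook rep78

Comparing Obs with the total number of observations in the dataset shows how many values each variable is missing.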

#4


1  

Another option would be misschk from the SPost site. Type findit misschk to install it. Here's an example:

sysuse auto,clear
replace price=. if (_n==1|_n==3)  // additional missing values
misschk

Without specifying the varlist, misschk just checks all variables.

The standard output gives you the number as well as percentage of missing values on each variable.

Variables examined for missing values

   #  Variable        # Missing   % Missing
--------------------------------------------
   1  price                 2         2.7
   2  mpg                   0         0.0
   3  rep78                 5         6.8
   4  headroom              0         0.0
   5  trunk                 0         0.0
   6  weight                0         0.0
   7  length                0         0.0
   8  turn                  0         0.0
   9  displacement          0         0.0
   10 gear_ratio            0         0.0
   11 foreign               0         0.0

It also counts all the different missing patterns.

   Missing for |
         which |
    variables? |      Freq.     Percent        Cum.
---------------+-----------------------------------
 1_3__ _____ _ |          1        1.35        1.35
 1____ _____ _ |          1        1.35        2.70
 __3__ _____ _ |          4        5.41        8.11
 _____ _____ _ |         68       91.89      100.00
---------------+-----------------------------------
         Total |         74      100.00

Lastly, it summarizes the number of missing values per case.

Missing for |
   how many |
 variables? |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         68       91.89       91.89
          1 |          5        6.76       98.65
          2 |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

misschk also has a few other neat features and options, which you can read about with help misschk.
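
If installing a user-written command is not an option, the per-case count can be approximated with built-in commands. This is only a sketch under the same setup as the example above, not how misschk itself works; the variable name nmiss is arbitrary.

sysuse auto, clear
replace price = . if inlist(_n, 1, 3)

* Collect the numeric variables in r(varlist)
ds, has(type numeric)

* Number of missing values in each observation, then its distribution
egen nmiss = rowmiss(`r(varlist)')
tabulate nmiss

Restricting the count to numeric variables mirrors the misschk table above, which does not list the string variable make.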
