There is a dataset with three variables (ID, Wage and Year); it is an unbalanced panel. There are two problems:
- I want to drop all data for any ID that has a Year with no observation. In short, I want to convert my unbalanced panel into a balanced one by dropping every ID that causes the imbalance.
For example, if the person with ID = 1 didn't report a Wage in Year = 2010 (so there is no observation with Year = 2010 and ID = 1), I want to drop all data for ID = 1.
It seems like a popular question, but all I found on Google and * were multiple solutions for Stata and none for SPSS.
UPDATE: I managed to solve this problem using the Excel COUNTIF function. I created a variable that counts how many times each ID appears in the dataset and kept the observations for which this count equals the number of years, thus dropping the unbalanced IDs. However, I'm still in dire need of a solution to the second problem :)
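The counting logic from the update can also be sketched outside Excel. The following is a minimal Python illustration, not SPSS syntax; the function name `balance_panel` and the sample rows are made up for the example, and it assumes each row is a dict with the keys ID, Year and Wage. It compares the set of years seen for each ID against all years in the data, which also tolerates duplicate (ID, Year) rows, unlike a plain row count.

```python
from collections import defaultdict

def balance_panel(rows):
    """Keep only the IDs that are observed in every year present in the data."""
    all_years = {r["Year"] for r in rows}
    years_by_id = defaultdict(set)
    for r in rows:
        years_by_id[r["ID"]].add(r["Year"])
    # An ID is complete when its set of years matches the full set of years.
    complete_ids = {i for i, years in years_by_id.items() if years == all_years}
    return [r for r in rows if r["ID"] in complete_ids]

# Hypothetical sample data: ID 1 has no observation for 2010.
rows = [
    {"ID": 1, "Year": 2009, "Wage": 100},
    {"ID": 1, "Year": 2011, "Wage": 120},
    {"ID": 2, "Year": 2009, "Wage": 90},
    {"ID": 2, "Year": 2010, "Wage": 95},
    {"ID": 2, "Year": 2011, "Wage": 97},
]
balanced = balance_panel(rows)
print(sorted({r["ID"] for r in balanced}))
```

Note that "all years" here means every year that appears anywhere in the data; if the panel should cover a fixed range instead, that range would have to be passed in explicitly.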
- The second question is almost the same as the first: I want to drop all data for any ID that reported Wage = 0 in some Year.
For example, if the person with ID = 1 reported Wage = 0 in Year = 2010, I want to drop all data for ID = 1.
If SPSS has a fill command that balances an unbalanced panel by adding rows with missing values, then a solution to the second problem would also solve the first.
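The idea of filling the panel first and then applying a single rule can be illustrated as follows. This is a hedged Python sketch of the logic only, not an SPSS command; `fill_panel` and `drop_incomplete_ids` are hypothetical helper names, and rows are assumed to be dicts with the keys ID, Year and Wage.

```python
from itertools import product

def fill_panel(rows):
    """Add a row with Wage=None for every (ID, Year) pair that is absent."""
    ids = sorted({r["ID"] for r in rows})
    years = sorted({r["Year"] for r in rows})
    wage = {(r["ID"], r["Year"]): r["Wage"] for r in rows}
    return [{"ID": i, "Year": y, "Wage": wage.get((i, y))}
            for i, y in product(ids, years)]

def drop_incomplete_ids(rows):
    """Drop every ID that has a missing (None) or zero Wage in any year."""
    bad_ids = {r["ID"] for r in rows if r["Wage"] is None or r["Wage"] == 0}
    return [r for r in rows if r["ID"] not in bad_ids]

# Hypothetical sample data.
rows = [
    {"ID": 1, "Year": 2009, "Wage": 100},
    {"ID": 1, "Year": 2011, "Wage": 120},  # 2010 is absent for ID 1
    {"ID": 2, "Year": 2009, "Wage": 90},
    {"ID": 2, "Year": 2010, "Wage": 0},    # zero wage for ID 2
    {"ID": 2, "Year": 2011, "Wage": 95},
    {"ID": 3, "Year": 2009, "Wage": 80},
    {"ID": 3, "Year": 2010, "Wage": 85},
    {"ID": 3, "Year": 2011, "Wage": 88},
]
filled = fill_panel(rows)
kept = drop_incomplete_ids(filled)
print(sorted({r["ID"] for r in kept}))
```

Once absent years exist as rows with a missing Wage, one rule ("no missing or zero Wage for this ID") covers both problems.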
UPDATE 2: I solved this problem as well, using COUNTIFS on Wage and ID. Excel is omnipotent, praise Excel.
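For comparison, the COUNTIFS approach amounts to counting the zero-wage rows per ID and keeping only the IDs whose count is 0. A minimal Python sketch of that rule, with a hypothetical function name and sample rows (not SPSS syntax, and assuming rows are dicts with the keys ID, Year and Wage):

```python
from collections import Counter

def drop_ids_with_zero_wage(rows):
    """Drop every ID that reported Wage == 0 in at least one year."""
    # Analogue of COUNTIFS over ID and Wage: number of zero-wage rows per ID.
    zero_counts = Counter(r["ID"] for r in rows if r["Wage"] == 0)
    # Counter returns 0 for IDs that never reported a zero wage.
    return [r for r in rows if zero_counts[r["ID"]] == 0]

# Hypothetical sample data: ID 1 reported Wage = 0 in 2010.
rows = [
    {"ID": 1, "Year": 2009, "Wage": 100},
    {"ID": 1, "Year": 2010, "Wage": 0},
    {"ID": 2, "Year": 2009, "Wage": 90},
    {"ID": 2, "Year": 2010, "Wage": 95},
]
nonzero = drop_ids_with_zero_wage(rows)
print(sorted({r["ID"] for r in nonzero}))
```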
2 solutions
#1
1
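This will solve both tasks, provided a year without a report is present as a row with a missing Wage (NMISS cannot count rows that do not exist):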
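* Treat a reported Wage of 0 as system-missing.
RECODE Wage (0=SYSMIS).
* Count missing Wage values per ID and attach the count to every row.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Wage_nmiss=NMISS(Wage).
* Keep only IDs with no missing Wage in any year.
SELECT IF Wage_nmiss=0.
EXECUTE.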
#2
0
I don't know what the data is used for, but if it's something important, you should seriously reconsider deleting the observations with missing values.
Often, especially in wage data, the fact that a value is missing tells you something about the value that should have been recorded (see Wikipedia; keywords: MAR, MCAR, MNAR). There is no easy way to remove the resulting bias from your sample, but simply deleting the observations is not a serious option. There are algorithms that impute missing values based on the other values in the dataset.
If you'd like, I could invest a bit more time and help you find a suitable algorithm to impute the missing values.