Converting an unbalanced panel to a balanced one / dropping multiple observations in SPSS

Time: 2022-12-07 20:12:29

I have a dataset with three variables (ID, Wage and Year); it is an unbalanced panel. There are two problems:

  1. I want to drop all data for every ID that has a Year with no observations. In short, I want to convert my unbalanced panel into a balanced one by dropping every ID that causes the imbalance.

For example, if the person with ID = 1 didn't report a Wage for Year = 2010 (so there is no observation with Year = 2010 and ID = 1), I want to drop all data for ID = 1.

It seems like a popular question, but all I found on Google and * were multiple solutions for Stata and none for SPSS.

UPDATE: I managed to solve this problem using the Excel COUNTIF function. I created a variable that counts how many times each ID appears in the dataset and kept the observations for which this count equals the number of years, thus dropping the unbalanced IDs. However, I'm still in dire need of a solution to the second problem :)

  2. The second question is almost the same as the first one: I want to drop all data for every ID that has a Year in which it reported Wage = 0.

For example, if the person with ID = 1 reported Wage = 0 in Year = 2010, I want to drop all data for ID = 1.

If SPSS has a fill command that balances an unbalanced panel by adding rows with missing values, then a solution to the second problem would also be a solution to the first one.

UPDATE 2: I solved this problem as well using COUNTIFS on Wage and ID. Excel is omnipotent, praise Excel.

2 answers

#1


This will solve both tasks:

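* Treat a reported wage of 0 as missing.
recode Wage (0=sysmis).
* Add to every row the number of missing Wage values found for that row's ID.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Wage_nmiss=NMISS(Wage).
* Keep only the IDs that have no missing Wage in any of their rows.
select if Wage_nmiss=0.
execute.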
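
One caveat with the syntax above: NMISS only counts rows that actually exist, so an ID whose missing year has no row at all would still pass the filter. Below is a minimal sketch that also counts the rows per ID, mirroring the COUNTIF idea from the question. It assumes each ID has at most one row per year and that you know how many years the panel covers. The names n_obs, wage_nmiss and n_years are hypothetical, and the value 5 is only a placeholder.

```spss
* Assumption: the panel covers 5 years, so replace 5 with your own number of years.
RECODE Wage (0=SYSMIS).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /n_obs=N
  /wage_nmiss=NMISS(Wage).
COMPUTE n_years = 5.
* Keep an ID only if it has a row for every year and no missing or zero Wage.
SELECT IF (n_obs = n_years AND wage_nmiss = 0).
EXECUTE.
```

If the data may contain duplicate ID and Year rows, n_obs would overcount, so the duplicates would need to be removed first.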

#2


I don't know what the data is used for, but if it's something important, you should seriously reconsider deleting the observations with missing values.

Often, especially in data on wages, the fact that a value is missing tells you something about the value that should have been recorded (see Wikipedia; keywords: MAR, MCAR, MNAR). There is no easy way to get rid of this bias in your sample, but simply deleting the observations is not a serious option. There are algorithms that impute missing values based on the other values in the dataset.

If you'd like, I could invest a bit more time and help you find a suitable algorithm to impute the missing values.
