R语言学习笔记(十七):data.table包中melt与dcast函数的使用

时间:2023-03-09 08:10:44
R语言学习笔记(十七):data.table包中melt与dcast函数的使用

melt函数可以将宽数据转化为长数据

dcast函数可以将长数据转化为宽数据

> DT = fread("melt_default.csv")
> DT
family_id age_mother dob_child1 dob_child2 dob_child3
1: 1 30 1998-11-26 2000-01-29 NA
2: 2 27 1996-06-22 NA NA
3: 3 26 2002-07-11 2004-04-05 2007-09-02
4: 4 32 2004-10-10 2009-08-27 2012-07-21
5: 5 29 2000-12-05 2005-02-28 NA
> DT.m1 <- melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
+ variable.name = "child", value.name = "dob")
> DT.m1
family_id age_mother child dob
1: 1 30 dob_child1 1998-11-26
2: 2 27 dob_child1 1996-06-22
3: 3 26 dob_child1 2002-07-11
4: 4 32 dob_child1 2004-10-10
5: 5 29 dob_child1 2000-12-05
6: 1 30 dob_child2 2000-01-29
7: 2 27 dob_child2 NA
8: 3 26 dob_child2 2004-04-05
9: 4 32 dob_child2 2009-08-27
10: 5 29 dob_child2 2005-02-28
11: 1 30 dob_child3 NA
12: 2 27 dob_child3 NA
13: 3 26 dob_child3 2007-09-02
14: 4 32 dob_child3 2012-07-21
15: 5 29 dob_child3 NA
> dcast(DT.m1, family_id + age_mother ~ child, value.var = "dob")
family_id age_mother dob_child1 dob_child2 dob_child3
1: 1 30 1998-11-26 2000-01-29 NA
2: 2 27 1996-06-22 NA NA
3: 3 26 2002-07-11 2004-04-05 2007-09-02
4: 4 32 2004-10-10 2009-08-27 2012-07-21
5: 5 29 2000-12-05 2005-02-28 NA

对于较为复杂的数据可以这样做

> DT <- fread("melt_enhanced.csv")
> DT
family_id age_mother dob_child1 dob_child2 dob_child3 gender_child1 gender_child2 gender_child3
1: 1 30 1998-11-26 2000-01-29 NA 1 2 NA
2: 2 27 1996-06-22 NA NA 2 NA NA
3: 3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
4: 4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
5: 5 29 2000-12-05 2005-02-28 NA 2 1 NA
> DT.m2 <- melt(DT, measure = patterns("^dob","^gender"), value.name = c("dob", "gender"))
> DT.m2
family_id age_mother variable dob gender
1: 1 30 1 1998-11-26 1
2: 2 27 1 1996-06-22 2
3: 3 26 1 2002-07-11 2
4: 4 32 1 2004-10-10 1
5: 5 29 1 2000-12-05 2
6: 1 30 2 2000-01-29 2
7: 2 27 2 NA NA
8: 3 26 2 2004-04-05 2
9: 4 32 2 2009-08-27 1
10: 5 29 2 2005-02-28 1
11: 1 30 3 NA NA
12: 2 27 3 NA NA
13: 3 26 3 2007-09-02 1
14: 4 32 3 2012-07-21 1
15: 5 29 3 NA NA
> DT.c2 <- dcast(DT.m2, family_id + age_mother ~ variable, value.var = c("dob","gender"))
> DT.c2
family_id age_mother dob_1 dob_2 dob_3 gender_1 gender_2 gender_3
1: 1 30 1998-11-26 2000-01-29 NA 1 2 NA
2: 2 27 1996-06-22 NA NA 2 NA NA
3: 3 26 2002-07-11 2004-04-05 2007-09-02 2 2 1
4: 4 32 2004-10-10 2009-08-27 2012-07-21 1 1 1
5: 5 29 2000-12-05 2005-02-28 NA 2 1 NA