I have three cross-sectional data sets, and I am trying to merge them into one longitudinal data set. Some measures are constant (id, sex, community) and others vary over time (x1 and y). I would like to have a long-form final data set with one column for each of the variables mentioned above. I thought merge_recurse() would do the trick but it produces two columns each for y and x1 (although data12 and data14 merge as I had hoped... perhaps because these variables are renamed after the first merge?). Any thoughts on how to do this simply and quickly? Example data below.
#Constant over time
id = seq(1, 100, 1)
sex = sample(c("male","female"), 100, replace=TRUE)
community = sample(c("comA", "comB", "comC", "comD"), 100, replace=TRUE)
#2010
year = rep(2010, 100)
x1 = rnorm(100, mean=5, sd=1)
y = rnorm(100, mean=10, sd=2)
z = rep(5, 100)
data10 = data.frame(cbind(id, year, sex, community, y, x1, z))
#2012
year = rep(2012, 100)
x1 = rnorm(100, mean=6, sd=1)
y = rnorm(100, mean=11, sd=2)
data12 = data.frame(cbind(id, year, sex, community, y, x1))
#2014
year = rep(2014, 100)
x1 = rnorm(100, mean=7, sd=1)
y = rnorm(100, mean=12, sd=2)
data14 = data.frame(cbind(id, year, sex, community, y, x1))
#Merge each year's data
library(reshape)
#Create a list of all datasets
alldata=list(data10, data12, data14)
#Merge data from multiple dataframes
data = merge_recurse(alldata, by=c("id", "year", "sex", "community"))
head(data)
  id year    sex community              y.x             x1.x z  y.y x1.y
1  1 2010 female      comC 13.1771632561173 4.87556993759158 5 <NA> <NA>
2  2 2010 female      comB 13.7778630888456 6.69677435551805 5 <NA> <NA>
3  3 2010   male      comD 9.42440506678606 3.10067578314296 5 <NA> <NA>
4  4 2010 female      comB 11.0739409098036 4.12318001019941 5 <NA> <NA>
5  5 2010   male      comB 11.6015489242693  4.9565493450503 5 <NA> <NA>
6  6 2010 female      comB 6.52739602897104 3.76896148237067 5 <NA> <NA>
1 Answer
I think you are looking for this:
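# rbind() needs identical columns, so drop z (only in data10) before stacking
common <- Reduce(intersect, lapply(alldata, names))
all <- do.call(rbind, lapply(alldata, function(d) d[common]))
# Spread y and x1 into one column per year (wide format)
final <- reshape(all, v.names=c("y", "x1"), idvar=c("id", "sex", "community"),
                 timevar="year", direction="wide")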
head(final, 3)
# id sex community y.2010 x1.2010 y.2012 x1.2012 y.2014 x1.2014
# 1 1 female comA 7.711 5.510 13.952 6.502 11.480 6.629
# 2 2 male comB 9.130 5.672 11.470 5.500 10.295 7.338
# 3 3 male comC 15.322 4.889 10.185 5.774 12.257 5.941
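If you want the long layout described in the question (one row per id and year, with a single y column and a single x1 column), the stacked data frame is already in that shape and no reshape() step is needed. Below is a minimal sketch under some assumptions: the toy data and the names make_wave, shared and long are made up for illustration, and the data frames are built with data.frame() directly, because data.frame(cbind(...)) coerces every column to text.

```r
# Toy stand-ins for the three yearly data sets (hypothetical values).
# data.frame() is called directly so y and x1 stay numeric.
set.seed(1)
n <- 5
id <- seq_len(n)
sex <- sample(c("male", "female"), n, replace = TRUE)
community <- sample(c("comA", "comB"), n, replace = TRUE)

# Hypothetical helper: build one survey wave for a given year.
make_wave <- function(year, mu_x, mu_y) {
  data.frame(id, year, sex, community,
             y = rnorm(n, mean = mu_y, sd = 2),
             x1 = rnorm(n, mean = mu_x, sd = 1))
}
data10 <- make_wave(2010, 5, 10)
data10$z <- 5                      # extra column present only in 2010
data12 <- make_wave(2012, 6, 11)
data14 <- make_wave(2014, 7, 12)
alldata <- list(data10, data12, data14)

# Keep only the columns shared by every wave, then stack the rows.
shared <- Reduce(intersect, lapply(alldata, names))
long <- do.call(rbind, lapply(alldata, function(d) d[shared]))

# Sort so each person's years sit together, and reset the row names.
long <- long[order(long$id, long$year), ]
rownames(long) <- NULL

str(long)   # one row per id and year; single y and x1 columns
```

This sketch drops z because it exists only in the 2010 data. If you would rather keep it and fill the other years with NA, plyr::rbind.fill() or dplyr::bind_rows() may be a better fit than rbind().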