Merging data frames to create a longitudinal data set

Time: 2021-10-24 19:35:58

I have three cross-sectional data sets, and I am trying to merge them into one longitudinal data set. Some measures are constant (id, sex, community) and others vary over time (x1 and y). I would like to have a long-form final data set with one column for each of the variables mentioned above. I thought merge_recurse() would do the trick but it produces two columns each for y and x1 (although data12 and data14 merge as I had hoped... perhaps because these variables are renamed after the first merge?). Any thoughts on how to do this simply and quickly? Example data below.

#Constant over time
id = seq(1, 100, 1)
sex = sample(c("male","female"), 100, replace=TRUE)
community = sample(c("comA", "comB", "comC", "comD"), 100, replace=TRUE)
#2010
year = rep(2010, 100)
x1 = rnorm(100, mean=5, sd=1)
y = rnorm(100, mean=10, sd=2)
z = rep(5, 100)
data10 = data.frame(cbind(id, year, sex, community, y, x1, z))
#2012
year = rep(2012, 100)
x1 = rnorm(100, mean=6, sd=1)
y = rnorm(100, mean=11, sd=2)
data12 = data.frame(cbind(id, year, sex, community, y, x1))
#2014
year = rep(2014, 100)
x1 = rnorm(100, mean=7, sd=1)
y = rnorm(100, mean=12, sd=2)
data14 = data.frame(cbind(id, year, sex, community, y, x1))
#Merge each year's data
library(reshape)
#Create a list of all datasets
alldata=list(data10, data12, data14)
#Merge data from multiple dataframes
data = merge_recurse(alldata, by=c("id", "year", "sex", "community"))

head(data)

id year    sex community              y.x             x1.x z  y.y x1.y
1  1 2010 female      comC 13.1771632561173 4.87556993759158 5 <NA> <NA>
2  2 2010 female      comB 13.7778630888456 6.69677435551805 5 <NA> <NA>
3  3 2010   male      comD 9.42440506678606 3.10067578314296 5 <NA> <NA>
4  4 2010 female      comB 11.0739409098036 4.12318001019941 5 <NA> <NA>
5  5 2010   male      comB 11.6015489242693  4.9565493450503 5 <NA> <NA>
6  6 2010 female      comB 6.52739602897104 3.76896148237067 5 <NA> <NA>

1 Answer

#1

I think you are looking for this:

all   <- do.call(rbind, alldata)
final <- reshape(all, v.names=c("y", "x1"), idvar=c("id", "sex", "community"),
                 timevar="year", direction="wide")

head(final, 3)
#   id    sex community  y.2010  x1.2010   y.2012  x1.2012   y.2014  x1.2014
# 1  1 female      comA   7.711    5.510   13.952    6.502   11.480    6.629
# 2  2   male      comB   9.130    5.672   11.470    5.500   10.295    7.338
# 3  3   male      comC  15.322    4.889   10.185    5.774   12.257    5.941
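If the goal is the long layout the question asks for (one row per id and year, one column each for y and x1), the stacked frame is already in that shape and the reshape() step can be skipped. Two things in the example data may get in the way, though: data10 carries an extra column z, so rbind() on the three frames as posted will likely complain about mismatched columns, and data.frame(cbind(...)) on mixed types turns the numeric columns into text. Below is a minimal sketch that works around both, assuming z can simply be dropped. make_wave() is a hypothetical helper written only for this illustration, and set.seed() is there just to make the sketch reproducible.

```r
set.seed(1)  # only so this sketch gives the same numbers each run

n         <- 100
id        <- seq_len(n)
sex       <- sample(c("male", "female"), n, replace = TRUE)
community <- sample(c("comA", "comB", "comC", "comD"), n, replace = TRUE)

# Hypothetical helper: builds one wave with data.frame() directly,
# so y and x1 stay numeric (cbind() on mixed types gives a character matrix)
make_wave <- function(year, mean_x1, mean_y) {
  data.frame(id, year, sex, community,
             y  = rnorm(n, mean = mean_y,  sd = 2),
             x1 = rnorm(n, mean = mean_x1, sd = 1))
}

data10   <- make_wave(2010, 5, 10)
data10$z <- 5                      # extra column present only in 2010
data12   <- make_wave(2012, 6, 11)
data14   <- make_wave(2014, 7, 12)
alldata  <- list(data10, data12, data14)

# Keep only the columns that every wave shares (assumption: z is not needed)
common <- Reduce(intersect, lapply(alldata, names))

# Stack the waves: this is already the long format
long <- do.call(rbind, lapply(alldata, function(d) d[, common]))

# Sort so each person's three waves sit next to each other
long <- long[order(long$id, long$year), ]
rownames(long) <- NULL

head(long)
```

If z has to be kept, one option is to add it to the other waves as NA before stacking (for example data12$z <- NA) instead of taking the intersection of the column names.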
