I have a time-series dataset produced by measurement software, with the following structure:
ID1 ID2 START mes1 mes2 mes3 mes4 mes5 mes6
myidA aa 2000 12 58 45 66 88 77
myidB aa 2004 44 89 NA NA NA NA
myidC ab 2001 69 58 77 88 87 NA
myidD ab 2004 78 66 NA NA NA NA
START is the year of the earliest measurement, which is stored in the first measurement column (mes1). The start year can differ from sample to sample (i.e., from row to row of the data frame).
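A small worked example of that mapping, assuming (as the desired output below implies) that mes1 holds the START year and each following measurement column is one year later. `start` and `k` are illustrative names, not columns of the data:

```r
# Map measurement columns mes1..mes6 of one row to calendar years.
# Assumption: mes<k> was measured in year START + k - 1.
start <- 2001L          # START value of row myidC
k <- 1:6                # measurement numbers of mes1..mes6
years <- start + k - 1L
years
# [1] 2001 2002 2003 2004 2005 2006
```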
I'd like to create the following data frame, in which the measurements are arranged by year (i.e., the measurement-number columns are replaced by year columns):
ID1 ID2 START 2000 2001 2002 2003 2004 2005
myidA aa 2000 12 58 45 66 88 77
myidB aa 2004 NA NA NA NA 44 89
myidC ab 2001 NA 69 58 77 88 87
myidD ab 2004 NA NA NA NA 78 66
I may have to use a time series object, but I don't know how to handle the IDs (I need to keep them) or the START column...
Here's the approach I would take:
library(reshape2)
# Reshape to long format: one row per (sample, mes column) pair
dfL <- melt(mydf, id.vars=c("ID1", "ID2", "START"))
# Drop the empty (NA) measurements
dfL <- dfL[complete.cases(dfL), ]
head(dfL)
# ID1 ID2 START variable value
# 1 myidA aa 2000 mes1 12
# 2 myidB aa 2004 mes1 44
# 3 myidC ab 2001 mes1 69
# 4 myidD ab 2004 mes1 78
# 5 myidA aa 2000 mes2 58
# 6 myidB aa 2004 mes2 89
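# Year of each value: START plus (measurement number - 1), e.g. mes3 -> START + 2
dfL$year <- dfL$START + as.numeric(gsub("mes", "", dfL$variable))-1
# Widen again, with one column per year
dcast(dfL, ID1 + ID2 + START ~ year, value.var="value")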
# ID1 ID2 START 2000 2001 2002 2003 2004 2005
# 1 myidA aa 2000 12 58 45 66 88 77
# 2 myidB aa 2004 NA NA NA NA 44 89
# 3 myidC ab 2001 NA 69 58 77 88 87
# 4 myidD ab 2004 NA NA NA NA 78 66
The basic idea is to use the measurement number in the column names ("mes1", "mes2", ...) together with START to compute the year of each value (year = START + k - 1 for column mes<k>), which "pushes" each value into its correct column in the newly widened data.frame.
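For comparison, here is a minimal base-R sketch of the same "push" idea that avoids reshape2: it computes a year for every cell and fills a new matrix through two-column (row, column) matrix indexing. It assumes the measurement columns are named mes1, mes2, ... and that mes<k> corresponds to year START + k - 1; the object names (mes_cols, yr, out, result) are illustrative only, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  ID1 = c("myidA", "myidB", "myidC", "myidD"),
  ID2 = c("aa", "aa", "ab", "ab"),
  START = c(2000L, 2004L, 2001L, 2004L),
  mes1 = c(12L, 44L, 69L, 78L), mes2 = c(58L, 89L, 58L, 66L),
  mes3 = c(45L, NA, 77L, NA), mes4 = c(66L, NA, 88L, NA),
  mes5 = c(88L, NA, 87L, NA), mes6 = c(77L, NA, NA, NA)
)

mes_cols <- grep("^mes[0-9]+$", names(mydf), value = TRUE)
vals <- as.matrix(mydf[mes_cols])             # rows = samples, cols = mes1..mesN
k <- as.integer(sub("^mes", "", mes_cols))    # measurement number of each column

# Year of every cell: START of its row plus (k - 1) for its column.
yr <- outer(mydf$START, k - 1L, "+")

keep <- !is.na(vals)                          # only place observed values
years <- sort(unique(yr[keep]))

out <- matrix(NA_integer_, nrow = nrow(mydf), ncol = length(years),
              dimnames = list(NULL, as.character(years)))
# A two-column (row, column) index matrix fills all target cells at once.
idx <- cbind(row(vals)[keep], match(yr[keep], years))
out[idx] <- vals[keep]

result <- cbind(mydf[c("ID1", "ID2", "START")], out)
result
```

Because the year columns here come from the observed data, a year with no measurement in any row would get no column; if a gap-free range is wanted, `years` could instead be built with `seq(min(yr[keep]), max(yr[keep]))`.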
Here's the "mydf" that I used, in case anyone else wants to take a stab at this.
mydf <- structure(
list(ID1 = c("myidA", "myidB", "myidC", "myidD"),
ID2 = c("aa", "aa", "ab", "ab"),
START = c(2000L, 2004L, 2001L, 2004L),
mes1 = c(12L, 44L, 69L, 78L), mes2 = c(58L, 89L, 58L, 66L),
mes3 = c(45L, NA, 77L, NA), mes4 = c(66L, NA, 88L, NA),
mes5 = c(88L, NA, 87L, NA), mes6 = c(77L, NA, NA, NA)),
.Names = c("ID1", "ID2", "START", "mes1", "mes2", "mes3",
"mes4", "mes5", "mes6"), class = "data.frame",
row.names = c(NA, -4L))
mydf
# ID1 ID2 START mes1 mes2 mes3 mes4 mes5 mes6
# 1 myidA aa 2000 12 58 45 66 88 77
# 2 myidB aa 2004 44 89 NA NA NA NA
# 3 myidC ab 2001 69 58 77 88 87 NA
# 4 myidD ab 2004 78 66 NA NA NA NA