Time series in a data frame: reordering data

Time: 2021-11-10 16:58:58

I have a time series data set produced by a measuring software with the following structure:

ID1 ID2 START   mes1    mes2    mes3    mes4    mes5    mes6
myidA   aa  2000    12  58  45  66  88  77
myidB   aa  2004    44  89  NA  NA  NA  NA
myidC   ab  2001    69  58  77  88  87  NA
myidD   ab  2004    78  66  NA  NA  NA  NA

START indicates the year of the oldest measurement, which is stored in the first measurement column (mes1). The start year can differ from sample to sample (each row of the data frame is one sample).
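To make that relationship concrete, here is a minimal sketch of the arithmetic, assuming the measurements are one year apart and have no gaps (which is what the desired output further down implies). `mes_year` is a hypothetical helper name used only for illustration.

```r
# Calendar year of measurement number k for a sample whose first
# measurement (mes1) was taken in year `start`.
# Assumes yearly measurements with no gaps.
mes_year <- function(start, k) start + k - 1L

# myidC starts in 2001 and has five measurements (mes1..mes5).
mes_year(2001L, 1:5)
```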

I'd like to create the following data frame, where the measurements are ordered by year (that is, the measurement numbers are replaced by the years in which the measurements were taken):

ID1 ID2 START   2000    2001    2002    2003    2004    2005
myidA   aa  2000    12  58  45  66  88  77
myidB   aa  2004    NA  NA  NA  NA  44  89
myidC   ab  2001    NA  69  58  77  88  87
myidD   ab  2004    NA  NA  NA  NA  78  66

I may have to use a time series object but I don't know how to cope with the IDs (I need to keep them) and with the START...

1 solution

#1


Here's the approach I would take:

library(reshape2)
dfL <- melt(mydf, id.vars=c("ID1", "ID2", "START"))
dfL <- dfL[complete.cases(dfL), ]
head(dfL)
#     ID1 ID2 START variable value
# 1 myidA  aa  2000     mes1    12
# 2 myidB  aa  2004     mes1    44
# 3 myidC  ab  2001     mes1    69
# 4 myidD  ab  2004     mes1    78
# 5 myidA  aa  2000     mes2    58
# 6 myidB  aa  2004     mes2    89

dfL$year <- dfL$START + as.numeric(gsub("mes", "", dfL$variable))-1

dcast(dfL, ID1 + ID2 + START ~ year, value.var="value")
#     ID1 ID2 START 2000 2001 2002 2003 2004 2005
# 1 myidA  aa  2000   12   58   45   66   88   77
# 2 myidB  aa  2004   NA   NA   NA   NA   44   89
# 3 myidC  ab  2001   NA   69   58   77   88   87
# 4 myidD  ab  2004   NA   NA   NA   NA   78   66

The basic idea is to use the numeric suffix of "mes1", "mes2", and so on, together with START, to compute the year of each value. That year then "pushes" each value to its correct place in the newly widened data.frame.
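For anyone who can't install reshape2, the same "push" can be sketched in base R with matrix indexing. This is an alternative illustration, not the approach above. It assumes the mes columns appear in order (mes1, mes2, ...), so that a column's position equals its measurement number; the names `mes`, `idx`, `yr`, `years`, `out` and `wide` are made up for this example.

```r
# Same sample data as the "mydf" used in this answer.
mydf <- data.frame(
  ID1 = c("myidA", "myidB", "myidC", "myidD"),
  ID2 = c("aa", "aa", "ab", "ab"),
  START = c(2000L, 2004L, 2001L, 2004L),
  mes1 = c(12L, 44L, 69L, 78L), mes2 = c(58L, 89L, 58L, 66L),
  mes3 = c(45L, NA, 77L, NA), mes4 = c(66L, NA, 88L, NA),
  mes5 = c(88L, NA, 87L, NA), mes6 = c(77L, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Measurement columns as an integer matrix (assumed ordered mes1..mesN).
mes <- as.matrix(mydf[grep("^mes[0-9]+$", names(mydf))])

# Row/column positions of every non-missing measurement.
idx <- which(!is.na(mes), arr.ind = TRUE)

# Year of each measurement: START of its row + column position - 1.
yr <- mydf$START[idx[, "row"]] + idx[, "col"] - 1L

# Empty year-by-sample matrix covering the full range of years.
years <- seq(min(yr), max(yr))
out <- matrix(NA_integer_, nrow = nrow(mydf), ncol = length(years),
              dimnames = list(NULL, as.character(years)))

# Write each value into the cell for its row and its year.
out[cbind(idx[, "row"], match(yr, years))] <- mes[idx]

# Reattach the ID columns; check.names = FALSE keeps "2000", "2001", ...
wide <- data.frame(mydf[c("ID1", "ID2", "START")], out, check.names = FALSE)
wide
```

Unlike the dcast() version, this keeps the rows in their original order and does not depend on the column names beyond the "mes" prefix, but it does rely on the column order being correct.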


Here's the "mydf" that I used, in case anyone else wants to take a stab at this.

mydf <- structure(
  list(ID1 = c("myidA", "myidB", "myidC", "myidD"), 
       ID2 = c("aa", "aa", "ab", "ab"), 
       START = c(2000L, 2004L, 2001L, 2004L), 
       mes1 = c(12L, 44L, 69L, 78L), mes2 = c(58L, 89L, 58L, 66L), 
       mes3 = c(45L, NA, 77L, NA), mes4 = c(66L, NA, 88L, NA), 
       mes5 = c(88L, NA, 87L, NA), mes6 = c(77L, NA, NA, NA)), 
  .Names = c("ID1", "ID2", "START", "mes1", "mes2", "mes3", 
             "mes4", "mes5", "mes6"), class = "data.frame", 
  row.names = c(NA, -4L))
mydf
#     ID1 ID2 START mes1 mes2 mes3 mes4 mes5 mes6
# 1 myidA  aa  2000   12   58   45   66   88   77
# 2 myidB  aa  2004   44   89   NA   NA   NA   NA
# 3 myidC  ab  2001   69   58   77   88   87   NA
# 4 myidD  ab  2004   78   66   NA   NA   NA   NA
