How to make a dataset continuous in time? [R]

Time: 2021-02-22 22:51:02

I have a dataset with x, y, date and time columns.

My initial dataset is:

x    y   date    time
1    2    1-1-01  15:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
9    2    2-1-01  01:00
6    1    2-1-01  04:00
.....

I want it as:

x    y   date    time
1    2    1-1-01  15:00
n/a n/a   1-1-01  16:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
n/a n/a   1-1-01  19:00
n/a n/a   1-1-01  20:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
n/a n/a   2-1-01  00:00
9    2    2-1-01  01:00
n/a n/a   2-1-01  02:00
n/a n/a   2-1-01  03:00
6    1    2-1-01  04:00
.....

How can I fill n/a values?

I tried to use the xspline function to interpolate 'x' and 'y':

plot(df[,2:1])
xspline(df[,2:1], shape=-0.3, lwd=1)

Can I find the n/a values from this plot, or is there another way to find them?

2 solutions

#1


We can create another dataset with a sequence of 'time' grouped by 'date' and join it with the original dataset. This was done with the devel version of data.table at the time of writing; the on= argument used below is available in current releases.

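library(data.table)
# For each date, build every hour between the first and last observed hour
DT <- setDT(df1)[, {
  tmp <- as.numeric(substr(time, 1, 2))
  list(time = sprintf('%02d:00', min(tmp):max(tmp)))
}, by = date]
# Join so that hours missing from df1 appear with NA in x and y
df1[DT, on = c('date', 'time')]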
# x  y   date  time
# 1:  1  2 1-1-01 15:00
# 2: NA NA 1-1-01 16:00
# 3:  2  5 1-1-01 17:00
# 4:  3  1 1-1-01 18:00
# 5: NA NA 1-1-01 19:00
# 6: NA NA 1-1-01 20:00
# 7:  5  7 1-1-01 21:00
# 8:  2  6 1-1-01 22:00
# 9:  6  3 1-1-01 23:00
#10:  9  2 2-1-01 01:00
#11: NA NA 2-1-01 02:00
#12: NA NA 2-1-01 03:00
#13:  6  1 2-1-01 04:00

Alternatively, we can create 'time' for every hour from 00 to 23 on each date, and then drop the rows that are NA in both 'x' and 'y' before the first non-NA row and after the last one:

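# Build all 24 hours for every date
DT <- setDT(df1)[, list(time = sprintf('%02d:00', 0:23)), by = date]
# Join, then keep only the rows from the first to the last observed value
res <- df1[DT, on = c('date', 'time')
           ][, {tmp <- which(!(is.na(x) & is.na(y)))
                .SD[tmp[1L]:tmp[length(tmp)]]}]
res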
 # x  y   date  time
 #1:  1  2 1-1-01 15:00
 #2: NA NA 1-1-01 16:00
 #3:  2  5 1-1-01 17:00
 #4:  3  1 1-1-01 18:00
 #5: NA NA 1-1-01 19:00
 #6: NA NA 1-1-01 20:00
 #7:  5  7 1-1-01 21:00
 #8:  2  6 1-1-01 22:00
 #9:  6  3 1-1-01 23:00
 #10:NA NA 2-1-01 00:00
 #11: 9  2 2-1-01 01:00
 #12:NA NA 2-1-01 02:00
 #13:NA NA 2-1-01 03:00
 #14: 6  1 2-1-01 04:00

I missed the last part of the question. If you need to fill the NA values, as @bdecaf mentions in the other answer (and as I suggested in a comment I have since removed), you can use na.approx from the zoo package:

library(zoo)
# Linearly interpolate the NAs in x and y, treating rows as equally spaced
res[, c('x', 'y') := lapply(.SD, na.approx), .SDcols = c('x', 'y')]
#           x        y   date  time
# 1: 1.000000 2.000000 1-1-01 15:00
# 2: 1.500000 3.500000 1-1-01 16:00
# 3: 2.000000 5.000000 1-1-01 17:00
# 4: 3.000000 1.000000 1-1-01 18:00
# 5: 3.666667 3.000000 1-1-01 19:00
# 6: 4.333333 5.000000 1-1-01 20:00
# 7: 5.000000 7.000000 1-1-01 21:00
# 8: 2.000000 6.000000 1-1-01 22:00
# 9: 6.000000 3.000000 1-1-01 23:00
#10: 7.500000 2.500000 2-1-01 00:00
#11: 9.000000 2.000000 2-1-01 01:00
#12: 8.000000 1.666667 2-1-01 02:00
#13: 7.000000 1.333333 2-1-01 03:00
#14: 6.000000 1.000000 2-1-01 04:00
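
For readers who want to see what the interpolation is doing, the same linear fill can be sketched with base R's approx(). This is only an illustration on the x column of the example data; the helper name fill_linear is made up here and is not part of zoo or data.table.

```r
# Hypothetical helper (not from zoo): linear interpolation of interior NAs
# over the row index, using base R's approx().
fill_linear <- function(v) {
  idx <- seq_along(v)
  ok <- !is.na(v)
  # approx() interpolates between the known points; positions outside the
  # known range stay NA (rule = 1 is the default)
  approx(x = idx[ok], y = v[ok], xout = idx)$y
}

# The x column after the join, with NA for the missing hours
x <- c(1, NA, 2, 3, NA, NA, 5, 2, 6, NA, 9, NA, NA, 6)
filled <- fill_linear(x)
round(filled, 6)
```

Like na.approx on a plain vector, this treats the rows as equally spaced, which is why the hourly grid has to be complete and in chronological order before interpolating.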

data

df1 <- structure(list(x = c(1L, 2L, 3L, 5L, 2L, 6L, 9L, 6L), y = c(2L, 
5L, 1L, 7L, 6L, 3L, 2L, 1L), date = c("1-1-01", "1-1-01", "1-1-01", 
"1-1-01", "1-1-01", "1-1-01", "2-1-01", "2-1-01"), time = c("15:00", 
"17:00", "18:00", "21:00", "22:00", "23:00", "01:00", "04:00"
)), .Names = c("x", "y", "date", "time"), class = "data.frame",
row.names = c(NA, -8L))

#2


About getting the required table

You can do this in base R:

Data

in.data <- read.table(text='x    y    date    time
1    2    1-1-01  15:00
2    5    1-1-01  17:00
3    1    1-1-01  18:00
5    7    1-1-01  21:00
2    6    1-1-01  22:00
6    3    1-1-01  23:00
9    2    2-1-01  1:00
6    1    2-1-01  4:00
', header=TRUE)

times <- paste0(0:23,':00')
dates <- paste0(1:2,'-1-01')

Create the wanted table

all.dt <- expand.grid(date=dates,time=times)

big.data <- merge(all.dt, in.data, all.x=TRUE)
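
The grid above hard-codes the two dates and covers all 24 hours of each, so it also contains hours before the first and after the last observation. As a rough sketch of an alternative, the grid can be derived from the data itself by parsing date and time into a single timestamp. This assumes the dates are day-month-year (so "2-1-01" is the day after "1-1-01") and that UTC is acceptable; the names df, stamp, grid and full are made up for this illustration.

```r
# Example data from the question (times zero-padded)
df <- data.frame(
  x = c(1L, 2L, 3L, 5L, 2L, 6L, 9L, 6L),
  y = c(2L, 5L, 1L, 7L, 6L, 3L, 2L, 1L),
  date = c(rep("1-1-01", 6), rep("2-1-01", 2)),
  time = c("15:00", "17:00", "18:00", "21:00", "22:00", "23:00",
           "01:00", "04:00"),
  stringsAsFactors = FALSE
)

# Assumption: day-month-year dates, HH:MM times, UTC time zone
df$stamp <- as.POSIXct(paste(df$date, df$time),
                       format = "%d-%m-%y %H:%M", tz = "UTC")

# One row per hour from the first to the last observation, across midnight
grid <- data.frame(stamp = seq(min(df$stamp), max(df$stamp), by = "hour"))

# Left join: hours with no observation get NA in x and y
full <- merge(grid, df[, c("stamp", "x", "y")], by = "stamp", all.x = TRUE)
full$time <- format(full$stamp, "%H:%M", tz = "UTC")
full
```

Because the sequence is built on timestamps rather than per date, the 00:00 row on the second day is included and no trimming of leading or trailing rows is needed.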

About filling the NAs:

Tools provided by zoo

The zoo package has several functions for this problem: na.approx, na.spline and na.locf. E.g.

library(zoo)
big.data <- within(big.data,{
         x <- na.approx(x,na.rm=FALSE)
         y <- na.approx(y,na.rm=FALSE)
})

big.data then contains:

     date  time        x        y
1  1-1-01  0:00       NA       NA
2  1-1-01  1:00       NA       NA
...
15 1-1-01 14:00       NA       NA
16 1-1-01 15:00 1.000000 2.000000
17 1-1-01 16:00 1.500000 3.500000
18 1-1-01 17:00 2.000000 5.000000
19 1-1-01 18:00 3.000000 1.000000
20 1-1-01 19:00 3.666667 3.000000
21 1-1-01 20:00 4.333333 5.000000
22 1-1-01 21:00 5.000000 7.000000
23 1-1-01 22:00 2.000000 6.000000
24 1-1-01 23:00 6.000000 3.000000
25 2-1-01  0:00 7.500000 2.500000
26 2-1-01  1:00 9.000000 2.000000
27 2-1-01  2:00 8.000000 1.666667
28 2-1-01  3:00 7.000000 1.333333
29 2-1-01  4:00 6.000000 1.000000
30 2-1-01  5:00       NA       NA
31 2-1-01  6:00       NA       NA
...
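
The na.rm=FALSE in the na.approx calls is what keeps the leading and trailing NA rows in this output. A small sketch on a made-up vector (not taken from the question's data) shows the difference, and how na.locf behaves on the same input:

```r
library(zoo)

v <- c(NA, 1, NA, 3, NA)  # made-up values for illustration

# Default na.rm = TRUE drops the leading/trailing NAs that cannot be
# interpolated, so the result is shorter than the input
short <- na.approx(v)

# na.rm = FALSE keeps the original length, leaving the edge NAs in place
kept <- na.approx(v, na.rm = FALSE)

# na.locf carries the last observation forward instead of interpolating
carried <- na.locf(v, na.rm = FALSE)
```

When the result is assigned back to a column, as in within() here, it has to have the same length as the column, which is one reason to pass na.rm = FALSE whenever edge NAs may be present.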
