I have a dataset with x, y, date and time columns.
My initial dataset is:
x y date time
1 2 1-1-01 15:00
2 5 1-1-01 17:00
3 1 1-1-01 18:00
5 7 1-1-01 21:00
2 6 1-1-01 22:00
6 3 1-1-01 23:00
9 2 2-1-01 01:00
6 1 2-1-01 04:00
.....
I want it as:
x y date time
1 2 1-1-01 15:00
n/a n/a 1-1-01 16:00
2 5 1-1-01 17:00
3 1 1-1-01 18:00
n/a n/a 1-1-01 19:00
n/a n/a 1-1-01 20:00
5 7 1-1-01 21:00
2 6 1-1-01 22:00
6 3 1-1-01 23:00
n/a n/a 2-1-01 00:00
9 2 2-1-01 01:00
n/a n/a 2-1-01 02:00
n/a n/a 2-1-01 03:00
6 1 2-1-01 04:00
.....
How can I fill in the n/a values?
I tried to use the xspline function to interpolate 'x' and 'y':
plot(df[,2:1])
xspline(df[,2:1], shape=-0.3, lwd=1)
Using this plot, can I find the values for the n/a rows, or is there another way to find them?
2 answers
#1
We can create another dataset containing, for each 'date', the sequence of hours between its first and last observed 'time', and join it with the original dataset. This can be done using the devel version of data.table. Instructions to install the devel version are here.
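library(data.table)
# For each date, list every hour from its earliest to its latest observed hour
DT <- setDT(df1)[, {tmp <- as.numeric(substr(time, 1, 2))
                    list(time = sprintf('%02d:00', min(tmp):max(tmp)))}, date]
# Join df1 onto that hour grid; hours with no observation get NA in x and y
df1[DT, on = c('date', 'time')]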
# x y date time
# 1: 1 2 1-1-01 15:00
# 2: NA NA 1-1-01 16:00
# 3: 2 5 1-1-01 17:00
# 4: 3 1 1-1-01 18:00
# 5: NA NA 1-1-01 19:00
# 6: NA NA 1-1-01 20:00
# 7: 5 7 1-1-01 21:00
# 8: 2 6 1-1-01 22:00
# 9: 6 3 1-1-01 23:00
#10: 9 2 2-1-01 01:00
#11: NA NA 2-1-01 02:00
#12: NA NA 2-1-01 03:00
#13: 6 1 2-1-01 04:00
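The per-date sequence above stops at each date's own first and last observed hour, so 2-1-01 00:00 is not created. A hedged base-R sketch that builds one continuous hourly sequence instead, assuming the dates are day-month-year (`%d-%m-%y`, which makes 1-1-01 and 2-1-01 consecutive days) and using UTC to avoid daylight-saving gaps:

```r
# Sample data from the question (date assumed day-month-year, time HH:MM)
df1 <- data.frame(
  x = c(1L, 2L, 3L, 5L, 2L, 6L, 9L, 6L),
  y = c(2L, 5L, 1L, 7L, 6L, 3L, 2L, 1L),
  date = c("1-1-01", "1-1-01", "1-1-01", "1-1-01",
           "1-1-01", "1-1-01", "2-1-01", "2-1-01"),
  time = c("15:00", "17:00", "18:00", "21:00",
           "22:00", "23:00", "01:00", "04:00"),
  stringsAsFactors = FALSE
)

# Combine date and time into one timestamp (assumption: d-m-y, UTC)
ts <- as.POSIXct(paste(df1$date, df1$time),
                 format = "%d-%m-%y %H:%M", tz = "UTC")

# One entry per hour from the first to the last observation, across midnight
hours <- seq(min(ts), max(ts), by = "hour")

# Look up each hour in the original data; unmatched hours give NA
i <- match(as.numeric(hours), as.numeric(ts))
res <- data.frame(time = format(hours, "%Y-%m-%d %H:%M"),
                  x = df1$x[i], y = df1$y[i],
                  stringsAsFactors = FALSE)
res
```

Matching on the numeric seconds keeps the lookup independent of how timestamps are printed.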
Or, if we want to create 'time' values from 00 to 23 hours for every date, we can join and then delete the rows before the first row where 'x' or 'y' is non-NA, and likewise the rows after the last such row:
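# Build all 24 hours (00:00 to 23:00) for every date
DT <- setDT(df1)[, list(time = sprintf('%02d:00', 0:23)), date]
# Join, then keep the rows from the first to the last row where x or y is observed
res <- df1[DT, on = c('date', 'time')
          ][, {tmp <- which(!(is.na(x) & is.na(y)))
               .SD[tmp[1L]:tmp[length(tmp)]]}]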
res
# x y date time
# 1: 1 2 1-1-01 15:00
# 2: NA NA 1-1-01 16:00
# 3: 2 5 1-1-01 17:00
# 4: 3 1 1-1-01 18:00
# 5: NA NA 1-1-01 19:00
# 6: NA NA 1-1-01 20:00
# 7: 5 7 1-1-01 21:00
# 8: 2 6 1-1-01 22:00
# 9: 6 3 1-1-01 23:00
#10: NA NA 2-1-01 00:00
#11: 9 2 2-1-01 01:00
#12: NA NA 2-1-01 02:00
#13: NA NA 2-1-01 03:00
#14: 6 1 2-1-01 04:00
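The trimming step keeps only the rows from the first to the last row in which 'x' or 'y' is observed. The same idea in base R on a plain data frame, as a sketch with a hypothetical helper named `trim_na_edges`; unlike `tmp[1L]:tmp[length(tmp)]`, it returns an empty data frame instead of failing when no row is observed:

```r
# Hypothetical helper: drop leading and trailing rows where all of `cols` are NA
trim_na_edges <- function(d, cols = c("x", "y")) {
  # Row numbers where at least one of the columns is observed
  keep <- which(rowSums(!is.na(d[cols])) > 0)
  if (length(keep) == 0) return(d[0, , drop = FALSE])
  # Interior all-NA rows are kept so they can be interpolated later
  d[keep[1]:keep[length(keep)], , drop = FALSE]
}

# Small example: all-NA rows at both ends and one in the middle
d <- data.frame(x = c(NA, 1, NA, 2, NA),
                y = c(NA, 2, NA, 5, NA),
                time = c("14:00", "15:00", "16:00", "17:00", "18:00"),
                stringsAsFactors = FALSE)
trimmed <- trim_na_edges(d)
trimmed
```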
I missed the last part of the question. If you need to fill in the NA values, you can use na.approx from library(zoo), as @bdecaf mentioned in their post (I had suggested the same in a comment that I later removed).
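library(zoo)
# Replace each NA in x and y by linear interpolation between its neighbours (res is modified by reference)
res[, c('x', 'y') := lapply(.SD, na.approx), .SDcols = x:y]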
# x y date time
# 1: 1.000000 2.000000 1-1-01 15:00
# 2: 1.500000 3.500000 1-1-01 16:00
# 3: 2.000000 5.000000 1-1-01 17:00
# 4: 3.000000 1.000000 1-1-01 18:00
# 5: 3.666667 3.000000 1-1-01 19:00
# 6: 4.333333 5.000000 1-1-01 20:00
# 7: 5.000000 7.000000 1-1-01 21:00
# 8: 2.000000 6.000000 1-1-01 22:00
# 9: 6.000000 3.000000 1-1-01 23:00
#10: 7.500000 2.500000 2-1-01 00:00
#11: 9.000000 2.000000 2-1-01 01:00
#12: 8.000000 1.666667 2-1-01 02:00
#13: 7.000000 1.333333 2-1-01 03:00
#14: 6.000000 1.000000 2-1-01 04:00
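If you would rather not add the zoo dependency, base R's `approx()` can do the same linear interpolation over the row index. A minimal sketch, with `fill_linear` as a hypothetical helper name; it assumes the rows are already equally spaced (one per hour), as they are after the join, and that each column has at least two observed values:

```r
# Hypothetical helper: linearly interpolate NAs over the position in the vector
fill_linear <- function(v) {
  idx <- seq_along(v)
  ok <- !is.na(v)
  # rule = 1 returns NA outside the observed range, so edge NAs stay NA;
  # approx() needs at least two non-NA values
  approx(idx[ok], v[ok], xout = idx, rule = 1)$y
}

# The x column after the join, before filling
x <- c(1, NA, 2, 3, NA, NA, 5, 2, 6, NA, 9, NA, NA, 6)
round(fill_linear(x), 6)
```

The result matches the interpolated x column shown above.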
Data
df1 <- structure(list(x = c(1L, 2L, 3L, 5L, 2L, 6L, 9L, 6L), y = c(2L,
5L, 1L, 7L, 6L, 3L, 2L, 1L), date = c("1-1-01", "1-1-01", "1-1-01",
"1-1-01", "1-1-01", "1-1-01", "2-1-01", "2-1-01"), time = c("15:00",
"17:00", "18:00", "21:00", "22:00", "23:00", "01:00", "04:00"
)), .Names = c("x", "y", "date", "time"), class = "data.frame",
row.names = c(NA, -8L))
#2
To get the required table, you can do this in base R:
Data
in.data <- read.table(text='x y date time
1 2 1-1-01 15:00
2 5 1-1-01 17:00
3 1 1-1-01 18:00
5 7 1-1-01 21:00
2 6 1-1-01 22:00
6 3 1-1-01 23:00
9 2 2-1-01 1:00
6 1 2-1-01 4:00
', header=TRUE)
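# Every hour of the day, in the same H:00 format used in in.data
times <- paste0(0:23, ':00')
# The two dates in the example, written out explicitly
dates <- paste0(1:2, '-1-01')
Create the wanted table:
# Every date/time combination
all.dt <- expand.grid(date = dates, time = times)
# Left join: combinations missing from in.data get NA in x and y
big.data <- merge(all.dt, in.data, all.x = TRUE)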
To fill the NAs, use the tools provided by zoo. It has several functions for this problem: na.approx, na.spline and na.locf. For example:
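library(zoo)
big.data <- within(big.data, {
  # na.rm = FALSE keeps the leading/trailing NAs that cannot be interpolated,
  # so x and y keep their original length
  x <- na.approx(x, na.rm = FALSE)
  y <- na.approx(y, na.rm = FALSE)
})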
big.data then contains:
date time x y
1 1-1-01 0:00 NA NA
2 1-1-01 1:00 NA NA
...
15 1-1-01 14:00 NA NA
16 1-1-01 15:00 1.000000 2.000000
17 1-1-01 16:00 1.500000 3.500000
18 1-1-01 17:00 2.000000 5.000000
19 1-1-01 18:00 3.000000 1.000000
20 1-1-01 19:00 3.666667 3.000000
21 1-1-01 20:00 4.333333 5.000000
22 1-1-01 21:00 5.000000 7.000000
23 1-1-01 22:00 2.000000 6.000000
24 1-1-01 23:00 6.000000 3.000000
25 2-1-01 0:00 7.500000 2.500000
26 2-1-01 1:00 9.000000 2.000000
27 2-1-01 2:00 8.000000 1.666667
28 2-1-01 3:00 7.000000 1.333333
29 2-1-01 4:00 6.000000 1.000000
30 2-1-01 5:00 NA NA
31 2-1-01 6:00 NA NA
...
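The three zoo functions fill gaps differently: na.approx draws straight lines between neighbouring observations, na.spline fits a cubic spline through them, and na.locf carries the last observation forward. A small comparison sketch on the x values of 1-1-01 from 15:00 to 21:00 (the vector is typed in by hand; it has no leading or trailing NAs, so every method fills every gap):

```r
library(zoo)

# x for 1-1-01, 15:00 to 21:00, after the join (NA = missing hour)
v <- c(1, NA, 2, 3, NA, NA, 5)

filled <- data.frame(
  raw    = v,
  approx = na.approx(v, na.rm = FALSE),  # straight line between neighbours
  spline = na.spline(v, na.rm = FALSE),  # cubic spline through the observed points
  locf   = na.locf(v, na.rm = FALSE)     # repeat the last observed value
)
filled
```

All three leave the observed values unchanged; they differ only in what they put in the gaps.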