Suppose I have this data frame:
假设我有这个数据框:
times vals
1 1 2
2 3 4
3 7 6
set up with
设置了
foo <- data.frame(times=c(1,3,7), vals=c(2,4,6))
and I want this one:
我想要这个:
times vals
1 1 2
2 2 2
3 3 4
4 4 4
5 5 4
6 6 4
7 7 6
That is, I want to fill in all the times from 1 to 7, and fill in the vals from the latest time that is not greater than the given time.
也就是说,我想把所有的时间从1到7填满,然后从最晚的时间填满,不大于给定的时间。
I have some code to do it using dplyr, but it is ugly. Suggestions for better?
我有一些使用dplyr的代码,但是它很难看。更好的建议吗?
library(dplyr)
foo <- merge(foo, data.frame(times=1:max(foo$times)), all.y=TRUE)
foo2 <- merge(foo, foo, by=c(), suffixes=c('', '.1'))
foo2 <- foo2 %>% filter(is.na(vals) & !is.na(vals.1) & times.1 <= times) %>%
group_by(times) %>% arrange(-times.1) %>% mutate(rn = row_number()) %>%
filter(rn == 1) %>%
mutate(vals = vals.1,
rn = NULL,
vals.1 = NULL,
times.1 = NULL)
foo <- merge(foo, foo2, by=c('times'), all.x=TRUE, suffixes=c('', '.2'))
foo <- mutate(foo,
vals = ifelse(is.na(vals), vals.2, vals),
vals.2 = NULL)
4 个解决方案
#1
5
A dplyr
and tidyr
option:
dplyr和tidyr选项:
library(dplyr)
library(tidyr)
foo %>%
right_join(data_frame(times = min(foo$times):max(foo$times))) %>%
fill(vals)
# Joining by: "times"
# times vals
# 1 1 2
# 2 2 2
# 3 3 4
# 4 4 4
# 5 5 4
# 6 6 4
# 7 7 6
#2
10
This is a standard rolling join problem:
这是一个标准的滚动连接问题:
library(data.table)
setDT(foo)[.(1:7), on = 'times', roll = T]
# times vals
#1: 1 2
#2: 2 2
#3: 3 4
#4: 4 4
#5: 5 4
#6: 6 4
#7: 7 6
The above is for devel version (1.9.7+), which is smarter about column matching during joins. For 1.9.6 you still need to specify column name for the inner table:
上面是devel版本(1.9.7+),它在连接期间更智能地进行列匹配。对于1.9.6,您仍然需要为内部表指定列名:
setDT(foo)[.(times = 1:7), on = 'times', roll = T]
#3
6
With approx
:
约:
data.frame(times = 1:7,
vals = unlist(approx(foo, xout = 1:7, method = "constant", f = 0)[2], use.names = F))
times vals
1 1 2
2 2 2
3 3 4
4 4 4
5 5 4
6 6 4
7 7 6
#4
4
This is a bit longer and more verbose base R solution:
这是一个更长,更详细的R解决方案:
# calculate the number of repetitions needed for vals variable
reps <- c(with(foo, times[2:length(times)]-times[1:length(times)-1]), 1)
# get result
fooDoneIt <- data.frame(times = min(foo$times):max(foo$times),
vals = rep(foo$vals, reps))
#1
5
A dplyr
and tidyr
option:
dplyr和tidyr选项:
library(dplyr)
library(tidyr)
foo %>%
right_join(data_frame(times = min(foo$times):max(foo$times))) %>%
fill(vals)
# Joining by: "times"
# times vals
# 1 1 2
# 2 2 2
# 3 3 4
# 4 4 4
# 5 5 4
# 6 6 4
# 7 7 6
#2
10
This is a standard rolling join problem:
这是一个标准的滚动连接问题:
library(data.table)
setDT(foo)[.(1:7), on = 'times', roll = T]
# times vals
#1: 1 2
#2: 2 2
#3: 3 4
#4: 4 4
#5: 5 4
#6: 6 4
#7: 7 6
The above is for devel version (1.9.7+), which is smarter about column matching during joins. For 1.9.6 you still need to specify column name for the inner table:
上面是devel版本(1.9.7+),它在连接期间更智能地进行列匹配。对于1.9.6,您仍然需要为内部表指定列名:
setDT(foo)[.(times = 1:7), on = 'times', roll = T]
#3
6
With approx
:
约:
data.frame(times = 1:7,
vals = unlist(approx(foo, xout = 1:7, method = "constant", f = 0)[2], use.names = F))
times vals
1 1 2
2 2 2
3 3 4
4 4 4
5 5 4
6 6 4
7 7 6
#4
4
This is a bit longer and more verbose base R solution:
这是一个更长,更详细的R解决方案:
# calculate the number of repetitions needed for vals variable
reps <- c(with(foo, times[2:length(times)]-times[1:length(times)-1]), 1)
# get result
fooDoneIt <- data.frame(times = min(foo$times):max(foo$times),
vals = rep(foo$vals, reps))