在R中填充数据帧中的值?

时间:2021-03-26 22:56:53

Suppose I have this data frame:

假设我有这个数据框:

  times vals
1     1    2
2     3    4
3     7    6

set up with

设置了

foo <- data.frame(times=c(1,3,7), vals=c(2,4,6))

and I want this one:

我想要这个:

  times vals
1     1    2
2     2    2
3     3    4
4     4    4
5     5    4
6     6    4
7     7    6

That is, I want to fill in all the times from 1 to 7, and fill in the vals from the latest time that is not greater than the given time.

也就是说,我想把所有的时间从1到7填满,然后从最晚的时间填满,不大于给定的时间。

I have some code to do it using dplyr, but it is ugly. Suggestions for better?

我有一些使用dplyr的代码,但是它很难看。更好的建议吗?

library(dplyr)

foo <- merge(foo, data.frame(times=1:max(foo$times)), all.y=TRUE)
foo2 <- merge(foo, foo, by=c(), suffixes=c('', '.1'))

foo2 <- foo2 %>% filter(is.na(vals) & !is.na(vals.1) & times.1 <= times) %>%
  group_by(times) %>% arrange(-times.1) %>% mutate(rn = row_number()) %>%
  filter(rn == 1) %>%
  mutate(vals = vals.1,
         rn = NULL,
         vals.1 = NULL,
         times.1 = NULL)

foo <- merge(foo, foo2, by=c('times'), all.x=TRUE, suffixes=c('', '.2'))
foo <- mutate(foo,
              vals = ifelse(is.na(vals), vals.2, vals),
              vals.2 = NULL)

4 个解决方案

#1


5  

A dplyr and tidyr option:

dplyr和tidyr选项:

library(dplyr)
library(tidyr)

foo %>%
 right_join(data_frame(times = min(foo$times):max(foo$times))) %>%
 fill(vals)
# Joining by: "times"
#   times vals
# 1     1    2
# 2     2    2
# 3     3    4
# 4     4    4
# 5     5    4
# 6     6    4
# 7     7    6

#2


10  

This is a standard rolling join problem:

这是一个标准的滚动连接问题:

library(data.table)

setDT(foo)[.(1:7), on = 'times', roll = T]
#   times vals
#1:     1    2
#2:     2    2
#3:     3    4
#4:     4    4
#5:     5    4
#6:     6    4
#7:     7    6

The above is for devel version (1.9.7+), which is smarter about column matching during joins. For 1.9.6 you still need to specify column name for the inner table:

上面是devel版本(1.9.7+),它在连接期间更智能地进行列匹配。对于1.9.6,您仍然需要为内部表指定列名:

setDT(foo)[.(times = 1:7), on = 'times', roll = T]

#3


6  

With approx:

约:

data.frame(times = 1:7, 
           vals = unlist(approx(foo, xout = 1:7, method = "constant", f = 0)[2], use.names = F))

  times vals
1     1    2
2     2    2
3     3    4
4     4    4
5     5    4
6     6    4
7     7    6

#4


4  

This is a bit longer and more verbose base R solution:

这是一个更长,更详细的R解决方案:

# calculate the number of repetitions needed for vals variable
reps <- c(with(foo, times[2:length(times)]-times[1:length(times)-1]), 1)

# get result
fooDoneIt <- data.frame(times = min(foo$times):max(foo$times),
              vals = rep(foo$vals, reps))

#1


5  

A dplyr and tidyr option:

dplyr和tidyr选项:

library(dplyr)
library(tidyr)

foo %>%
 right_join(data_frame(times = min(foo$times):max(foo$times))) %>%
 fill(vals)
# Joining by: "times"
#   times vals
# 1     1    2
# 2     2    2
# 3     3    4
# 4     4    4
# 5     5    4
# 6     6    4
# 7     7    6

#2


10  

This is a standard rolling join problem:

这是一个标准的滚动连接问题:

library(data.table)

setDT(foo)[.(1:7), on = 'times', roll = T]
#   times vals
#1:     1    2
#2:     2    2
#3:     3    4
#4:     4    4
#5:     5    4
#6:     6    4
#7:     7    6

The above is for devel version (1.9.7+), which is smarter about column matching during joins. For 1.9.6 you still need to specify column name for the inner table:

上面是devel版本(1.9.7+),它在连接期间更智能地进行列匹配。对于1.9.6,您仍然需要为内部表指定列名:

setDT(foo)[.(times = 1:7), on = 'times', roll = T]

#3


6  

With approx:

约:

data.frame(times = 1:7, 
           vals = unlist(approx(foo, xout = 1:7, method = "constant", f = 0)[2], use.names = F))

  times vals
1     1    2
2     2    2
3     3    4
4     4    4
5     5    4
6     6    4
7     7    6

#4


4  

This is a bit longer and more verbose base R solution:

这是一个更长,更详细的R解决方案:

# calculate the number of repetitions needed for vals variable
reps <- c(with(foo, times[2:length(times)]-times[1:length(times)-1]), 1)

# get result
fooDoneIt <- data.frame(times = min(foo$times):max(foo$times),
              vals = rep(foo$vals, reps))