Creating NA between expanded dates in a data.frame or data.table-like structure in R

Time: 2022-02-01 09:14:18

I currently have a data table that looks like so:

> data
Name     Person     Date
A        1          1/1/2004
A        2          1/3/2004
A        3          1/9/2004 
B        4          1/7/2004
B        5          1/10/2004 
B        6          1/17/2004
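
For anyone who wants to run the code in this thread, the sample table can be rebuilt roughly as follows. This is only a sketch: the post does not show the column types, so storing Date as character in m/d/Y form and Person as integer is an assumption.

```r
# Assumed reconstruction of the sample table; column types are a guess,
# since the original post only shows the printed output.
data <- data.frame(
  Name   = c("A", "A", "A", "B", "B", "B"),
  Person = 1:6,
  Date   = c("1/1/2004", "1/3/2004", "1/9/2004",
             "1/7/2004", "1/10/2004", "1/17/2004"),
  stringsAsFactors = FALSE
)
data
```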

I am trying to create something that looks like:

Name     Person     Date
A        1          1/1/2004
A        2          Repeat
A        2          1/3/2004
A        3          Repeat
A        3          Repeat
A        3          1/9/2004
B        4          1/7/2004
B        5          Repeat
B        5          1/10/2004
B        6          Repeat
B        6          Repeat
B        6          1/17/2004

The idea is to expand each row so that it also lists the earlier dates within the same Name, and to put a placeholder (a factor level such as "Repeat", NA, or a blank) wherever a date has already appeared for that name.

This seems agonizingly simple, but the only implementation I can think of is to do a for loop with if else statements. While it gets the job done, the methodology breaks down at higher dimensions. My code looks like this:

for(i in nrow(data)){

if data$Date[i] == data$Date[i+1] & data$Name[i] == data$Name[i+1]
then as.vector(data[i,]) <- data$Date[i+1]

}

Would anyone have any faster ways to do this? I have tried to use the data.table package but I can't find something in there that would add the NA's/Dates. Any tips or inputs regarding another implementation or if data.table would work would be greatly appreciated. Thanks!

1 solution

#1


You could try:

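res <- do.call(rbind, lapply(split(data, data$Name), function(x) {
  # Sort each group chronologically
  Date1 <- as.Date(x$Date, "%m/%d/%Y")
  x <- x[order(Date1), ]
  indx <- seq_len(nrow(x))
  # Repeat row i i times and pair it with dates 1..i of the group
  cbind(x[rep(indx, indx), 1:2], Date = x[sequence(indx), 3])
}))
row.names(res) <- 1:nrow(res)

# Mark every date that has already appeared within the same Name
res$Date <- as.character(res$Date)
res$Date[duplicated(res[, c("Name", "Date")])] <- "Repeat"
res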
  #   Name Person      Date
  #1     A      1  1/1/2004
  #2     A      2    Repeat
  #3     A      2  1/3/2004
  #4     A      3    Repeat
  #5     A      3    Repeat
  #6     A      3  1/9/2004
  #7     B      4  1/7/2004
  #8     B      5    Repeat
  #9     B      5 1/10/2004
  #10    B      6    Repeat
  #11    B      6    Repeat
  #12    B      6 1/17/2004
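
The expansion rests on two index vectors, which are easier to follow on a single group. The sketch below uses the three dates of group A; the names `row_idx`, `date_idx`, and `expanded` are only for illustration and do not appear in the answer. `rep(indx, indx)` repeats row i i times, while `sequence(indx)` counts 1, then 1:2, then 1:3, so each row is paired with every date up to and including its own. It also uses NA instead of "Repeat", which the question mentions as an acceptable placeholder.

```r
dates  <- c("1/1/2004", "1/3/2004", "1/9/2004")  # group A, already sorted
person <- 1:3

indx     <- seq_along(dates)  # 1 2 3
row_idx  <- rep(indx, indx)   # 1 2 2 3 3 3
date_idx <- sequence(indx)    # 1 1 2 1 2 3

expanded <- data.frame(Person = person[row_idx],
                       Date   = dates[date_idx],
                       stringsAsFactors = FALSE)

# Any date that has already appeared earlier in the group becomes NA
expanded$Date[duplicated(expanded$Date)] <- NA
expanded
```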

Or using data.table (inspired by @David Arenburg's answer):

  # Assumes the rows are already in date order within each Name
  DT1 <- setDT(data)[, list(Person = rep(Person, seq_len(.N)),
                            Date = Date[sequence(seq_len(.N))]),
                     by = Name][, Date := replace(Date, duplicated(Date), "Repeat"), by = Name]
  DT1
  #   Name Person      Date
  #1:    A      1  1/1/2004
  #2:    A      2    Repeat
  #3:    A      2  1/3/2004
  #4:    A      3    Repeat
  #5:    A      3    Repeat
  #6:    A      3  1/9/2004
  #7:    B      4  1/7/2004
  #8:    B      5    Repeat
  #9:    B      5 1/10/2004
 #10:    B      6    Repeat
 #11:    B      6    Repeat
 #12:    B      6 1/17/2004
