I currently have a data table that looks like so:
> data
Name Person Date
A 1 1/1/2004
A 2 1/3/2004
A 3 1/9/2004
B 4 1/7/2004
B 5 1/10/2004
B 6 1/17/2004
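For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: the post does not show the actual column types, so storing Date as month/day/year text and Person as an integer is an assumption.

```r
# Rebuild the example table.
# Column types are an assumption; the original post only shows printed output.
data <- data.frame(
  Name   = c("A", "A", "A", "B", "B", "B"),
  Person = 1:6,
  Date   = c("1/1/2004", "1/3/2004", "1/9/2004",
             "1/7/2004", "1/10/2004", "1/17/2004"),
  stringsAsFactors = FALSE  # keep Name and Date as character vectors
)
data
```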
I am trying to create something that looks like:
Name Person Date
A 1 1/1/2004
A 2 Repeat
A 2 1/3/2004
A 3 Repeat
A 3 Repeat
A 3 1/9/2004
B 4 1/7/2004
B 5 Repeat
B 5 1/10/2004
B 6 Repeat
B 6 Repeat
B 6 1/17/2004
The idea is to insert a placeholder (a factor level, "NA", or even a blank) wherever a date has already appeared earlier within the same Name group.
This seems agonizingly simple, but the only implementation I can think of is to do a for loop with if else statements. While it gets the job done, the methodology breaks down at higher dimensions. My code looks like this:
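for (i in seq_len(nrow(data) - 1)) {
  # compare each row with the next one
  if (data$Date[i] == data$Date[i + 1] && data$Name[i] == data$Name[i + 1]) {
    # the intended assignment is unclear in my draft; marking the row is a guess
    data$Date[i] <- "Repeat"
  }
}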
Would anyone have any faster ways to do this? I have tried to use the data.table package but I can't find something in there that would add the NA's/Dates. Any tips or inputs regarding another implementation or if data.table would work would be greatly appreciated. Thanks!
1 solution
#1
You could try:
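res <- do.call(rbind, lapply(split(data, data$Name),
    function(x) {
        # sort each Name group chronologically
        Date1 <- as.Date(x$Date, "%m/%d/%Y")
        x <- x[order(Date1), ]
        indx <- seq_len(nrow(x))
        # repeat row i exactly i times and pair it with dates 1..i
        cbind(x[rep(indx, indx), 1:2], Date = x[sequence(indx), 3])
    }))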
row.names(res) <- 1:nrow(res)
res$Date <- as.character(res$Date)
# check duplicates per Name so a date shared by two names is not flagged
res$Date[duplicated(res[c("Name", "Date")])] <- "Repeat"
res
# Name Person Date
#1 A 1 1/1/2004
#2 A 2 Repeat
#3 A 2 1/3/2004
#4 A 3 Repeat
#5 A 3 Repeat
#6 A 3 1/9/2004
#7 B 4 1/7/2004
#8 B 5 Repeat
#9 B 5 1/10/2004
#10 B 6 Repeat
#11 B 6 Repeat
#12 B 6 1/17/2004
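The two index vectors do most of the work in the code above, so it may help to look at them in isolation. A minimal sketch for a group of three rows (indx follows the answer; row_idx and date_idx are names made up for illustration):

```r
indx <- seq_len(3)  # 1 2 3

# rep(indx, indx) repeats row i exactly i times; this picks Name and Person
row_idx <- rep(indx, indx)
row_idx
# [1] 1 2 2 3 3 3

# sequence(indx) concatenates 1:1, 1:2, 1:3; this picks the dates seen so far
date_idx <- sequence(indx)
date_idx
# [1] 1 1 2 1 2 3
```

Row 3, for example, is paired with dates 1, 2 and 3. The first two were already emitted for earlier rows, which is why duplicated() later turns them into "Repeat".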
Or using data.table (inspired by an answer from @David Arenburg):
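DT1 <- setDT(data)[, list(Person = rep(Person, seq_len(.N)),
                          Date = Date[sequence(seq_len(.N))]), by = Name]
# check duplicates per Name so a date shared by two names is not flagged
DT1[duplicated(DT1, by = c("Name", "Date")), Date := "Repeat"]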
DT1
# Name Person Date
#1: A 1 1/1/2004
#2: A 2 Repeat
#3: A 2 1/3/2004
#4: A 3 Repeat
#5: A 3 Repeat
#6: A 3 1/9/2004
#7: B 4 1/7/2004
#8: B 5 Repeat
#9: B 5 1/10/2004
#10: B 6 Repeat
#11: B 6 Repeat
#12: B 6 1/17/2004