If statement to flag matching values based on dates in R

Time: 2022-03-04 22:52:15

I have a data frame where I essentially have an ID#, a year, and a status code. Here is an example of it:

> df <- data.frame(ID=c(100,100,100,102,102,102), 
                   Year=c(2010,2011,2012,2010,2011,2012),
                   Status=c("c","d","d","d","c","c"))
> df
   ID Year Status
1 100 2010      c
2 100 2011      d
3 100 2012      d
4 102 2010      d
5 102 2011      c
6 102 2012      c

I want to add a fourth column (df$def) as a binary flag based on each ID's status. However, once the status is "d", I need the flag to carry through the remaining years even if the status later changes back to "c". I can write a simple IF statement that gives 0 for "c" and 1 for "d", but I am having trouble carrying the flag forward across the years.
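For reference, the simple row-wise version mentioned above might look like the sketch below. The name naive_def is only illustrative. It shows the problem: rows 5 and 6 fall back to 0 because each row is judged on its own status, without looking at earlier years for the same ID.

```r
# Example data from the question
df <- data.frame(ID = c(100, 100, 100, 102, 102, 102),
                 Year = c(2010, 2011, 2012, 2010, 2011, 2012),
                 Status = c("c", "d", "d", "d", "c", "c"))

# Row-wise flag: 1 for "d", 0 for anything else
# (naive_def is an illustrative name, not part of the original data)
naive_def <- ifelse(df$Status == "d", 1, 0)
naive_def
# [1] 0 1 1 1 0 0
```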

I would like the final table to look like this:

 df
   ID Year Status Def
1 100 2010      c   0
2 100 2011      d   1
3 100 2012      d   1
4 102 2010      d   1
5 102 2011      c   1
6 102 2012      c   1

Thanks for the help!

3 solutions

#1


You could use:

  within(df, {def<- ave(Status=='d', ID, FUN=cumsum);def[def>1] <- 1 })
  #   ID Year Status def
  #1 100 2010      c   0
  #2 100 2011      d   1
  #3 100 2012      d   1
  #4 102 2010      d   1
  #5 102 2011      c   1
  #6 102 2012      c   1
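To see why this works, it may help to look at the intermediate values. The following is a minimal sketch on the question's data. The names is_d, running and def are illustrative, and pmin() is used here for the capping step in place of the subset assignment.

```r
# Example data from the question
df <- data.frame(ID = c(100, 100, 100, 102, 102, 102),
                 Year = c(2010, 2011, 2012, 2010, 2011, 2012),
                 Status = c("c", "d", "d", "d", "c", "c"))

# Step 1: mark the rows where the status is "d"
is_d <- df$Status == "d"

# Step 2: running count of "d" rows within each ID
running <- ave(as.integer(is_d), df$ID, FUN = cumsum)
running
# [1] 0 1 2 1 1 1

# Step 3: cap the count at 1 so it becomes a 0/1 flag
def <- pmin(running, 1L)
def
# [1] 0 1 1 1 1 1
```

The running count stays above zero once a "d" has been seen, which is what carries the flag forward even when the status returns to "c".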

Or, for a bigger dataset, you could use data.table:

  library(data.table)
  setDT(df)[, Def:=cumsum(Status=='d'), by=ID][ Def>1, Def:=1][]
  #    ID Year Status Def
 #1: 100 2010      c   0
 #2: 100 2011      d   1
 #3: 100 2012      d   1
 #4: 102 2010      d   1
 #5: 102 2011      c   1
 #6: 102 2012      c   1

Or you could use split:

  res <- unsplit(lapply(split(df, df$ID), function(x) {
              indx <- which(x$Status=='d')
              x$Def <- 0
              if(length(indx)>0){
                  indx1 <- indx[1]
                  x$Def[indx1:nrow(x)] <- 1
              }
              x}), df$ID)



   res
   #   ID Year Status Def
   #1 100 2010      c   0
   #2 100 2011      d   1
   #3 100 2012      d   1
   #4 102 2010      d   1
   #5 102 2011      c   1
   #6 102 2012      c   1

#2


You can try using the function by() to get the cumulative sum by ID (not allowing it to go over 1)

df$def <- ifelse(df$Status == "c", 0, 1)
df$def <- pmin(1, unlist(by(df$def, df$ID, cumsum)))
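One thing that may be worth checking with this approach: unlist(by(...)) returns the values grouped by ID, so it lines up with the rows only when the data frame is already sorted by ID, as it is in the question. The sketch below uses a small hypothetical data frame, df2, with interleaved IDs to show the difference. ave() is used as a comparison because it returns values in the original row order.

```r
# Hypothetical data with the IDs interleaved rather than sorted
df2 <- data.frame(ID = c(102, 100, 102, 100),
                  Year = c(2010, 2010, 2011, 2011),
                  Status = c("d", "c", "c", "d"))

flag <- ifelse(df2$Status == "c", 0, 1)

# by() + unlist(): values come back grouped by ID (100 first, then 102)
grouped <- as.vector(pmin(1, unlist(by(flag, df2$ID, cumsum))))
grouped
# [1] 0 1 1 1

# ave(): values come back aligned with the original rows
in_place <- pmin(1, ave(flag, df2$ID, FUN = cumsum))
in_place
# [1] 1 0 1 1
```

If the rows cannot be assumed to be sorted, ordering by ID and Year first, or using ave(), would avoid the mismatch.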

#3


Here's another way:

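within(df, {
    # ave() returns the same type as its first argument (character here),
    # so convert the result to integer. any() guards IDs with no 'd',
    # where which.max() would otherwise return 1.
    Def <- as.integer(
        ave(as.character(Status), ID,
            FUN=function(x) ifelse(any(x == 'd') &
                                   seq_along(x) >= which.max(x == 'd'), 1, 0)))
})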
#    ID Year Status Def
# 1 100 2010      c   0
# 2 100 2011      d   1
# 3 100 2012      d   1
# 4 102 2010      d   1
# 5 102 2011      c   1
# 6 102 2012      c   1
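An edge case that may be worth testing with any of these approaches is an ID that never reaches status "d", which the example data does not contain. Below is a minimal sketch using a hypothetical data frame, df3, with such an ID (104). It uses cummax() as a swapped-in alternative to the capped cumsum: the running maximum of a 0/1 indicator stays at 1 once a "d" has appeared and stays at 0 otherwise.

```r
# Hypothetical data: ID 104 never has status "d"
df3 <- data.frame(ID = c(100, 100, 100, 104, 104),
                  Year = c(2010, 2011, 2012, 2010, 2011),
                  Status = c("c", "d", "c", "c", "c"))

# Running maximum of the 0/1 indicator within each ID
df3$Def <- ave(as.integer(df3$Status == "d"), df3$ID, FUN = cummax)
df3$Def
# [1] 0 1 1 0 0
```

This assumes the rows within each ID are already in year order, as they are in the question.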
