Number rows by variable, but restart the numbering when a condition is triggered

Time: 2022-05-18 23:14:50

I want to number certain combinations of rows in a data frame (which is ordered by id and time):

tc <- textConnection('
id              time       end_yn
abc             10         0
abc             11         0
abc             12         1
abc             13         0
def             10         0
def             15         1
def             16         0
def             17         0
def             18         1
')

test <- read.table(tc, header=TRUE)

The goal is to create a new column ("number") that numbers the rows of each id from 1 to n until a row with end_yn == 1 is hit (that row still gets the current number). After end_yn == 1, the numbering should start over at 1.
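To pin the rule down, here is a minimal loop-based sketch in base R. The helper name number_rows is hypothetical, the data frame is rebuilt inline instead of via textConnection() so the block is self-contained, and it assumes the rows are already sorted by id and time. A row restarts at 1 when it is the first row, when its id differs from the previous row, or when the previous row had end_yn == 1; otherwise it continues the count.

```r
# Hypothetical helper: loop-based reference version of the numbering rule.
# Assumes rows are already sorted by id and time, as in the question.
number_rows <- function(id, end_yn) {
  n <- length(id)
  number <- integer(n)
  for (i in seq_len(n)) {
    # Restart at the first row, at a new id, or right after an end_yn == 1 row
    restart <- i == 1 || id[i] != id[i - 1] || end_yn[i - 1] == 1
    number[i] <- if (restart) 1L else number[i - 1] + 1L
  }
  number
}

# Same data as the question, built inline
test <- data.frame(
  id     = c("abc", "abc", "abc", "abc", "def", "def", "def", "def", "def"),
  time   = c(10, 11, 12, 13, 10, 15, 16, 17, 18),
  end_yn = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)
test$number <- number_rows(test$id, test$end_yn)
print(test)
```

A loop like this is slow on large data, but it is handy as a reference to check a vectorised solution against.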

Without taking the end_yn == 1 condition into account, the rows can be numbered with:

library(data.table)
DT <- data.table(test)
DT[, number := seq_len(.N), by = id]  # write to a new column instead of overwriting id
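For comparison, a base R sketch of the same unconditional numbering (using ave() instead of data.table; this is an illustration, not code from the question) makes the problem visible: the counter simply runs 1..4 for abc and 1..5 for def and never looks at end_yn.

```r
# Question's data, built inline
test <- data.frame(
  id     = c("abc", "abc", "abc", "abc", "def", "def", "def", "def", "def"),
  end_yn = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Base R counterpart of DT[, number := seq_len(.N), by = id]:
# ave() applies seq_along() within each id group
test$number <- ave(seq_along(test$id), test$id, FUN = seq_along)
print(test$number)  # 1 2 3 4 1 2 3 4 5
```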

However, the expected outcome is:

id              time       end_yn   number
abc             10         0        1
abc             11         0        2
abc             12         1        3 
abc             13         0        1 
def             10         0        1
def             15         1        2
def             16         0        1
def             17         0        2
def             18         1        3

How can I incorporate the end_yn == 1 condition?

1 solution

#1

I'm guessing there are different ways to do this, but here's one:

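DT[, cEnd := c(0, cumsum(end_yn)[-.N])] # block counter: running sum of end_yn, shifted down one row so each end_yn == 1 row stays in its own block

DT[, number := seq_len(.N), by = "id,cEnd"] # number rows 1..n within each id/block combination

DT[, cEnd := NULL] # remove the helper column created above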
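To see why this works, here is the same computation traced step by step in base R on the question's data. ave() stands in for data.table's by grouping here; this is an illustrative re-implementation, not the answer's own code.

```r
id     <- c("abc", "abc", "abc", "abc", "def", "def", "def", "def", "def")
end_yn <- c(0, 0, 1, 0, 0, 1, 0, 0, 1)

cs   <- cumsum(end_yn)              # 0 0 1 1 1 2 2 2 3
cEnd <- c(0, cs[-length(cs)])       # 0 0 0 1 1 1 2 2 2  (shifted down one row)

# Number rows within each (id, cEnd) combination
number <- ave(seq_along(id), id, cEnd, FUN = seq_along)
print(number)                       # 1 2 3 1 1 2 1 2 3
```

Without the shift, the row carrying end_yn == 1 would already belong to the next block and would be numbered 1 instead of closing the current run.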

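Setting id as the key for DT (with setkey()) might be worthwhile. The approach relies on row order, so the table should be sorted by id and time before the cumulative sum is taken; setkey() sorts the table by its key columns.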
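A base R sketch of the sort-first step, assuming a hypothetical shuffled copy of the question's data. Here order() does the sorting that setkey(DT, id, time) would do in data.table, and the numbering then proceeds exactly as in the answer.

```r
# Hypothetical input: the question's rows in scrambled order
test <- data.frame(
  id     = c("def", "abc", "def", "abc", "def", "abc", "def", "abc", "def"),
  time   = c(17, 12, 10, 10, 18, 13, 15, 11, 16),
  end_yn = c(0, 1, 0, 0, 1, 0, 1, 0, 0)
)

# Sort by id, then time (comparable to setkey(DT, id, time))
test <- test[order(test$id, test$time), ]
rownames(test) <- NULL

# Same block logic as the answer
cEnd <- c(0, cumsum(test$end_yn)[-nrow(test)])
test$number <- ave(seq_len(nrow(test)), test$id, cEnd, FUN = seq_along)
print(test)
```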
