I want to number certain combinations of rows in a data frame (which is ordered by id and by time):
tc <- textConnection('
id time end_yn
abc 10 0
abc 11 0
abc 12 1
abc 13 0
def 10 0
def 15 1
def 16 0
def 17 0
def 18 1
')
test <- read.table(tc, header=TRUE)
The goal is to create a new column ("number") that numbers the rows within each id from 1 to n until end_yn == 1 is hit. After a row with end_yn == 1, the numbering should start over.
Without taking the end_yn == 1 condition into account, the rows can be numbered using:
library(data.table)
DT <- data.table(test)
DT[, number := seq_len(.N), by = id] # plain per-id counter, ignores end_yn
However, the expected outcome is:
id time end_yn number
abc 10 0 1
abc 11 0 2
abc 12 1 3
abc 13 0 1
def 10 0 1
def 15 1 2
def 16 0 1
def 17 0 2
def 18 1 3
How can I incorporate the end_yn == 1 condition?
1 solution
I'm guessing there are different ways to do this, but here's one:
DT[, cEnd := c(0,cumsum(end_yn)[-.N])] # carry the end value forward
DT[, number := seq_len(.N), by = "id,cEnd"] # create your sequence
DT[, cEnd := NULL] # remove the column created above
Setting id as the key for DT might be worthwhile.
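For anyone who wants to run the whole thing end to end, here is a minimal sketch of the same idea under a few assumptions of mine: the sample data is rebuilt inline instead of read through textConnection, the helper column is called block (a made-up name, not from the answer above), and the lagged cumulative sum is computed with shift() inside each id rather than over the whole table. It should produce the same numbering as the answer, but treat it as one possible variant rather than the definitive way.

```r
library(data.table)

# Same sample data as in the question, built inline.
DT <- data.table(
  id     = c("abc", "abc", "abc", "abc", "def", "def", "def", "def", "def"),
  time   = c(10, 11, 12, 13, 10, 15, 16, 17, 18),
  end_yn = c(0, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Keying sorts the table by id, then time (the ordering the question assumes).
setkey(DT, id, time)

# 'block' (hypothetical helper name) counts the end_yn == 1 rows seen
# *before* each row within its id, so it increases right after every end row.
DT[, block := cumsum(shift(end_yn, fill = 0)), by = id]

# Restart the counter for every (id, block) pair, then drop the helper.
DT[, number := seq_len(.N), by = .(id, block)]
DT[, block := NULL]

print(DT)
```

Lagging end_yn by one row is what keeps the row with end_yn == 1 in the block it closes, so the reset only happens on the following row.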