I have a data table like this.
   ID1 ID2 member
1    a   x parent
2    a   y  child
3    a   z parent
4    a   p  child
5    a   q  child
6    b   x parent
7    b   z parent
8    b   q  child
I want to assign a sequence column as shown below.
   ID1 ID2 member sequence
1    a   x parent        1
2    a   y  child        2
3    a   z parent        1
4    a   p  child        2
5    a   q  child        3
6    b   x parent        1
7    b   z parent        1
8    b   q  child        2
i.e.
> dt$sequence = 1, wherever dt$member == "parent"
> dt$sequence = previous_row_value + 1, wherever dt$member=="child"
So far I have been doing this with a loop, as shown below.
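# Assign sequence numbers within one ID1 group, row by row.
sequencing <- function(dt){
  for(i in seq_len(nrow(dt))){
    if(i == 1){
      # The first row of each group always starts at 1
      dt$sequence[i] = 1
      next
    }
    if(dt$member[i] == "child"){
      # A child continues the count from the previous row
      dt$sequence[i] = as.numeric(dt$sequence[i-1]) + 1
    } else {
      # A parent restarts the count
      dt$sequence[i] = 1
    }
  }
  return(dt)
}

# The function has to be defined before it is called for each ID1 group
dt_sequence <- dt[ , sequencing(.SD), by = "ID1"]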
I ran this code on a data table of 400,000 rows and it took around 15 minutes to complete. Can anyone suggest a faster way to do this?
1 Answer
Here's one way with seq:
dt[ , sequence := seq(.N), by = cumsum(member == "parent")]
#    ID1 ID2 member sequence
# 1:   a   x parent        1
# 2:   a   y  child        2
# 3:   a   z parent        1
# 4:   a   p  child        2
# 5:   a   q  child        3
# 6:   b   x parent        1
# 7:   b   z parent        1
# 8:   b   q  child        2
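One difference from the loop in the question may be worth checking: the loop restarts at 1 on the first row of every ID1 group, while grouping only by cumsum(member == "parent") ignores ID1. In the sample data every ID1 group begins with a parent, so both give the same result. The sketch below uses made-up data (an assumption, not from the question) in which an ID1 group begins with a child, and uses base R's ave() as a stand-in for the grouped assignment. In data.table the same idea would presumably be written as by = .(ID1, cumsum(member == "parent")).

```r
# Hypothetical data: the ID1 == "b" group starts with a child row.
ID1    <- c("a", "a", "b", "b")
member <- c("parent", "child", "child", "child")

# Running count of parents; it does not reset when ID1 changes.
grp <- cumsum(member == "parent")

# Numbering within the parent-count groups only.
seq_global <- ave(seq_along(member), grp, FUN = seq_along)

# Numbering within ID1 and parent-count groups, so it restarts per ID1.
seq_by_id <- ave(seq_along(member), ID1, grp, FUN = seq_along)

print(seq_global)  # 1 2 3 4
print(seq_by_id)   # 1 2 1 2
```

If every ID1 group in the real data is known to start with a parent, the extra grouping column changes nothing.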
How does it work?
The expression member == "parent" creates a logical vector. The function cumsum then calculates its cumulative sum, which yields a vector in which a parent and the children that follow it share the same number. This vector is used for grouping. Finally, seq(.N) creates a sequence from 1 up to the number of rows in the group.
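To make the intermediate steps visible, here is a minimal sketch in base R using the member column from the question. The variable names is_parent, grp and sequence are illustrative, and ave() is used here only as a base R stand-in for data.table's grouped := assignment.

```r
# The member column from the question's sample data.
member <- c("parent", "child", "parent", "child", "child",
            "parent", "parent", "child")

# Step 1: logical vector, TRUE where a new parent starts.
is_parent <- member == "parent"

# Step 2: cumulative sum; each parent and its children share one number.
grp <- cumsum(is_parent)

# Step 3: count from 1 within each group.
sequence <- ave(seq_along(member), grp, FUN = seq_along)

print(grp)       # 1 1 2 2 2 3 4 4
print(sequence)  # 1 2 1 2 3 1 1 2
```

Because each step is a single vectorized call, no per-row loop is needed, which is where the speed-up over the loop in the question comes from.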