Fetching the last row of the prior group in data.table

Time: 2021-06-24 22:50:42

This is what my data table looks like:

library(data.table)
dt <- fread('
    Product  Group    LastProductOfPriorGroup
    A          1          NA
    B          1          NA
    C          2          B
    D          2          B
    E          2          B
    F          3          E
    G          3          E
')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from the last row of the prior group. The first two rows have no prior group, so the value is NA. In the third row, the product in the last row of the prior group (group 1) is B. I tried to accomplish this with

dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]

to no avail.
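For reference, here is a small sketch of what that attempt computes. It assumes that grouping on shift(Group, 1) behaves like grouping on a lagged copy of Group; the helper column lagged_group and the table name attempt are made up for illustration and are not part of the question.

```r
library(data.table)

# Same rows as in the question, built directly (without the target column)
attempt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                      Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L))

# Hypothetical helper column: the lagged grouping key (NA 1 1 2 2 2 3)
attempt[, lagged_group := shift(Group, 1)]

# Within each lagged group, shift(Product, 1) only lags by one row
attempt[, LastGroupProduct := shift(Product, 1), by = lagged_group]

attempt$LastGroupProduct   # NA NA "B" NA "D" "E" NA
```

The lag is applied row by row inside groups that no longer line up with Group, so the prior group's last product is never broadcast to a whole group.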

2 solutions

#1


14  

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

  • dt[, last(Product), by = Group]$V1 returns the last Product of each group as a character vector.

  • shift(...) lags that vector by one position, so each entry holds the previous group's last product and the first entry is NA.

  • dt[, newcol := ...[.GRP], by = Group] groups by Group and indexes the shifted vector with .GRP, the running group counter, so every row in a group receives the same value.
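To make the steps above concrete, here is a sketch that evaluates each piece on its own. The intermediate names (last_per_group, shifted, grp_index, per_group) are made up for illustration, and the table is rebuilt with data.table() rather than fread().

```r
library(data.table)

# Same rows as in the question, built directly (without the target column)
dt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                 Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L))

# Innermost call: last Product of each group, in order of first appearance
last_per_group <- dt[, last(Product), by = Group]$V1   # "B" "E" "G"

# shift() lags the vector by one position and pads the front with NA
shifted <- shift(last_per_group)                       # NA  "B" "E"

# .GRP is the running group counter, one value per group
grp_index <- dt[, .(grp = .GRP), by = Group]$grp       # 1 2 3

# Indexing by the counter gives each group the last product of the group before it
per_group <- shifted[grp_index]                        # NA  "B" "E"
```

Inside the grouped := call, the single value shifted[.GRP] is recycled across all rows of the current group.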

Update: Frank makes a good point: my code above recomputes the shifted vector once for every group. To avoid that, we can either precompute it

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that the shift is calculated only once. Or we can take Frank's nice suggestion from the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 
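Here is a sketch of that one-liner split into two steps, which may make the join easier to follow. The name lookup is made up for illustration; the logic is otherwise the same as the line above.

```r
library(data.table)

# Same rows as in the question, built directly (without the target column)
dt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                 Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L))

# One row per group: V1 is the group's last Product,
# v is the previous group's last Product (NA for the first group)
lookup <- dt[, last(Product), by = Group][, v := shift(V1)]

# Update join: match each row of dt to lookup on Group and copy v over;
# the i. prefix refers to columns of the table passed in i (lookup)
dt[lookup, on = "Group", newcol := i.v]

dt$newcol   # NA NA "B" "B" "B" "E" "E"
```

Here the shift is computed once on a table with one row per group, and the join spreads the result back over the original rows.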

#2


7  

Another way is to save the previous group's last value in a variable.

this = NA_character_    # initialize
dt[, LastProductOfPriorGroup:={ last<-this; this<-last(Product); last }, by=Group]
dt
   Product Group LastProductOfPriorGroup
1:       A     1                      NA
2:       B     1                      NA
3:       C     2                       B
4:       D     2                       B
5:       E     2                       B
6:       F     3                       E
7:       G     3                       E

NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).

This should also be fast, since no extra logic is invoked to fetch the previous group's value; it just relies on the groups being processed in order (which they are).
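A note on "in order": as far as I know, by processes groups in order of first appearance rather than sorted order, so "previous group" means the group that appeared just before in the table. A small sketch with a deliberately unsorted table (the names unsorted, order_by and order_keyby are made up for illustration):

```r
library(data.table)

# Hypothetical table where Group 2 appears before Group 1
unsorted <- data.table(Product = c("C", "D", "A", "B"),
                       Group   = c(2L, 2L, 1L, 1L))

# by keeps groups in order of first appearance: Group 2, then Group 1
order_by <- unsorted[, .(grp = .GRP), by = Group]

# keyby returns the groups sorted by Group: Group 1, then Group 2
order_keyby <- unsorted[, .(n = .N), keyby = Group]

order_by$Group      # 2 1
order_keyby$Group   # 1 2
```

If the prior group should be the numerically previous one, sorting the table by Group first (for example with setorder) avoids surprises.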
