This is what my data table looks like:
library(data.table)
dt <- fread('
Product Group LastProductOfPriorGroup
A 1 NA
B 1 NA
C 2 B
D 2 B
E 2 B
F 3 E
G 3 E
')
The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from the last row of the prior group. The first two rows have no prior group, so the value there is NA. In the third row, the product in the last row of the prior group (group 1) is B. I am trying to accomplish this with
dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]
to no avail.
2 Answers
#1
14
You could do
dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]
This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)
Product Group LastProductOfPriorGroup newcol
1: A 1 NA NA
2: B 1 NA NA
3: C 2 B B
4: D 2 B B
5: E 2 B B
6: F 3 E E
7: G 3 E E
Let's break the code down from the inside out. I will use ... to denote the accumulated code:
- dt[, last(Product), by = Group]$V1 gets the last value from each group as a character vector.
- shift(...) shifts the character vector from the previous call by one position.
- dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing.
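The three steps above can be inspected one at a time. The following is only an illustrative sketch: it rebuilds the question's example data with data.table() instead of fread(), and the names last_per_group, shifted and grp are made up here for the walkthrough.

```r
library(data.table)

# Example data from the question, without the desired column.
dt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                 Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L))

# Step 1: the last Product within each group, as a character vector.
last_per_group <- dt[, last(Product), by = Group]$V1
print(last_per_group)   # "B" "E" "G"

# Step 2: shift by one position, so position g holds the last Product of group g - 1.
shifted <- shift(last_per_group)
print(shifted)          # NA "B" "E"

# Step 3: .GRP is a per-group counter (1, 2, 3, ...), which is what indexes `shifted`.
grp <- dt[, .(grp = .GRP), by = Group]
print(grp)
```

Indexing shifted with .GRP inside a grouped call therefore picks the prior group's last product for every row of the current group.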
Update: Frank brings up a good point: my code above recalculates the shift for every group. To avoid that, we can use either
shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]
so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.
dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]
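For readers who find that one-liner dense, here is a sketch of the same idea unpacked into separate steps. The names lookup, last_product and prior_last are assumptions chosen for illustration, and the example data is rebuilt with data.table().

```r
library(data.table)

# Example data from the question, without the desired column.
dt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                 Group   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L))

# One row per group, holding that group's last Product.
lookup <- dt[, .(last_product = last(Product)), by = Group]

# Shift once, so each group's row carries the prior group's last Product.
lookup[, prior_last := shift(last_product)]

# Update join: match rows on Group and copy the value over by reference.
# The i. prefix refers to a column of the table passed in i (here, lookup).
dt[lookup, on = "Group", newcol := i.prior_last]
print(dt)
```

The shift is computed only once, on the small per-group table, and the join then spreads the result across all rows of dt.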
#2
7
Another way is to save the last group's value in a variable.
this = NA_character_ # initialize
dt[, LastProductOfPriorGroup:={ last<-this; this<-last(Product); last }, by=Group]
dt
Product Group LastProductOfPriorGroup
1: A 1 NA
2: B 1 NA
3: C 2 B
4: D 2 B
5: E 2 B
6: F 3 E
7: G 3 E
NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).
This should also be fast since no logic is being invoked to fetch the last group's value; it just relies on the groups running in order (which they do).