I want to create a new column in a data.table calculated from the current value of one column and the previous value of another. Is it possible to access previous rows?
E.g.:
> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
   A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?
The correct answer should be
> DT
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
6 Solutions
#1
81
With shift() implemented in v1.9.6, this is quite straightforward.
DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]
From NEWS:
- New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.
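The grouped usage mentioned in the NEWS entry can be sketched as follows. This is a minimal illustration rather than part of the original answer: the example table, the id column, and the names B_next, cols and newcols are made up here. shift(), its type and fill arguments, and .SDcols are existing data.table features.

```r
library(data.table)

# Hypothetical example table with a grouping column
DT <- data.table(id = c("a", "a", "a", "b", "b"),
                 B  = c(10, 20, 30, 40, 50),
                 C  = c(100, 200, 300, 400, 500))

# Lag B within each id, so the first row of every group gets NA
DT[, D := C + shift(B), by = id]

# Lead instead of lag, padding the end of each group with 0 instead of NA
DT[, B_next := shift(B, 1L, type = "lead", fill = 0), by = id]

# Shift several columns at once: shift() on .SD returns one lagged vector per column
cols    <- c("B", "C")
newcols <- paste0(cols, "_lag")
DT[, (newcols) := shift(.SD, 1L), by = id, .SDcols = cols]
```

Without by = id, the lag would carry the last value of group "a" into the first row of group "b", which is usually not what is wanted for grouped data.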
See history for previous answers.
#2
19
Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.
rowShift <- function(x, shiftLen = 1L) {
  # Indices shifted by shiftLen: negative looks back, positive looks ahead
  r <- (1L + shiftLen):(length(x) + shiftLen)
  # Indices before the first element become NA; indices past the end already yield NA
  r[r < 1] <- NA
  return(x[r])
}
# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B, -1)]
# Get the Old Faithful eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5, list(eruptLengthCurrent = eruptions,
                                  eruptLengthTwoPrior = rowShift(eruptions, -2),
                                  eruptLengthThreeFuture = rowShift(eruptions, 3))]
##    eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
## 1:              3.600                  NA                  2.283
## 2:              1.800                  NA                  4.533
## 3:              3.333               3.600                     NA
## 4:              2.283               1.800                     NA
## 5:              4.533               3.333                     NA
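To relate this helper to the built-in function from answer #1: a negative shiftLen appears to correspond to shift() with type = "lag", and a positive one to type = "lead". The check below is a small sketch under that assumption. The vector x is just the first five eruption lengths shown in the output above.

```r
library(data.table)

# rowShift as defined in this answer
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r < 1] <- NA
  x[r]
}

x <- c(3.600, 1.800, 3.333, 2.283, 4.533)

# Two rows back: same as a lag of 2
same_lag  <- identical(rowShift(x, -2), shift(x, 2L, type = "lag"))

# Three rows ahead: same as a lead of 3
same_lead <- identical(rowShift(x, 3), shift(x, 3L, type = "lead"))
```

Both comparisons should come out TRUE for this input, so on data.table v1.9.6 or later the built-in shift() can stand in for the helper.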
#3
12
Based on @Steve Lianoglou's comment above, why not just:
DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540
And avoid using seq_len or head or any other function.
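Why this works may not be obvious: without by, .I is 1, 2, ..., N, so .I - 1 starts at 0, and R silently drops a zero index when subsetting. The base R sketch below reproduces that step outside data.table. The name idx is only illustrative and stands in for .I - 1.

```r
B <- c(10, 20, 30, 40, 50)

# What .I - 1 evaluates to when there is no by: 0 1 2 3 4
idx <- seq_along(B) - 1L

# Index 0 selects nothing, so this has length 4: 10 20 30 40
dropped <- B[idx]

# Prepending NA restores the original length and yields the lagged vector
lagged <- c(NA, dropped)
```

One caveat, as far as I can tell: with by, .I holds row positions in the whole table while B is the per-group subset, so this trick should only be relied on for ungrouped use.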
#4
12
Using dplyr you could do:
mutate(DT, D = lag(B) + C)
Which gives:
#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
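One difference from the := approaches is worth spelling out: as far as I know, mutate() returns a new table instead of modifying DT by reference, so the result has to be assigned to keep the new column. The sketch below assumes both packages are installed. DT2 is a made-up name, and lag is qualified as dplyr::lag to avoid picking up stats::lag by accident.

```r
library(data.table)
library(dplyr)

DT <- data.table(A = 1:5, B = 1:5 * 10, C = 1:5 * 100)

# mutate() returns a modified copy; DT itself keeps its original columns
DT2 <- mutate(DT, D = dplyr::lag(B) + C)

names(DT)   # still A, B, C
names(DT2)  # A, B, C, D
```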
#5
9
Following Arun's solution, a similar result can be obtained without referring to .N:
> DT[, D := C + c(NA, head(B, -1))][]
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540
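The same idea works on a plain vector and mirrors naturally for the next row. The base R sketch below is an illustration, not part of the original answer. The names lagged and led are made up, and the tail() variant for a lead is my own addition.

```r
B <- c(10, 20, 30, 40, 50)

# Previous row: drop the last element and pad the front with NA
lagged <- c(NA, head(B, -1))

# Next row: drop the first element and pad the end with NA
led <- c(tail(B, -1), NA)
```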
#6
1
I added a padding argument and changed some names and called it shift. https://github.com/geneorama/geneorama/blob/master/R/shift.R