Use a value from the previous row in an R data.table calculation

Time: 2020-12-16 22:47:38

I want to create a new column in a data.table, calculated from the current value of one column and the previous row's value of another. Is it possible to access previous rows?

E.g.:

> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
   A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?

The correct answer should be

> DT
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

6 answers

#1


81  

With shift(), implemented in data.table v1.9.6, this is quite straightforward.

DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]

From NEWS:

  1. New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.

See history for previous answers.
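
As a minimal sketch of the by-group usage the NEWS entry mentions, the block below lags and leads a column within groups. The table DT2, its id column, and the new column names are made up for illustration; only shift() and its type and fill arguments come from data.table.

```r
library(data.table)

# Hypothetical data: two groups, so the lag has to restart within each id.
DT2 <- data.table(id = c("a", "a", "a", "b", "b"),
                  B  = c(10, 20, 30, 40, 50),
                  C  = c(100, 200, 300, 400, 500))

# Previous B within each group; the first row of every group gets NA.
DT2[, D := C + shift(B), by = id]

# Next B within each group, using type = "lead".
DT2[, B_next := shift(B, 1L, type = "lead"), by = id]

# Previous B again, but padded with 0 instead of NA.
DT2[, B_prev0 := shift(B, 1L, fill = 0), by = id]

DT2[]
```

Without by = id, the first row of group "b" would pick up the last B of group "a", which is usually not what a per-group lag should do.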

#2


19  

Several folks have answered the specific question. See the code below for a general purpose function that I use in situations like this that may be helpful. Rather than just getting the prior row, you can go as many rows in the "past" or "future" as you'd like.

rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r<1] <- NA
  return(x[r])
}

# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]

# Get the Old Faithful eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
                                 eruptLengthTwoPrior=rowShift(eruptions,-2), 
                                 eruptLengthThreeFuture=rowShift(eruptions,3))]
##   eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1:              3.600                  NA                  2.283
##2:              1.800                  NA                  4.533
##3:              3.333               3.600                     NA
##4:              2.283               1.800                     NA
##5:              4.533               3.333                     NA
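
For comparison, here is a small sketch that checks rowShift() against data.table's built-in shift() on the same five eruption values. It assumes a data.table version that has shift() (v1.9.6 or later); rowShift() is copied from this answer so the block runs on its own, and x is just a local helper vector.

```r
library(data.table)

# rowShift() copied from the answer above so this block is self-contained.
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r < 1] <- NA
  x[r]
}

# First five eruption lengths from the built-in faithful data set.
x <- head(faithful$eruptions, 5)

# A negative shiftLen looks back, which corresponds to a lag.
stopifnot(identical(rowShift(x, -2), shift(x, 2L, type = "lag")))

# A positive shiftLen looks ahead, which corresponds to a lead.
stopifnot(identical(rowShift(x, 3), shift(x, 3L, type = "lead")))
```

Note the sign convention: rowShift() uses negative numbers for the past, whereas shift() always takes a positive n and picks the direction through type.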

#3


12  

Based on @Steve Lianoglou's comment above, why not just:

DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540

This avoids seq_len, head, or any other helper function.
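
The one-liner works because R silently drops a zero index. The sketch below reproduces that step on a plain vector; B and idx are local names used only for illustration.

```r
B <- c(10, 20, 30, 40, 50)

# On an ungrouped table, .I - 1 takes the values 0, 1, 2, 3, 4.
idx <- seq_along(B) - 1L

# Index 0 selects nothing, so only four elements come back.
prev_without_na <- B[idx]

# Prefixing NA restores the original length and aligns each value
# with the row that follows it.
prev <- c(NA, prev_without_na)
prev
```

Keep in mind that .I holds row numbers of the whole table, so this trick does not carry over unchanged to a grouped call with by.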

#4


12  

Using dplyr you could do:

library(dplyr)
mutate(DT, D = lag(B) + C)  # lag() here is dplyr::lag, not stats::lag

Which gives:

#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540
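
A slightly fuller sketch of the dplyr route, using a plain data frame named df (a stand-in for DT, introduced only for this example). It shows that mutate() returns a new object rather than modifying in place, and uses the n and default arguments of dplyr::lag().

```r
library(dplyr)

# Hypothetical stand-in for DT, built as a plain data frame.
df <- data.frame(A = 1:5, B = 1:5 * 10, C = 1:5 * 100)

# mutate() returns a new object, so assign the result to keep column D.
df <- mutate(df, D = C + dplyr::lag(B))

# n looks further back; default replaces the leading NA values.
df <- mutate(df, B_two_back = dplyr::lag(B, n = 2, default = 0))

df
```

Calling dplyr::lag() with the namespace prefix guards against accidentally picking up stats::lag(), which does not shift values in a plain vector this way.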

#5


9  

Following Arun's solution, a similar result can be obtained without referring to .N:

> DT[, D := C + c(NA, head(B, -1))][]
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

#6


1  

I added a padding argument, changed some names, and called it shift: https://github.com/geneorama/geneorama/blob/master/R/shift.R
