R: How to calculate the lag of multiple columns by group in a data.table

Time: 2021-05-07 16:20:48

I would like to calculate the differences of variables in a data.table, grouped by id. Here is some sample data, recorded at a sample rate of 1 Hz. I would like to estimate the first and second derivatives (speed and acceleration).

library(data.table)

df <- read.table(text='x y id
                 1 2 1
                 2 4 1
                 3 5 1
                 1 8 2
                 5 2 2
                 6 3 2', header=TRUE)
dt <- data.table(df)

Expected output

# dx dy id
# NA NA  1
#  1  2  1
#  1  1  1
# NA NA  2
#  4 -6  2
#  1  1  2

Here's what I've tried

dx_dt<-dt[, diff:=c(NA,diff(dt[,'x',with=FALSE])),by = id]

Output is

Error in `[.data.frame`(dt, , `:=`(diff, c(NA, diff(dt[, "x", with = FALSE]))),  : 
  unused argument (by = id)

As pointed out by Akrun, the 'speed' terms (dx, dy) can be obtained using either data.table or dplyr. However, I don't understand the calculation well enough to extend it to the acceleration terms. How can I calculate the second-difference terms?

dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id]

produces

   x y id dx dy
1: 1 2  1 NA NA
2: 2 4  1  1  2
3: 3 5  1  1  1
4: 1 8  2 NA NA
5: 5 2  2  4 -6
6: 6 3  2  1  1

How can I extend this to get a second difference, i.e. the diff of dx and dy?

   x y id dx dy  dx2  dy2
1: 1 2  1 NA NA   NA   NA
2: 2 4  1  1  2   NA   NA
3: 3 5  1  1  1    0   -1
4: 1 8  2 NA NA   NA   NA
5: 5 2  2  4 -6   NA   NA
6: 6 3  2  1  1   -3    7

1 solution

#1



You can try

setnames(dt[, lapply(.SD, function(x) c(NA, diff(x))), by = id],
         2:3, c('dx', 'dy'))[]
#    id dx dy
# 1:  1 NA NA
# 2:  1  1  2
# 3:  1  1  1
# 4:  2 NA NA
# 5:  2  4 -6
# 6:  2  1  1
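
As a side note on the attempt in the question: the `[.data.frame` in the error message suggests the call was handled by the data.frame method, which has no `by` argument. Why that happened is not visible from the post, so treat that reading as an assumption. Separately, `dt[, 'x', with=FALSE]` inside `j` selects the whole column rather than the current group's rows. A minimal single-column sketch (the sample data is retyped by hand here so the block runs on its own):

```r
library(data.table)

# same sample data as in the question, built directly as a data.table
dt <- data.table(x  = c(1L, 2L, 3L, 1L, 5L, 6L),
                 y  = c(2L, 4L, 5L, 8L, 2L, 3L),
                 id = c(1L, 1L, 1L, 2L, 2L, 2L))

# Inside j, use the bare column name: with by = id, `x` is already
# the subset of x that belongs to the current group.
dt[, dx := c(NA, diff(x)), by = id]
dt[]
```

The `lapply(.SD, ...)` form above is the same idea applied to several columns at once.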

Another option would be to use dplyr

 library(dplyr)
 # mutate_each() and funs() are deprecated; across() (dplyr >= 1.0.0) replaces them
 df %>%
     group_by(id) %>%
     mutate(across(c(x, y), ~ c(NA, diff(.x)))) %>%
     rename(dx = x, dy = y)
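
The dplyr route can be extended to the second differences in the same spirit. The following is only a sketch, assuming dplyr >= 1.0.0 for `across()`; it keeps `x` and `y` and adds the new columns next to them, and the name `res` is just an illustrative choice:

```r
library(dplyr)

# sample data from the question
df <- data.frame(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

res <- df %>%
  group_by(id) %>%
  # first differences within each id: value minus the previous value
  mutate(across(c(x, y), ~ .x - dplyr::lag(.x), .names = "d{.col}")) %>%
  # second differences: the same step applied to dx and dy
  mutate(across(c(dx, dy), ~ .x - dplyr::lag(.x), .names = "{.col}2")) %>%
  ungroup()

res
```

`dplyr::lag()` is written with its namespace so it cannot be confused with `stats::lag()`.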

Update

You can repeat the step twice

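# first differences of x and y within each id
dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id, .SDcols = c('x', 'y')]
# second differences: the same step applied to dx and dy (columns 4 and 5)
dt[, c('dx2', 'dy2') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id, .SDcols = 4:5]
dt
#    x y id dx dy dx2 dy2
# 1: 1 2  1 NA NA  NA  NA
# 2: 2 4  1  1  2  NA  NA
# 3: 3 5  1  1  1   0  -1
# 4: 1 8  2 NA NA  NA  NA
# 5: 5 2  2  4 -6  NA  NA
# 6: 6 3  2  1  1  -3   7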

Or we can use the shift function from data.table

dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
  ][, paste0("d", c("x2", "y2")) := .SD - shift(.SD), by = id, .SDcols = 4:5]
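
If higher orders are needed, the repetition can be folded into one small function built on the `differences` argument of base R's `diff()`. This is only a sketch: `pad_diff()` is a hypothetical helper name, not something provided by data.table, and the sample data is retyped so the block is self-contained.

```r
library(data.table)

dt <- data.table(x  = c(1L, 2L, 3L, 1L, 5L, 6L),
                 y  = c(2L, 4L, 5L, 8L, 2L, 3L),
                 id = c(1L, 1L, 1L, 2L, 2L, 2L))

# pad_diff() is a hypothetical helper (not part of data.table):
# k-th order difference, left-padded with NA back to the input length.
# min() keeps the padding valid for groups shorter than k rows.
pad_diff <- function(x, k = 1L) {
  c(rep(NA, min(k, length(x))), diff(x, differences = k))
}

vars <- c("x", "y")
# first differences (dx, dy) and second differences (dx2, dy2), per id
dt[, paste0("d", vars)      := lapply(.SD, pad_diff, k = 1L), by = id, .SDcols = vars]
dt[, paste0("d", vars, "2") := lapply(.SD, pad_diff, k = 2L), by = id, .SDcols = vars]
dt[]
```

Naming the source columns in `.SDcols` avoids relying on column positions such as `4:5`. At 1 Hz the time step is 1 s, so the differences need no rescaling; at another sample rate, dx would be divided by the time step and dx2 by its square.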
