I would like to calculate the differences of variables in a data.table, grouped by id. Here is some sample data, recorded at a sample rate of 1 Hz. I want to estimate the first and second derivatives (speed and acceleration).
df <- read.table(text='x y id
1 2 1
2 4 1
3 5 1
1 8 2
5 2 2
6 3 2',header=TRUE)
library(data.table)
dt <- data.table(df)
Expected output:
# dx dy id
# NA NA 1
# 1 2 1
# 1 1 1
# NA NA 2
# 4 -6 2
# 1 1 2
Here's what I've tried:
dx_dt<-dt[, diff:=c(NA,diff(dt[,'x',with=FALSE])),by = id]
The output is:
Error in `[.data.frame`(dt, , `:=`(diff, c(NA, diff(dt[, "x", with = FALSE]))), :
unused argument (by = id)
As pointed out by Akrun, the 'speed' terms (dx, dy) can be obtained using either data.table or dplyr. However, I don't understand the calculation well enough to extend it to the acceleration terms. How do I calculate the second-difference terms?
dt[, c('dx', 'dy') := lapply(.SD, function(x) c(NA, diff(x))),
   by = id]
produces
x y id dx dy
1: 1 2 1 NA NA
2: 2 4 1 1 2
3: 3 5 1 1 1
4: 1 8 2 NA NA
5: 5 2 2 4 -6
6: 6 3 2 1 1
How can I extend this to get a second difference, i.e. the diff of dx and dy? The desired result is:
x y id dx dy dx2 dy2
1: 1 2 1 NA NA NA NA
2: 2 4 1 1 2 NA NA
3: 3 5 1 1 1 0 -1
4: 1 8 2 NA NA NA NA
5: 5 2 2 4 -6 NA NA
6: 6 3 2 1 1 -3 7
1 Answer
You can try
setnames(dt[, lapply(.SD, function(x) c(NA,diff(x))), by=id],
2:3, c('dx', 'dy'))[]
# id dx dy
#1: 1 NA NA
#2: 1 1 2
#3: 1 1 1
#4: 2 NA NA
#5: 2 4 -6
#6: 2 1 1
Another option would be to use dplyr
library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
df %>%
  group_by(id) %>%
  mutate(across(c(x, y), ~ c(NA, diff(.x)))) %>%
  rename(dx = x, dy = y)
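The dplyr approach can also be carried one step further to the second differences. The following is only a sketch, assuming dplyr 1.0.0 or later (for across()); the data frame is rebuilt here so the example is self-contained, and the result name res is just for illustration.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(x  = c(1, 2, 3, 1, 5, 6),
                 y  = c(2, 4, 5, 8, 2, 3),
                 id = c(1, 1, 1, 2, 2, 2))

res <- df %>%
  group_by(id) %>%
  # first differences within each id: dx, dy
  mutate(across(c(x, y), ~ .x - dplyr::lag(.x), .names = "d{.col}")) %>%
  # second differences: difference the first differences again -> dx2, dy2
  mutate(across(c(dx, dy), ~ .x - dplyr::lag(.x), .names = "{.col}2")) %>%
  ungroup()

res
```

Unlike the rename() version, this keeps the original x and y columns next to the new ones.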
Update
You can repeat the step twice, the second time applying it to the new dx and dy columns (columns 4 and 5):
dt[, c('dx', 'dy'):=lapply(.SD, function(x) c(NA, diff(x))), by=id]
dt[,c('dx2', 'dy2'):= lapply(.SD, function(x) c(NA, diff(x))),
by=id, .SDcols=4:5]
dt
# x y id dx dy dx2 dy2
#1: 1 2 1 NA NA NA NA
#2: 2 4 1 1 2 NA NA
#3: 3 5 1 1 1 0 -1
#4: 1 8 2 NA NA NA NA
#5: 5 2 2 4 -6 NA NA
#6: 6 3 2 1 1 -3 7
Or we can use the shift function from data.table
dt[, paste0("d", c("x", "y")) := .SD - shift(.SD), by = id
][, paste0("d", c("x2", "y2")) := .SD - shift(.SD) , by = id, .SDcols = 4:5 ]
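If higher orders are needed, repeating the step can be avoided because base R's diff() takes a differences argument. The sketch below is one possible way to wrap that; pad_diff and dt2 are hypothetical names introduced only for this example, and the data is rebuilt so the block runs on its own.

```r
library(data.table)

# Same sample data as in the question
dt2 <- data.table(x  = c(1, 2, 3, 1, 5, 6),
                  y  = c(2, 4, 5, 8, 2, 3),
                  id = c(1, 1, 1, 2, 2, 2))

# Hypothetical helper: k-th order difference, padded with k leading NAs
# and truncated to length(x) so groups shorter than k still line up
pad_diff <- function(x, k = 1) {
  c(rep(NA, k), diff(x, differences = k))[seq_along(x)]
}

# First differences (speed) and second differences (acceleration) per id
dt2[, c("dx", "dy")   := lapply(.SD, pad_diff, k = 1), by = id, .SDcols = c("x", "y")]
dt2[, c("dx2", "dy2") := lapply(.SD, pad_diff, k = 2), by = id, .SDcols = c("x", "y")]
dt2
```

Because the data is sampled at 1 Hz, these differences are already per-second estimates. For a different sample interval h, each k-th difference would be divided by h^k.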