data.table: differences between non-consecutive rows of the same group

Date: 2022-06-05 21:30:29

I need to calculate the difference between non-consecutive records of a variable, grouped by another variable. That is, within each group I want to take the last value of the variable in one run of consecutive rows and subtract it from the first value in the group's next run (if there is one).

I know I can use rleid along with shift to calculate differences between consecutive rows, but this time those within-run differences are exactly what I need to exclude.
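
For readers less familiar with the two functions, here is a minimal sketch of what rleid and shift do. The toy table and its column names (g, x, run, d) are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data: group "a" appears in two separate runs.
dt <- data.table(g = c("a", "a", "b", "a", "a"), x = c(1, 2, 10, 5, 7))

# rleid() numbers each run of identical consecutive values.
dt[, run := rleid(g)]

# shift() lags x by one row; computed per run, x - shift(x) is the
# difference between consecutive rows inside that run.
dt[, d := x - shift(x), by = run]
print(dt)
```

The d column only holds within-run differences. The gap between the two runs of "a" (5 - 2) does not appear anywhere, and that gap is the quantity being asked for.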

Example data

dput(iris)

structure(list(Sepal.Length = c(4.4, 6.3, 4.6, 5.8, 6.4, 6.5, 
4.9, 5.4, 6.4, 6.7), Sepal.Width = c(3, 2.8, 3.1, 2.7, 2.7, 3, 
3.6, 3.9, 2.8, 3.1), Petal.Length = c(1.3, 5.1, 1.5, 4.1, 5.3, 
5.5, 1.4, 1.7, 5.6, 4.7), Petal.Width = c(0.2, 1.5, 0.2, 1, 1.9, 
1.8, 0.1, 0.4, 2.1, 1.5), Species = c("setosa", "virginica", 
"setosa", "versicolor", "virginica", "virginica", "setosa", "setosa", 
"virginica", "versicolor")), .Names = c("Sepal.Length", "Sepal.Width", 
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
-10L), class = c("data.table", "data.frame"))

library(data.table)
setDT(iris, key = "Sepal.Width")

I think something like

 iris[, diff(Petal.Width), by = .(Species, !rleid(Species))]

(which of course doesn't work!) is what I need, but I can't think of a way to achieve it.

Expected result (diffing Petal.Width):

      Species   V1
1: versicolor  0.5
2:  virginica -0.3
3:     setosa  0.0
4:     setosa -0.1

(I got this by running iris[, diff(Petal.Width), by = .(Species)] and then hand-picking rows with .Last.value[c(1, 4, 5, 6)].)
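
To make it easier to see where these four numbers come from, here is a sketch that lists the runs in the keyed example data. It rebuilds only the three relevant columns. The names dt, runs, first_pw, last_pw and n are illustrative and do not appear in the question.

```r
library(data.table)

# Three columns of the example data, in their original row order.
dt <- data.table(
  Sepal.Width = c(3, 2.8, 3.1, 2.7, 2.7, 3, 3.6, 3.9, 2.8, 3.1),
  Petal.Width = c(0.2, 1.5, 0.2, 1, 1.9, 1.8, 0.1, 0.4, 2.1, 1.5),
  Species = c("setosa", "virginica", "setosa", "versicolor", "virginica",
              "virginica", "setosa", "setosa", "virginica", "versicolor")
)
setkey(dt, Sepal.Width)  # same ordering as setDT(iris, key = "Sepal.Width")

# One row per run of identical consecutive Species values,
# with the first and last Petal.Width of that run.
runs <- dt[, .(first_pw = Petal.Width[1L], last_pw = Petal.Width[.N], n = .N),
           by = .(Species, run = rleid(Species))]
print(runs)
```

Each expected value is the first_pw of one run minus the last_pw of the same species' previous run, for example 1.5 - 1.0 = 0.5 for versicolor.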

2 solutions

#1


2  

Well, there's

iris[, .(first(Petal.Width), last(Petal.Width)), by=.(Species, rleid(Species))][, 
  tail(V1 - shift(V2), -1), by=Species]

      Species   V1
1: versicolor  0.5
2:  virginica -0.3
3:     setosa  0.0
4:     setosa -0.1
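
To unpack the second step of that chain, here is a sketch that starts from the per-run table the first step would produce. The V1/V2 values below were worked out by hand from the example data, so treat them as an assumption. prev_last and gap are illustrative names.

```r
library(data.table)

# Per-run summary, one row per run in table order:
# V1 = first Petal.Width of the run, V2 = last Petal.Width of the run.
runs <- data.table(
  Species = c("versicolor", "virginica", "setosa", "virginica",
              "setosa", "versicolor", "setosa"),
  V1 = c(1.0, 1.9, 0.2, 1.8, 0.2, 1.5, 0.1),
  V2 = c(1.0, 2.1, 0.2, 1.8, 0.2, 1.5, 0.4)
)

# Within each species, line up every run with the last value of the previous run.
runs[, prev_last := shift(V2), by = Species]
print(runs)

# The first run of a species has no predecessor (NA), so tail(..., -1L) drops it.
res <- runs[, .(gap = tail(V1 - prev_last, -1L)), by = Species]
print(res)
```

Grouping by Species in the last step is what keeps a run from being compared with a run of a different species.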

Or...

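iris[, Petal.Width[c(1L, .N)], by=.(Species, rleid(Species))][, {
  v = V1[-c(1L, .N)]  # drop the first run's first value and the last run's last value
  v[c(FALSE,TRUE)] - v[c(TRUE,FALSE)]  # first of next run minus last of previous run
}, by=Species]

      Species   V1
1: versicolor  0.5
2:  virginica -0.3
3:     setosa  0.0
4:     setosa -0.1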
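
The alternating logical index in that variant relies on R recycling a length-2 logical vector. A small base R sketch, using the setosa values from the example as I read them (an assumption), with prev_last, next_first and gap as illustrative names:

```r
# Flattened (first, last) pairs for one species, with the very first and very
# last elements already dropped.
# Layout: last of run 1, first of run 2, last of run 2, first of run 3.
v <- c(0.2, 0.2, 0.2, 0.1)

prev_last  <- v[c(TRUE, FALSE)]  # recycled to TRUE,FALSE,TRUE,FALSE: positions 1 and 3
next_first <- v[c(FALSE, TRUE)]  # recycled to FALSE,TRUE,FALSE,TRUE: positions 2 and 4

gap <- next_first - prev_last    # first of next run minus last of previous run
print(gap)
```

Subtracting the odd positions from the even positions gives the direction used in the question's expected result. The reverse subtraction gives the same magnitudes with the sign flipped.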

#2


0  

After trying several things, I came up with this hacky solution. I guess there should be something "cleaner":

iris[, .(Species, Petal.Width, rl= rleid(Species))][, .(pd= ifelse(diff(rl)>0, diff(Petal.Width), NA)), by = Species][!is.na(pd),]

If there's a function that achieves this in a cleaner way, I'd appreciate a pointer.
