Difference from the first value of each group in dplyr

Time: 2022-02-16 22:58:00

I'm trying to create a window function with dplyr, that will return a new vector with the difference between each value and the first of its group. For example, given this dataset:

dummy <- data.frame(userId=rep(1,6),
     libId=rep(999,6),
     curatorId=c(1:2,1:2,1:2),
     iterationNum=c(0,0,1,1,2,2),
     rf=c(5,10,0,15,30,40)
)

That creates this dataset:

  userId libId curatorId iterationNum rf
1      1   999         1            0  5
2      1   999         2            0 10
3      1   999         1            1  0
4      1   999         2            1 15
5      1   999         1            2 30
6      1   999         2            2 40

And given this grouping:

dummy <- group_by(dummy, libId, userId, curatorId)

I would like to get this result:

  userId libId curatorId iterationNum rf rf.diff
1      1   999         1            0  5       0
2      1   999         2            0 10       0
3      1   999         1            1  0      -5
4      1   999         2            1 15       5
5      1   999         1            2 30      25
6      1   999         2            2 40      30

So for each group of user, lib and curator, I want the rf value minus that group's rf value at iterationNum = 0. I tried the first function, the rank function and others, but couldn't get it to work.

---EDIT---

This is what I tried:

dummy %>% 
  group_by(userId,libId,curatorId) %>% 
  mutate(rf.diff = rf - subset(dummy,iterationNum==0)[['rf']])

And:

dummy %>% 
  group_by(userId,libId,curatorId) %>% 
  mutate(rf.diff = rf - first(x = rf,order_by=iterationNum))

The second attempt crashes R and returns this error message:

pure virtual method called
terminate called after throwing an instance of 'Rcpp::exception'
  what():  incompatible size (%d), expecting %d (the group size) or 1

1 solution

#1


5  

The two approaches I mentioned in my comment above are as follows.

dummy %>%
  group_by(libId, userId, curatorId) %>%
  mutate(rf.diff = rf - rf[iterationNum == 0])
#Source: local data frame [6 x 6]
#Groups: libId, userId, curatorId
#
#  userId libId curatorId iterationNum rf rf.diff
#1      1   999         1            0  5       0
#2      1   999         2            0 10       0
#3      1   999         1            1  0      -5
#4      1   999         2            1 15       5
#5      1   999         1            2 30      25
#6      1   999         2            2 40      30
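
One caveat on the indexing approach: rf[iterationNum == 0] assumes every group has exactly one row with iterationNum == 0. The following is a minimal base R sketch that makes that assumption explicit; diff_from_baseline is a hypothetical helper name, not something dplyr provides.

```r
# Hypothetical helper (not part of dplyr): subtract the baseline value,
# where the baseline is the single element of x whose key equals `at`.
diff_from_baseline <- function(x, key, at = 0) {
  baseline <- x[key == at]
  if (length(baseline) != 1) {
    # Fail loudly instead of silently recycling or returning an empty vector
    stop("expected exactly one baseline row per group, found ", length(baseline))
  }
  x - baseline
}

# One group's rf values and iteration numbers from the sample data (curatorId 1)
diff_from_baseline(c(5, 0, 30), c(0, 1, 2))
```

Inside the grouped pipeline it could be used as mutate(rf.diff = diff_from_baseline(rf, iterationNum)).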

Or using arrange to order the data by iterationNum:

dummy %>%
  arrange(iterationNum) %>%
  group_by(libId, userId, curatorId) %>%
  mutate(rf.diff = rf - first(rf))
#Source: local data frame [6 x 6]
#Groups: libId, userId, curatorId
#
#  userId libId curatorId iterationNum rf rf.diff
#1      1   999         1            0  5       0
#2      1   999         2            0 10       0
#3      1   999         1            1  0      -5
#4      1   999         2            1 15       5
#5      1   999         1            2 30      25
#6      1   999         2            2 40      30

As you can see, both produce the same output for the sample data.
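
If the original row order should be kept, first() also takes an order_by argument, so the arrange() step can be skipped. This is a sketch that assumes a reasonably recent dplyr, where this call should no longer crash the way it did in the question; the shuffled and result names are only for illustration.

```r
library(dplyr)

dummy <- data.frame(
  userId = rep(1, 6),
  libId = rep(999, 6),
  curatorId = c(1:2, 1:2, 1:2),
  iterationNum = c(0, 0, 1, 1, 2, 2),
  rf = c(5, 10, 0, 15, 30, 40)
)

# Shuffle the rows to check that the result does not depend on input order
shuffled <- dummy[c(6, 3, 1, 5, 2, 4), ]

result <- shuffled %>%
  group_by(libId, userId, curatorId) %>%
  # Within each group, take the rf value at the smallest iterationNum
  mutate(rf.diff = rf - first(rf, order_by = iterationNum)) %>%
  ungroup()

print(result)
```

The rows stay in their shuffled order, and each rf.diff is still measured against the group's iterationNum == 0 value.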
