Using data.table to flag the first (or last) record in a group

Time: 2022-02-11 12:28:29

Given a sort key, is there a data.table shortcut that replicates the first and last functionality found in SAS and SPSS?

The pedestrian approach below flags the first record of a group.

Given the elegance of data.table (with which I'm slowly getting familiar), I'm assuming there's a shortcut using a self join & mult, but I'm still trying to figure it out.

Here's the example:

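library(data.table)

set.seed(123)
n <- 17
DT <- data.table(x = sample(letters[1:3], n, replace = TRUE),
                 y = sample(LETTERS[1:3], n, replace = TRUE))
sortkey <- c("x", "y")
setkeyv(DT, sortkey)                       # sort DT by x, then y
key <- paste(DT$x, DT$y, sep = "-")        # one string per (x, y) combination
nw <- c(TRUE, key[2:n] != key[1:(n - 1)])  # TRUE where the key differs from the previous row
DT$first <- 1 * nw                         # 1 marks the first record of each group
DT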

3 solutions

#1


21  

Here are a couple of solutions using data.table:

## Option 1 (cleaner solution, added 2016-11-29)
uDT <- unique(DT)
DT[, c("first","last"):=0L]
DT[uDT, first:=1L, mult="first"]
DT[uDT, last:=1L, mult="last"]


## Option 2 (original answer, retained for posterity)
DT <- cbind(DT, first=0L, last=0L)
DT[DT[unique(DT),,mult="first", which=TRUE], first:=1L]
DT[DT[unique(DT),,mult="last", which=TRUE], last:=1L]

head(DT)
#      x y first last
# [1,] a A     1    1
# [2,] a B     1    1
# [3,] a C     1    0
# [4,] a C     0    1
# [5,] b A     1    1
# [6,] b B     1    1

There's obviously a lot packed into each of those lines. The key construct, though, is the following, which returns the row index of the first record in each group:

DT[unique(DT),,mult="first", which=TRUE]
# [1]  1  2  3  5  6  7 11 13 15
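
For comparison, the same kind of index vector can be built by grouping instead of joining. This is only a sketch on a small made-up table (the five rows below are illustrative, not the question's sample), and the names first_idx and last_idx are hypothetical. It relies on data.table's .I (row numbers) and .N (group size) symbols:

```r
library(data.table)

# Small keyed table with known contents (illustrative data only)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
setkeyv(DT, c("x", "y"))

# .I holds the row numbers of DT; picking the first and last one
# within each (x, y) group yields one row index per group
first_idx <- DT[, .I[1L], by = .(x, y)]$V1
last_idx  <- DT[, .I[.N], by = .(x, y)]$V1

# Start with 0 everywhere, then set the flagged rows to 1
DT[, c("first", "last") := 0L]
DT[first_idx, first := 1L]
DT[last_idx,  last  := 1L]
DT
```

A group with a single record (here a-B) ends up with both flags set, which matches the behaviour of the join-based options.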

#2


8  

One easy way is to use the duplicated() function. When applied to a data frame, it returns a logical vector in which an entry is TRUE if and only if that row's combination of values has already occurred earlier, moving down the data frame. Negating it therefore flags the first record of each group, and applying it to the reversed columns (then reversing the result) flags the last.

DT$first <- !duplicated( DT[, list(x,y) ])
DT$last <- rev(!duplicated( DT[, list(rev(x),rev(y)) ]))

> DT
       x y first  last
  [1,] a A  TRUE  TRUE
  [2,] a B  TRUE  TRUE
  [3,] a C  TRUE FALSE
  [4,] a C FALSE  TRUE
  [5,] b A  TRUE  TRUE
  [6,] b B  TRUE  TRUE
  [7,] b C  TRUE FALSE
  [8,] b C FALSE FALSE
  [9,] b C FALSE FALSE
 [10,] b C FALSE  TRUE
 [11,] c A  TRUE FALSE
 [12,] c A FALSE  TRUE
 [13,] c B  TRUE FALSE
 [14,] c B FALSE  TRUE
 [15,] c C  TRUE FALSE
 [16,] c C FALSE FALSE
 [17,] c C FALSE  TRUE
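
The reversal trick can probably be avoided: data.table's duplicated() method accepts by and fromLast arguments, so both flags can be computed on the original row order. A minimal sketch on a small made-up table (illustrative data, not the question's sample):

```r
library(data.table)

# Small keyed table with known contents (illustrative data only)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
setkeyv(DT, c("x", "y"))

# duplicated() is TRUE for rows whose (x, y) combination was already seen,
# so its negation marks the first occurrence; fromLast = TRUE scans from
# the bottom, so its negation marks the last occurrence
DT[, first := !duplicated(DT, by = c("x", "y"))]
DT[, last  := !duplicated(DT, by = c("x", "y"), fromLast = TRUE)]
DT
```

Passing by explicitly keeps the comparison restricted to x and y even after the first column has been added.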

Another way without using duplicated() is:

DT[ unique(DT), list(first = c(1, rep(0,length(y)-1)),
                     last =  c(rep(0,length(y)-1),1 )) ]

       x y first last
  [1,] a A     1    1
  [2,] a B     1    1
  [3,] a C     1    0
  [4,] a C     0    1
  [5,] b A     1    1
  [6,] b B     1    1
  [7,] b C     1    0
  [8,] b C     0    0
  [9,] b C     0    0
 [10,] b C     0    1
 [11,] c A     1    0
 [12,] c A     0    1
 [13,] c B     1    0
 [14,] c B     0    1
 [15,] c C     1    0
 [16,] c C     0    0
 [17,] c C     0    1
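
The same idea (a 1 in the first position of each group, a 1 in the last) can also be written with an explicit by and the .N symbol, which adds the columns to DT by reference rather than returning a new table. This is a sketch on a small made-up table, not the answer's original code:

```r
library(data.table)

# Small keyed table with known contents (illustrative data only)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
setkeyv(DT, c("x", "y"))

# Within each (x, y) group, seq_len(.N) numbers the records 1..N;
# position 1 is the first record and position .N is the last
DT[, `:=`(first = as.integer(seq_len(.N) == 1L),
          last  = as.integer(seq_len(.N) == .N)),
   by = .(x, y)]
DT
```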

#3


1  

A simpler way than Josh's may be:

unique(DT)
unique(DT,fromLast=TRUE)
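
Note that this returns the first (or last) rows themselves rather than adding flag columns. Also, in recent data.table versions unique() compares all columns by default, so when DT carries columns beyond the key it is safer to name the grouping columns with by. A small sketch, assuming a made-up table with a hypothetical id column added only to show which record is kept:

```r
library(data.table)

# Small keyed table with known contents (illustrative data only)
DT <- data.table(x = c("a", "a", "a", "b", "b"),
                 y = c("A", "A", "B", "A", "A"))
setkeyv(DT, c("x", "y"))
DT[, id := seq_len(.N)]  # hypothetical row id, added after sorting

# Keep one row per (x, y) combination: the first by default,
# the last when fromLast = TRUE
first_rows <- unique(DT, by = c("x", "y"))
last_rows  <- unique(DT, by = c("x", "y"), fromLast = TRUE)

first_rows
last_rows
```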
