R: data.table, set the first and last value of a group to NA

Time: 2021-09-29 22:47:51

I would like to set the first and the last value in a group to NA. Here is an example:

DT <- data.table(v = rnorm(12), class=rep(1:3, each=4))
DT[, v[c(1,.N)] := NA , by=class]

But this is not working. How can I do it?

4 solutions

#1


9  

At the moment, the way to go about this would be to first extract the indices, and then do one assignment by reference.

idx = DT[, .(idx = .I[c(1L, .N)]), by=class]$idx
DT[idx, v := NA]

I'll try and add this example to the Reference semantics vignette.
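
To make the two steps easier to follow, here is a minimal sketch on a deterministic table (the values are made up for illustration and are not from the original post). `.I` holds row numbers of the full table within each group, so `.I[c(1L, .N)]` yields the first and last row number per group. The `unique()` call is my own addition, assuming you may have single-row groups where the first and last row coincide.

```r
library(data.table)

# Toy data: values are arbitrary, chosen so the result is easy to check
DT <- data.table(v = as.numeric(1:12), class = rep(1:3, each = 4))

# .I is the row number in the full table; c(1L, .N) selects first and last per group
idx <- DT[, .(idx = .I[c(1L, .N)]), by = class]$idx
idx <- unique(idx)  # assumption: drop repeats coming from single-row groups

# One assignment by reference on the collected rows
DT[idx, v := NA]
print(DT)
```

Collecting the indices first means `:=` runs once on a plain row subset rather than once per group.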

#2


3  

This may not be a one-liner, but it does have 'first' and 'last' in the code :)

> DT <- data.table(v = rnorm(12), class=rep(1:3, each=4))
> setkey(DT, class)
> classes = DT[, .(unique(class))]
> DT[classes, v := NA, mult='first']
> DT[classes, v := NA, mult='last']
> DT
          v class
 1:      NA     1
 2: -1.8191     1
 3: -0.6355     1
 4:      NA     1
 5:      NA     2
 6: -1.1771     2
 7: -0.8125     2
 8:      NA     2
 9:      NA     3
10:  0.2357     3
11:  0.3416     3
12:      NA     3
> 

Order is also preserved for the non-key columns. I think that is a documented (committed to) feature.

#3


1  

With a helper function it's easy:

set.na = function(x,y) {x[y] = NA; x}
DT[, v := set.na(v, c(1, .N)), by=class]  # assign back to v; without := DT is left unchanged

#4


0  

The canonical way to modify a subset of the data is to use i to define the subset. You cannot apply [ to the left-hand side of :=. Either create a temporary i as suggested by @David Arenburg, or build the result vector yourself with a c(NA, v[-c(1, .N)], NA) construction.

DT[, v := c(NA, v[-c(1, .N)], NA)[1:.N], by = class]

However, note that the row order can change, e.g. when you set a new key or call any of a number of functions, so which rows count as "first" and "last" depends on the current order. Be very careful with this operation.
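
One way to guard against the row-order caveat is to define "first" and "last" by an explicit ordering column rather than by physical position. The sketch below assumes such a column exists; `seq_id` is a hypothetical name introduced only for illustration, and the data are made up.

```r
library(data.table)

# Hypothetical data: `seq_id` is an illustrative ordering column,
# and the rows are deliberately not sorted by it
DT <- data.table(
  class  = c(1L, 1L, 1L, 2L, 2L, 2L),
  seq_id = c(2L, 3L, 1L, 3L, 1L, 2L),
  v      = c(10, 20, 30, 40, 50, 60)
)

# Row numbers of the smallest and largest seq_id within each class,
# independent of how the rows are currently arranged
idx <- DT[, .(idx = .I[c(which.min(seq_id), which.max(seq_id))]), by = class]$idx

DT[idx, v := NA]
print(DT)
```

Here the rows set to NA are chosen by `seq_id`, so re-keying or reordering the table beforehand would not change which observations are affected.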
