I have a data.table with a large number of missing values. I would like to fill these by adding or subtracting values from the available values in the data.table. In particular, consider this data:
> test = data.table(id=c("A","A","A","A","A","B","B","B","B","B"), x=c(NA,NA,0,NA,NA,NA,NA,0,NA,NA))
> test
id x
1: A NA
2: A NA
3: A 0
4: A NA
5: A NA
6: B NA
7: B NA
8: B 0
9: B NA
10: B NA
I need an operation which transforms this into that:
id x
1: A -2
2: A -1
3: A 0
4: A 1
5: A 2
6: B -2
7: B -1
8: B 0
9: B 1
10: B 2
Basically a version of na.locf which increments the last value rather than repeating it.
1 solution
#1
8
We can group by 'id' and take the difference between the row number (seq_len(.N)) and the position (which) in 'x' where the value is 0 (!x). I am wrapping the result in as.numeric because the 'x' column is numeric in the input dataset, whereas the difference is of type 'integer'. If there is a class conflict while assigning (:=), data.table will show an error, as it needs matching classes.
test[, x:= as.numeric(seq_len(.N)-which(!x)), id]
test
# id x
# 1: A -2
# 2: A -1
# 3: A 0
# 4: A 1
# 5: A 2
# 6: B -2
# 7: B -1
# 8: B 0
# 9: B 1
#10: B 2
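The same idea might be generalised to the case where the known value in each group is not 0. The following is only a sketch under stated assumptions: every group contains at least one non-NA value, the first such value is used as the anchor, and the step between rows is 1. The name test2 and its values are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical data: the known value differs per group and is not 0
test2 <- data.table(id = rep(c("A", "B"), each = 5),
                    x  = c(NA, NA, 10, NA, NA, NA, 5, NA, NA, NA))

test2[, x := {
  i <- which(!is.na(x))[1L]     # position of the first known value in this group
  x[i] + (seq_len(.N) - i)      # known value plus the signed distance from it
}, by = id]

test2
```

Because x[i] is already numeric, the sum is numeric as well, so no as.numeric wrapper is needed in this variant.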
!x can be written more clearly as x==0. It returns a logical vector of TRUE/FALSE; where 'x' is NA, the result remains NA. By wrapping it with which, we get the position of the 0 value, because which ignores NA and returns only the positions that are TRUE. In the example, it is 3 for each 'id'.
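To see the individual steps outside data.table, here is a small base R sketch on the values of a single group. The names pos and offsets are introduced only for illustration.

```r
x <- c(NA, NA, 0, NA, NA)        # the 'x' values of one 'id' group

!x                               # NA NA TRUE NA NA: NA stays NA, 0 becomes TRUE
pos <- which(!x)                 # 3: which() skips NA and keeps TRUE positions
offsets <- seq_along(x) - pos    # -2 -1 0 1 2

class(offsets)                   # "integer"
class(as.numeric(offsets))       # "numeric", matching the original column type
```

Note that this relies on there being exactly one 0 per group; with several zeros, which would return more than one position.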