The following code segfaults my R 2.15.0, running data.table 1.8.9.
library(data.table)
d = data.table(date = c(1,2,3,4,5), value = c(1,2,3,4,5))
# works as expected
d[-5][, mean(value), by = list(I(as.integer((date+1)/2)))]
# crashes R
d[-5, mean(value), by = list(I(as.integer((date+1)/2)))]
And on a related note, the following two commands have very different outputs:
d[-5][, value, by = list(I(as.integer((date+1)/2)))]
# I value
# 1: 1 1
# 2: 1 2
# 3: 2 3
# 4: 2 4
d[-5, value, by = list(I(as.integer((date+1)/2)))]
# I value
# 1: 1 2.121996e-314
# 2: 1 2.470328e-323
# 3: 2 3.920509e-316
# 4: 2 2.470328e-323
A simpler command that crashes my R, from the comments:
d[-5, value, by = date]
As Ricardo points out, it is the combination of negative indexing in i with by that triggers the problem.
2 answers
#1
UPDATE: This has been fixed in v1.8.11. From NEWS:
Crash or incorrect aggregate results with negative indexing in i is fixed, #2697. Thanks to Eduard Antonyan (eddi) for reporting. Tests added.
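If you need to know whether an installed data.table already contains this fix, comparing version numbers is enough. The helper below is hypothetical (not part of data.table); it only relies on base R's package_version comparison and on the v1.8.11 cut-off stated in NEWS:

```r
# Hypothetical helper: TRUE if the given data.table version string is at
# least 1.8.11, the release whose NEWS lists the fix for #2697.
has_negative_i_fix <- function(version) {
  # package_version() compares component-wise, so "1.8.9" < "1.8.11"
  package_version(version) >= "1.8.11"
}

has_negative_i_fix("1.8.9")   # the version from the question
# Against the installed package (requires data.table to be installed):
# has_negative_i_fix(as.character(packageVersion("data.table")))
```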
#2
One hypothesis is that the problem is related to the following lines in [.data.table:
o__ = if (length(o__)) irows[o__] else irows
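o__ eventually gets passed to the C code (dogroups.C) as -5 in this case. One plausible explanation is that the C code treats these values as row positions, so pointer arithmetic with a negative offset reads memory outside the column; that would account for both the segfaults and the garbage values (such as 2.121996e-314) shown in the question.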
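A minimal base-R sketch of the distinction the hypothesis relies on. It does not touch data.table internals; n, irows and o are hypothetical stand-ins for nrow(x), the i argument of d[-5, ...] and an ordering vector like o__. R's own subsetting reads -5 as "every row except row 5", but the raw value -5 is not a valid 1-based row position, and reordering it does not change that:

```r
n <- 5L        # stands in for nrow(x)
irows <- -5L   # stands in for i in d[-5, ...]

# As an R subscript, -5 means "drop row 5", giving the positive rows 1 2 3 4
rows_kept <- seq_len(n)[irows]
rows_kept

# As a list of row positions, -5 is invalid (positions start at 1)
any(irows < 1L)

# Reordering, as irows[o__] does, keeps the negative value and pads with NA
# because irows has only one element
o <- 1:4
irows[o]       # -5 NA NA NA
rows_kept[o]   # 1 2 3 4, all valid positions
```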
A potential workaround would be to use data.table's not-join syntax:
d[!5, mean(value), by = list(I(as.integer((date+1)/2)))]
This goes through different logic on its way to the C code:
if (notjoin) {
    # ... omitted for brevity ...
    i = irows = if (length(irows)) seq_len(nrow(x))[-irows] else NULL
}
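To see what that branch produces, and what result the crashing call should have returned, here is a base-R emulation on a plain data.frame copy of d. It assumes (as the quoted line suggests) that irows holds the rows selected by i, here row 5, and mirrors mean(value) by I(as.integer((date+1)/2)) with tapply; it is an illustration, not data.table's actual code path:

```r
# Plain data.frame copy of d from the question (no data.table needed)
d <- data.frame(date = c(1, 2, 3, 4, 5), value = c(1, 2, 3, 4, 5))

# Rows selected by i in d[!5, ...]; the notjoin branch keeps the complement
# as explicit positive row numbers
irows <- 5L
rows <- if (length(irows)) seq_len(nrow(d))[-irows] else NULL  # 1 2 3 4

# Group the kept rows by as.integer((date + 1) / 2) and average value
kept <- d[rows, ]
grp <- as.integer((kept$date + 1) / 2)
res <- tapply(kept$value, grp, mean)
res
#   1   2
# 1.5 3.5
```

Because every row number handed on is positive, nothing downstream ever has to interpret -5 as a position.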