data.table bug causing a segfault in R

Time: 2022-06-29 17:00:23

The following code segfaults my R 2.15.0, running data.table 1.8.9.

library(data.table)
d = data.table(date = c(1,2,3,4,5), value = c(1,2,3,4,5))

# works as expected
d[-5][, mean(value), by = list(I(as.integer((date+1)/2)))]

# crashes R
d[-5, mean(value), by = list(I(as.integer((date+1)/2)))]
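
For reference, the result the crashing call ought to produce can be cross-checked in base R without data.table. This is only a sketch: the variable names keep, group and expected are illustrative and not part of the original question.

```r
# Same data as in the question, held in plain vectors (no data.table needed).
date  <- c(1, 2, 3, 4, 5)
value <- c(1, 2, 3, 4, 5)

keep  <- -5                                  # negative index: drop the fifth row
group <- as.integer((date[keep] + 1) / 2)    # same grouping expression as in `by`

# Group means computed by base R, for comparison with the data.table output.
expected <- tapply(value[keep], group, mean)
print(expected)
```

If the data.table call disagrees with this (or crashes), the problem lies in how the negative index is handled rather than in the grouping expression.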

And on a related note, the following two commands have very different outputs:

d[-5][, value, by = list(I(as.integer((date+1)/2)))]
#    I value
# 1: 1     1
# 2: 1     2
# 3: 2     3
# 4: 2     4

d[-5, value, by = list(I(as.integer((date+1)/2)))]
#    I         value
# 1: 1 2.121996e-314
# 2: 1 2.470328e-323
# 3: 2 3.920509e-316
# 4: 2 2.470328e-323

A simpler command, taken from the comments, that also crashes my R:

d[-5, value, by = date]

As Ricardo points out, it is the combination of negative indexing in i and by that triggers the problem.

2 Answers

#1


UPDATE: This has been fixed in v1.8.11. From NEWS:

Crash or incorrect aggregate results with negative indexing in i is fixed, #2697. Thanks to Eduard Antonyan (eddi) for reporting. Tests added.
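
To tell whether a given installation predates the fix, the version number can be compared against 1.8.11. The helper below is a hypothetical sketch (has_negative_i_fix is not a data.table function); it relies only on base R's package_version.

```r
# Hypothetical helper: TRUE when a version string is at or above the release
# that the NEWS entry names as containing the fix.
has_negative_i_fix <- function(version) {
  package_version(version) >= package_version("1.8.11")
}

print(has_negative_i_fix("1.8.9"))    # version used in the question
print(has_negative_i_fix("1.8.11"))   # version named in NEWS

# With data.table installed, the same check can be applied to the installed copy:
# has_negative_i_fix(as.character(utils::packageVersion("data.table")))
```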

#2


One hypothesis is that the problem is related to the following lines in [.data.table:

o__ = if (length(o__)) irows[o__]
              else irows

In this case o__ eventually gets passed to the C code (dogroups.c) as -5. One could imagine a negative row number causing problems with pointer arithmetic there, leading to segfaults and/or erroneous values.

A potential workaround would be to use data.table's not-join syntax:

d[!5, mean(value), by = list(I(as.integer((date+1)/2)))]

which passes through some different logic on the way to C:

if (notjoin) {
    ... Omitted for brevity ...
    i = irows = if (length(irows)) seq_len(nrow(x))[-irows] else NULL
}
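
The effect of that last expression can be seen in isolation with base R. This is a minimal sketch in which n_rows and irows are stand-ins for nrow(x) and the rows named after `!`; it shows that the not-join path turns the exclusion into positive row numbers before anything reaches C.

```r
# Hypothetical stand-ins for the variables in the quoted snippet.
n_rows <- 5L   # plays the role of nrow(x)
irows  <- 5L   # the row excluded by d[!5, ...]

# Same expression as in the quoted notjoin branch: the positive rows to keep.
keep <- if (length(irows)) seq_len(n_rows)[-irows] else NULL
print(keep)

# Assumption, not verified against 1.8.9: doing the same conversion by hand
# should avoid the negative index as well, e.g.
# d[seq_len(nrow(d))[-5], mean(value), by = list(I(as.integer((date+1)/2)))]
```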
