Fast join in data.table (potential bug, checking before reporting)

Time: 2022-09-22 07:37:00

This might be a bug. If it is, I will delete this question and report it as a bug. I would like someone to take a look first to make sure I'm not doing something incorrectly, so that I don't waste the developers' time.

test = data.table(mo=1:100, b=100:1, key=c("mo", "b"))
mo = 1
test[J(mo)]

That returns the entire test data.table instead of the correct result returned by

test[J(1)]

I believe the error might come from test having a column with the same name, mo, as the variable I am joining on. Does anyone else get the same problem?

2 answers

#1


9  

This is a scoping issue, similar to the one discussed in data.table-faq 2.13 (warning, pdf). Because test contains a column named mo, when J(mo) is evaluated it returns that entire column rather than the value of the mo found in the global environment, which the column masks. (This scoping behavior is, of course, quite nice when you want to do something like test[mo<4]!)
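
The lookup order can be imitated in base R without data.table. This is only an analogy, not a claim about how data.table is implemented internally: the list cols stands in for the columns of test, and the environment outer (a name made up for this sketch) stands in for the calling environment.

```r
# Base-R sketch of "columns first, calling environment second".
cols  <- list(mo = 1:5, b = 5:1)  # stands in for the columns of `test`
outer <- new.env()                # stands in for the calling environment
assign("mo", 1, envir = outer)
assign("MO", 1, envir = outer)

# eval() looks in `cols` first and only then falls back to `outer`.
masked   <- eval(quote(mo), cols, outer)  # a column named mo exists, so it wins
unmasked <- eval(quote(MO), cols, outer)  # no column named MO, so outer's value is used

print(masked)
print(unmasked)
```

Here masked is the whole column and unmasked is the single value, which is the same pattern as J(mo) versus a renamed variable.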

Try this to see what's going on:

test <- data.table(mo=1:5, b=5:1, key=c("mo", "b"))
mo <-  1
test[browser()]
Browse[1]> J(mo)
#    mo
# 1:  1
# 2:  2
# 3:  3
# 4:  4
# 5:  5
# Browse[1]> 

As suggested in the linked FAQ, a simple solution is to rename the indexing variable:

MO <- 1
test[J(MO)]
#    mo b
# 1:  1 5

(This will also work, for reasons discussed in the documentation of i in ?data.table):

mo <- data.table(1)
test[mo]
#    mo b
# 1:  1 5

#2


4  

This is not a bug, but documented behaviour afaik. It's a scoping issue:

test[J(globalenv()$mo)]
   mo   b
1:  1 100
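
Note that globalenv()$mo only finds the variable when it actually lives in the global environment, so it will not help inside a function. A minimal sketch of one way around that, assuming the keyed test table from the question: wrap the value in a one-column table under a name that is not a column of the table being joined. The helper lookup_mo and the name idx are made up for this illustration.

```r
library(data.table)

test <- data.table(mo = 1:100, b = 100:1, key = c("mo", "b"))

# Hypothetical helper: `idx` is not a column of `dt`, so when it is passed
# as a bare name in i it is looked up in this function's own frame.
lookup_mo <- function(dt, value) {
  idx <- data.table(mo = value)
  dt[idx]  # keyed join on the first key column of dt
}

mo <- 1L   # integer, to match the type of the key column
res <- lookup_mo(test, mo)
print(res)
```

The caller can keep a variable called mo because the name never appears inside the brackets.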
