Filtering a factor variable with a double variable in R data.table

Time: 2022-06-30 14:58:05

How come I can filter a factor variable using a double variable in one case, but not in another?

Example data below:

library(data.table)

dt <- data.table(id=1:9,
                 var=factor(81:89))

# > dt
#    id var
# 1:  1  81
# 2:  2  82
# 3:  3  83
# 4:  4  84
# 5:  5  85
# 6:  6  86
# 7:  7  87
# 8:  8  88
# 9:  9  89

Why does this work...

dt[id %in% 1:7 & var %in% c(82, 84)]

#    id var
# 1:  2  82
# 2:  4  84

...but this gives an error?

dt[var %in% c(82, 84)]

# Error in bmerge(i, x, leftcols, rightcols, io <- FALSE, xo, roll = 0,  : 
#  x.'var' is a factor column being joined to i.'V1' which is type 'double'.
# Factor columns must join to factor or character columns.

This seems inconsistent. Could it be a bug?

1 Answer

#1


The difference is that only the second example is optimized by automatic indexing, and it is this optimized code path that throws the error. You can bypass the optimization by wrapping the expression in parentheses:

dt[(var %in% c(82, 84))]
#   id var
#1:  2  82
#2:  4  84

With the parentheses, a base R vector scan is used and the usual coercion rules apply. From help("%in%"):

Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type

var <- factor(81:89)
var %in% c(82, 84)
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
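
To make the coercion rule concrete, here is a minimal base R sketch. It assumes that %in% is equivalent to match(x, table, nomatch = 0L) > 0L, as described in help("%in%"), and spells out the character conversion by hand. The name by_label is only illustrative.

```r
var <- factor(81:89)

# Spell out the coercion by hand: compare the factor's labels, as
# character strings, with the numbers converted to character strings.
by_label <- match(as.character(var), as.character(c(82, 84)), nomatch = 0L) > 0L

# Same result as the direct comparison
identical(by_label, var %in% c(82, 84))
#[1] TRUE

# Note that the comparison uses the labels, not the internal integer codes:
as.integer(var)                # internal codes
#[1] 1 2 3 4 5 6 7 8 9
as.numeric(as.character(var))  # labels converted back to numbers
#[1] 81 82 83 84 85 86 87 88 89
```

This is why the vector scan finds 82 and 84 even though the factor's underlying integer codes are 1 to 9.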

The problem has been fixed in data.table version 1.9.7.
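
Regardless of version, the error message itself suggests another way around the problem: factor columns can be compared with character values. The following is a sketch under that assumption rather than a tested recipe for every data.table release; the names res and res2 are only illustrative.

```r
library(data.table)

dt <- data.table(id = 1:9, var = factor(81:89))

# Compare the factor column against character values, so that no
# factor-to-double comparison is needed in the first place.
res <- dt[var %in% c("82", "84")]
res$id
#[1] 2 4

# Alternative: convert the factor labels to numbers explicitly.
# as.character() comes first because as.numeric() applied directly
# to a factor returns the internal integer codes, not the labels.
res2 <- dt[as.numeric(as.character(var)) %in% c(82, 84)]
res2$id
#[1] 2 4
```

Comparing against character values keeps the column as a factor and avoids relying on implicit coercion.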
