Here is an example:
library(data.table)
DT = data.table(x=1:4, y=6:9, z=3:6)
setkey(DT, x, y)
I want to look up multiple values in each join column:
xc = c(1, 2, 4)
yc = c(6, 9)
DT[J(xc, yc), nomatch=0]
   x y z
1: 1 6 3
This use of J() returns only a single row. What I actually want is a join that behaves like the %in% operator.
DT[x %in% xc & y %in% yc]
   x y z
1: 1 6 3
2: 4 9 6
But using the %in% operator turns the search into a vector scan, which is very slow compared to binary search. To get a binary search, I build every possible combination of the join values:
xc2 = rep(xc, length(yc))
yc2 = unlist(lapply(yc, rep, length(xc)))
DT[J(xc2, yc2), nomatch=0]
   x y z
1: 1 6 3
2: 4 9 6
But building xc2 and yc2 this way makes the code hard to read. Is there a better way to get the speed of binary search with the simplicity of the %in% operator in this case?
1 Answer
#1
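Answering so that this question no longer appears among the open questions under the DT tag.
The code from Arun's comment, DT[CJ(xc, yc), nomatch=0L], will do the job. CJ() builds the cross join (every combination) of its arguments, so it replaces the manual construction of xc2 and yc2.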
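For completeness, here is a minimal end-to-end sketch of that suggestion, reusing the data from the question. Two things are my own assumptions rather than part of the original post: the lookup values are written as integer literals so they match the type of the key columns, and the names combos, res_join and res_scan are only illustrative.

```r
library(data.table)

# Same table and key as in the question
DT <- data.table(x = 1:4, y = 6:9, z = 3:6)
setkey(DT, x, y)

# Lookup values; integer literals are an assumption made here so that
# the types match the integer key columns of DT
xc <- c(1L, 2L, 4L)
yc <- c(6L, 9L)

# CJ() builds a data.table holding every (xc, yc) combination
combos <- CJ(xc, yc)

# Keyed join on (x, y); nomatch = 0L drops combinations that are not in DT
res_join <- DT[combos, nomatch = 0L]

# Vector-scan version from the question, kept only as a cross-check
res_scan <- DT[x %in% xc & y %in% yc]

print(res_join)
print(res_scan)
```

Both calls should return the same two rows as in the question, (1, 6, 3) and (4, 9, 6). Note that CJ() materialises length(xc) * length(yc) rows, so this approach stays cheap only while the number of combinations is moderate.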