How to perform a "setdiff" merge using data.table?

Date: 2022-10-15 19:22:50

Say I have two tables:

library(data.table)
set.seed(1)

tab1 <- data.table(
  let = rep(letters[1:2], each = 3),
  num = rep(1:3, 2),
  val = rnorm(6),
  key = c("let", "num")
)

tab2 <- data.table(
  let = rep(letters[1:2], each = 2),
  num = rep(1:2, 2),
  val = rnorm(4),
  key = c("let", "num")
)

Table 1:

> tab1
   let num        val
1:   a   1 -0.6264538
2:   a   2  0.1836433
3:   a   3 -0.8356286
4:   b   1  1.5952808
5:   b   2  0.3295078
6:   b   3 -0.8204684

Table 2:

> tab2
   let num
1:   a   1
2:   a   2
3:   b   1
4:   b   2

Is there a way to "merge" these tables so that I get all the rows of tab1 that are not in tab2, like this?

   let num        val
1:   a   3 -0.8356286
2:   b   3 -0.8204684

2 solutions

#1

In this case, it's equivalent to an anti join:

tab1[!tab2, on=c("let", "num")]
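
To make the anti join's semantics concrete, here is a minimal sketch that rebuilds the question's tables and reproduces `tab1[!tab2, on = ...]` by hand: it asks data.table which rows of `tab1` have a partner in `tab2` (`which = TRUE`) and keeps all the others. The names `matched`, `manual` and `anti` are illustrative only.

```r
library(data.table)
set.seed(1)

# Same example tables as in the question
tab1 <- data.table(let = rep(letters[1:2], each = 3), num = rep(1:3, 2),
                   val = rnorm(6), key = c("let", "num"))
tab2 <- data.table(let = rep(letters[1:2], each = 2), num = rep(1:2, 2),
                   val = rnorm(4), key = c("let", "num"))

# Row numbers of tab1 that match some row of tab2 on (let, num);
# nomatch = 0L drops tab2 rows that have no partner in tab1
matched <- tab1[tab2, on = c("let", "num"), which = TRUE, nomatch = 0L]

# Anti join by hand: keep every tab1 row whose index was never matched
manual <- tab1[setdiff(seq_len(nrow(tab1)), matched)]

# The built-in anti join should select the same rows (a,3) and (b,3)
anti <- tab1[!tab2, on = c("let", "num")]
print(manual)
print(anti)
```

Using `setdiff()` on the row indices rather than `tab1[-matched]` avoids the R pitfall where a negative empty index (`-integer(0)`) selects no rows at all.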

A true setdiff(), however, would return only the first row for each (let, num) combination. This is scheduled for v1.9.8, FR #547.
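
The sketch below contrasts the two semantics. It assumes data.table >= 1.9.8, which provides `fsetdiff(x, y, all = FALSE)` for row-wise set differences. An extra, hypothetical `(a, 3)` row with `val = 0` is appended so the difference is visible: the anti join keeps every unmatched row, while a set difference on the key columns yields each `(let, num)` once, and joining back with `mult = "first"` returns only the first matching row.

```r
library(data.table)  # assumes data.table >= 1.9.8 for fsetdiff()
set.seed(1)

tab1 <- data.table(let = rep(letters[1:2], each = 3), num = rep(1:3, 2),
                   val = rnorm(6), key = c("let", "num"))
tab2 <- data.table(let = rep(letters[1:2], each = 2), num = rep(1:2, 2),
                   val = rnorm(4), key = c("let", "num"))

# Hypothetical second (a, 3) row, added only to illustrate duplicates
tab1_dup <- rbind(tab1, data.table(let = "a", num = 3L, val = 0))

# Anti join: all unmatched rows survive, including both (a, 3) rows
anti <- tab1_dup[!tab2, on = c("let", "num")]

# Set difference on the key columns; all = FALSE removes duplicate rows
keys_only <- fsetdiff(tab1_dup[, .(let, num)], tab2[, .(let, num)])

# Join the distinct keys back, keeping only the first row per key
via_setdiff <- tab1_dup[keys_only, on = c("let", "num"), mult = "first"]

print(anti)         # 3 rows
print(via_setdiff)  # 2 rows, one per (let, num)
```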

#2

One solution is to do a full outer merge and then keep only the rows that have no values from tab2:

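d <- as.data.frame(merge(tab1, tab2, by = c("let", "num"), all = TRUE))  # full outer join: val.x from tab1, val.y from tab2
no_match <- is.na(d$val.y)  # TRUE where a tab1 row has no partner in tab2
d[no_match, -4]             # drop the 4th column (val.y)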

 let num      val.x
3   a   3 -0.8356286
6   b   3 -0.8204684
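
Because only the rows of tab1 matter, a left join (`all.x = TRUE`) is already enough; the full outer join additionally creates rows for keys that exist only in tab2. The following is a minimal base-R sketch of that variant on plain data frames (`df1`, `df2` and `only_df1` are illustrative names). It assumes the `val` column of the second table never contains `NA`, since a genuine `NA` there would be mistaken for "no match".

```r
set.seed(1)

# Plain data.frame versions of tab1 and tab2 (no data.table needed)
df1 <- data.frame(let = rep(letters[1:2], each = 3), num = rep(1:3, 2), val = rnorm(6))
df2 <- data.frame(let = rep(letters[1:2], each = 2), num = rep(1:2, 2), val = rnorm(4))

# Left join: every df1 row is kept; val.y is NA where df2 has no matching key
m <- merge(df1, df2, by = c("let", "num"), all.x = TRUE)

# Keep the unmatched rows and drop the helper column val.y
only_df1 <- m[is.na(m$val.y), c("let", "num", "val.x")]
print(only_df1)
```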

Using data.table:

merge(tab1, tab2, by = c("let", "num"), all = TRUE)[is.na(val.y), 1:3, with = FALSE]

 let num      val.x
1:   a   3 -0.8356286
2:   b   3 -0.8204684
