I have found myself doing a "conditional left join" several times in R. To illustrate with an example: if you have two data frames such as:
> df
  a b
1 1 0
2 2 0
> other.df
  a b
1 2 3
The goal is to end up with this data frame:
> final.df
  a b
1 1 0
2 2 3
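To try the answers below on the same data, the inputs and the target can be rebuilt in R. This is only a sketch: it assumes a and b are plain numeric (double) columns, since the printed output does not show whether they are integer or double.

```r
# Example inputs, with values copied from the printed output above
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# The result the question is aiming for
final.df <- data.frame(a = c(1, 2), b = c(0, 3))
print(final.df)
```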
The code I've written so far:
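c <- merge(df, other.df, by = "a", all.x = TRUE)  # left join; the two b columns become b.x and b.y
c$b.y[is.na(c$b.y)] <- 0                          # rows with no match in other.df get 0
d <- subset(c, select = c("a", "b.y"))            # keep the key and the joined value
colnames(d)[2] <- "b"                             # restore the original column name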
to finally arrive at the result I wanted.
Doing this in effectively four lines makes the code very opaque. Is there any better, less cumbersome way to do this?
2 answers
#1
Here are two ways. In both cases the first line does a left merge returning the required columns. In the case of merge we then have to set the names. The final line in both cases replaces NAs with 0.
merge
res1 <- merge(df, other.df, by = "a", all.x = TRUE)[-2]  # left join, then drop b.x
names(res1) <- names(df)                                  # rename a, b.y back to a, b
res1[is.na(res1)] <- 0                                    # unmatched rows get 0
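To see why the [-2] and the renaming are needed, it helps to look at the intermediate result on the example data. The sketch below rebuilds the data from the question's printed output and shows that merge() gives the two clashing b columns its default .x/.y suffixes.

```r
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# Left join on "a": every row of df is kept, and the clashing "b" columns
# get merge()'s default suffixes, giving columns a, b.x, b.y
merged <- merge(df, other.df, by = "a", all.x = TRUE)
print(names(merged))  # "a"   "b.x" "b.y"

# Drop b.x (column 2), restore the original names, then turn NA into 0
res1 <- merged[-2]
names(res1) <- names(df)
res1[is.na(res1)] <- 0
print(res1)
```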
sqldf
library(sqldf)
# other.df is quoted because an unquoted dot would be read as a table.column separator
res2 <- sqldf("select a, o.b from df left join 'other.df' o using(a)")
res2[is.na(res2)] <- 0
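For comparison, the same left join can also be written with base R's match(). Neither answer uses it; it is swapped in here only as an illustration. It keeps df's row order and column names, so no renaming is needed. One caveat: match() returns only the first matching position, so if other.df had repeated values of a, this would silently take the first one, whereas merge() returns one row per match.

```r
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# Position of each df$a within other.df$a (NA where there is no match)
idx <- match(df$a, other.df$a)

# Look up b for matched keys; unmatched keys fall back to 0
res3 <- data.frame(a = df$a, b = ifelse(is.na(idx), 0, other.df$b[idx]))
print(res3)
```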
#2
In two lines:
c <- merge(df, other.df, all = TRUE)  # full outer join on all shared columns
c <- c[!duplicated(c$a), ]            # keep the first row for each value of a
So this takes the values from both data sets and omits the rows from the second one that duplicate an id. I am not sure which one counts as left and which as right, so if you want the other one, flip the merged data upside down and then drop the duplicates in the same way:
c <- merge(df, other.df, all = TRUE)
c <- c[nrow(c):1, ]                   # reverse the row order
c <- c[!duplicated(c$a), ]            # now the last row for each a is kept
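Running the second answer step by step on the example data may make its behaviour clearer. This sketch rebuilds the data from the question's printed output. Because no by argument is given, merge() joins on every shared column (here both a and b), and with its default sort = TRUE the outer join is ordered by a and then b. So which duplicate survives depends on that sort order of b, not on which data frame the row came from; with this data the first pass keeps (2, 0) and the flipped pass keeps (2, 3).

```r
df <- data.frame(a = c(1, 2), b = c(0, 0))
other.df <- data.frame(a = 2, b = 3)

# Joining on both a and b finds no matching rows, so the full outer join
# keeps (1, 0), (2, 0) and (2, 3), sorted by a and then b
m <- merge(df, other.df, all = TRUE)

# Keeping the first row per value of a keeps (2, 0)
first <- m[!duplicated(m$a), ]

# Reversing the rows first keeps (2, 3) instead
flipped <- m[rev(seq_len(nrow(m))), ]
last <- flipped[!duplicated(flipped$a), ]
last <- last[order(last$a), ]  # undo the reversed order of a
print(first)
print(last)
```

The flipped result comes out in descending order of a, which is why the sketch ends with an order() call.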