I am trying to merge two data frames on a shared ID column (called b_id in the first and e_id in the second) using the following code:
data <- merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)
This works, but some cases have an ID and data in the second data frame and not in the first (and vice versa). As a result, the merged data contains rows where the ID and the first data frame's values are NA (or vice versa).
How can I get a merged data set in which the second data frame's ID numbers fill in the first data frame's ID column wherever it would otherwise be missing? Programs like SPSS or Stata do this automatically when you merge two data sets that differ in completeness.
For example, I want to get this:
b_id dfv1 dfv2
1101 5 NA
1102 5 5
1103 8 9
1104 NA 3
1105 NA 12
Not this!
b_id dfv1 dfv2
1101 5 NA
1102 5 5
1103 8 9
NA NA 3
NA NA 12
From these two data frames:
b_id dfv1
1101 5
1102 5
1103 8
NA NA
NA NA
e_id dfv2
NA NA
1102 5
1103 9
1104 3
1105 12
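To make the problem reproducible, the two tables above can be recreated in R as below, coding the empty cases as NA (values copied from the tables). One likely source of the unwanted NA rows: by default merge() matches NA keys to each other (see the incomparables argument in ?merge), so each NA ID in df1 gets paired with the NA ID in df2. This is a sketch based on the tables as shown; the real data may differ.

```r
# Input tables from the question; missing cases are coded as NA
df1 <- data.frame(b_id = c(1101, 1102, 1103, NA, NA),
                  dfv1 = c(5, 5, 8, NA, NA))
df2 <- data.frame(e_id = c(NA, 1102, 1103, 1104, 1105),
                  dfv2 = c(NA, 5, 9, 3, 12))

res <- merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)
print(res)

# IDs 1104 and 1105 come through from df2, but both NA IDs in df1 are
# matched to the NA ID in df2, which adds two all-NA rows
print(nrow(res))   # 7
```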
Thanks
2 answers
#1
Since the input data is not provided, it is kind of hard to be sure about what is asked for. Based on what I understand from the question, the input could look like this:
df1 <- data.frame(b_id = c(1101, 1102, 1103), dfv1 = c(5, 5, 8))
df2 <- data.frame(e_id = c(1102, 1103, 1104, 1105), dfv2 = c(5, 9, 3, 12))
> df1
b_id dfv1
1 1101 5
2 1102 5
3 1103 8
> df2
e_id dfv2
1 1102 5
2 1103 9
3 1104 3
4 1105 12
Now, if you run
merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)
b_id dfv1 dfv2
1 1101 5 NA
2 1102 5 5
3 1103 8 9
4 1104 NA 3
5 1105 NA 12
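The detail that makes this work: when by.x and by.y differ, merge() keeps a single key column named after by.x (here b_id) and fills it from e_id for rows that exist only in df2, which is the ID-filling behaviour asked about. A minimal check with the same example frames:

```r
# Example frames from this answer (no missing IDs)
df1 <- data.frame(b_id = c(1101, 1102, 1103), dfv1 = c(5, 5, 8))
df2 <- data.frame(e_id = c(1102, 1103, 1104, 1105), dfv2 = c(5, 9, 3, 12))

# Full outer join: all = TRUE sets both all.x and all.y to TRUE
res <- merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)

# Only one key column survives, and it takes its name from by.x
print(names(res))   # "b_id" "dfv1" "dfv2"

# IDs found only in df2 (1104, 1105) are carried into b_id
print(res$b_id)
```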
Does this answer the question? If not, please edit your question to include the input data.
Update
With the input data provided, it is now possible to answer your question. This seems to produce what you are looking for with the input data you provided:
merge(df1[complete.cases(df1), ], df2[complete.cases(df2), ],
      by.x = "b_id", by.y = "e_id", all = TRUE)
b_id dfv1 dfv2
1 1101 5 NA
2 1102 5 5
3 1103 8 9
4 1104 NA 3
5 1105 NA 12
So basically you exclude all incomplete rows from each data.frame and then merge the two (which creates some new NAs, as in your desired output).
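A possible variant, not part of the original answer: complete.cases() drops a row if any column is NA, so a row with a valid ID but a missing measurement would also be lost. If only rows with a missing ID should be dropped, a sketch that filters on the key columns instead (df1_keyed and df2_keyed are illustrative names):

```r
# Input tables from the question; missing cases are coded as NA
df1 <- data.frame(b_id = c(1101, 1102, 1103, NA, NA),
                  dfv1 = c(5, 5, 8, NA, NA))
df2 <- data.frame(e_id = c(NA, 1102, 1103, 1104, 1105),
                  dfv2 = c(NA, 5, 9, 3, 12))

# Drop only rows whose ID is missing; NA values in other columns are kept
df1_keyed <- df1[!is.na(df1$b_id), ]
df2_keyed <- df2[!is.na(df2$e_id), ]

res <- merge(df1_keyed, df2_keyed, by.x = "b_id", by.y = "e_id", all = TRUE)
print(res)
```

With this input both approaches give the same five rows; they differ only when a row has an ID but a missing value.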
#2
Try using data = merge(df1, df2, all.x = TRUE, by = c("b_id", "e_id")).
I did this a few days ago and it worked for me!