Merge data frames but keep the common column

Time: 2021-09-03 22:58:14

I am trying to merge two data frames on a common ID column (named b_id in the first and e_id in the second) using the following code.

data <- merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)

This works fine, but some rows have an ID and data in the second data frame and not in the first (and vice versa). This means I get rows of NA for the first data frame's columns (or vice versa).

I am wondering how I could return a merged data set in which the second data frame's ID numbers are appended to the first data frame's ID column. Programs like SPSS or Stata do this automatically when you merge two data sets with differing completeness of data.

For example, I want to get this:

    b_id  dfv1  dfv2
    1101    5   NA
    1102    5   5
    1103    8   9
    1104    NA  3
    1105    NA  12

Not this!

    b_id  dfv1 dfv2
    1101    5   NA
    1102    5   5
    1103    8   9
    NA      NA  3
    NA      NA  12

From these two dataframes:

    b_id  dfv1              
    1101    5               
    1102    5               
    1103    8               
    NA      NA              
    NA      NA              

    e_id    dfv2              
    NA      NA              
    1102    5               
    1103    9               
    1104    3               
    1105    12   

Thanks

2 solutions

#1


1  

Since the input data is not provided, it is kind of hard to be sure about what is asked for. Based on what I understand from the question, the input could look like this:

df1 <- data.frame(b_id = c(1101, 1102, 1103), dfv1 = c(5,5,8))
df2 <- data.frame(e_id = c(1102, 1103,1104,1105), dfv2 = c(5,9,3,12))

> df1
  b_id dfv1
1 1101    5
2 1102    5
3 1103    8

> df2
  e_id dfv2
1 1102    5
2 1103    9
3 1104    3
4 1105   12

Now, if you run

merge(df1, df2, by.x = "b_id", by.y = "e_id", all = TRUE)

  b_id dfv1 dfv2
1 1101    5   NA
2 1102    5    5
3 1103    8    9
4 1104   NA    3
5 1105   NA   12

Does this answer the question? If not, please edit your question to include the input data.

Update

With the input data provided, it is now possible to answer your question. This seems to produce what you are looking for with the input data you provided:

merge(df1[complete.cases(df1), ], df2[complete.cases(df2), ], by.x = "b_id", by.y = "e_id", all = TRUE)

  b_id dfv1 dfv2
1 1101    5   NA
2 1102    5    5
3 1103    8    9
4 1104   NA    3
5 1105   NA   12

So basically you exclude all rows that are not complete in each data.frame and then merge the two (which creates some new NAs, as in your desired output).
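As a variation on the above, here is a minimal sketch that filters only on the ID column rather than on complete cases. It assumes the input frames look like the ones in the question (the data below is retyped from the tables there), and the names df1_ids, df2_ids and merged are made up for illustration. The motivation is that complete.cases() would also drop a row that has a valid ID but a missing value, which may or may not be what you want.

```r
# Input frames retyped from the question, including the all-NA rows
df1 <- data.frame(b_id = c(1101, 1102, 1103, NA, NA),
                  dfv1 = c(5, 5, 8, NA, NA))
df2 <- data.frame(e_id = c(NA, 1102, 1103, 1104, 1105),
                  dfv2 = c(NA, 5, 9, 3, 12))

# Keep only rows whose ID is present (hypothetical helper names)
df1_ids <- df1[!is.na(df1$b_id), ]
df2_ids <- df2[!is.na(df2$e_id), ]

# Full outer join on the differently named ID columns;
# the key column in the result takes the name given in by.x
merged <- merge(df1_ids, df2_ids, by.x = "b_id", by.y = "e_id", all = TRUE)
print(merged)
```

With the example data both filters give the same five rows; they only differ when a row has an ID but no value.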

#2


0  

Try using

data = merge(df1, df2, all.x = TRUE, by = c("b_id", "e_id"))

I did this a few days ago and it worked for me!
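A quick sketch for checking this suggestion against the example frames from answer #1 (an assumption, since this answer shows no data). In base R, by = c("b_id", "e_id") needs both columns to exist in both frames, so on these frames the call is expected to raise an error, which is caught below. Using by.x/by.y with all.x = TRUE instead gives a left join, which keeps only the IDs found in df1. The names by_both and left are made up for illustration.

```r
df1 <- data.frame(b_id = c(1101, 1102, 1103), dfv1 = c(5, 5, 8))
df2 <- data.frame(e_id = c(1102, 1103, 1104, 1105), dfv2 = c(5, 9, 3, 12))

# Each frame has only one of the two key columns, so merge() stops with an
# error; tryCatch() stores the error message instead of aborting the script
by_both <- tryCatch(
  merge(df1, df2, all.x = TRUE, by = c("b_id", "e_id")),
  error = function(e) conditionMessage(e)
)

# Left join on the differently named ID columns:
# IDs that appear only in df2 (1104, 1105) are dropped
left <- merge(df1, df2, by.x = "b_id", by.y = "e_id", all.x = TRUE)
print(left)
```

If the rows that exist only in df2 should be kept as well, all = TRUE is needed rather than all.x = TRUE.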

