Merging data frames in R by date and a second variable

Time: 2022-02-08 22:59:01

I'd like to merge two data frames in R by both a date and a second ID variable. For example, here are two data frames (df1 and df2) whose ID/Date pairs overlap partially but not completely:

df1 <- data.frame(ID=c(rep(1,5),rep(2,3),rep(3,7)),
              Date=c(seq( as.Date("2011-07-01"), by=1, len=5),
                     seq( as.Date("2011-07-01"), by=1, len=3),
                     seq( as.Date("2011-07-01"), by=1, len=7)))
df2 <- data.frame(ID=c(rep(1,3),rep(2,2),rep(3,7)),
              Date=c(seq( as.Date("2011-07-01"), by=1, len=3),
                     seq( as.Date("2011-07-01"), by=1, len=2),
                     seq( as.Date("2011-07-01"), by=1, len=7)),
              var=c(rep(12,3),rep(5,2),rep(9,7)))

This yields:

> df1
   ID       Date
1   1 2011-07-01
2   1 2011-07-02
3   1 2011-07-03
4   1 2011-07-04
5   1 2011-07-05

> df2
   ID       Date var
1   1 2011-07-01  12
2   1 2011-07-02  12
3   1 2011-07-03  12
4   2 2011-07-01   5, etc.

I want the equivalent of join_all(type="full") or merge(all=TRUE), so that NAs appear wherever a row has no match in the other data frame. Rows should match only when both ID and Date are equal.

> df3
  ID       Date var
1  1 2011-07-01  12
2  1 2011-07-02  12
3  1 2011-07-03  12
4  1 2011-07-04  NA
5  1 2011-07-05  NA, etc.

Thanks!

2 Answers

#1


1  

It seems you want something like this:

> merge(df1, df2, by=c("ID", "Date"), all = TRUE)
   ID       Date var
1   1 2011-07-01  12
2   1 2011-07-02  12
3   1 2011-07-03  12
4   1 2011-07-04  NA
5   1 2011-07-05  NA
6   2 2011-07-01   5
7   2 2011-07-02   5
8   2 2011-07-03  NA
9   3 2011-07-01   9
10  3 2011-07-02   9
11  3 2011-07-03   9
12  3 2011-07-04   9
13  3 2011-07-05   9
14  3 2011-07-06   9
15  3 2011-07-07   9
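
For reference, here is a minimal self-contained sketch of the call above, rebuilding df1 and df2 as in the question. The result names `full` and `left` are made up for illustration. `all = TRUE` keeps unmatched rows from both sides (a full outer join), while `all.x = TRUE` keeps only the unmatched rows of df1. With this particular data the two give the same rows, because every ID/Date pair in df2 also occurs in df1.

```r
# Rebuild the example data from the question
df1 <- data.frame(ID = c(rep(1, 5), rep(2, 3), rep(3, 7)),
                  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 5),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 3),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 7)))
df2 <- data.frame(ID = c(rep(1, 3), rep(2, 2), rep(3, 7)),
                  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 3),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 2),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 7)),
                  var = c(rep(12, 3), rep(5, 2), rep(9, 7)))

# Full outer join on both keys: unmatched rows from either side get NA
full <- merge(df1, df2, by = c("ID", "Date"), all = TRUE)

# Left join: keep every row of df1, with NA in var where df2 has no match
left <- merge(df1, df2, by = c("ID", "Date"), all.x = TRUE)

nrow(full)            # total rows after the full join
sum(is.na(full$var))  # rows of df1 without a partner in df2
```

If df2 might contain ID/Date pairs that df1 lacks, `all = TRUE` is the safer choice, since `all.x = TRUE` would silently drop those rows.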

#2


0  

merge is a useful base function, but I also suggest taking a look at the dplyr package. You will probably run into it soon.

library(dplyr)
left_join(df1,df2)
# Joining by: c("ID", "Date")
#    ID  Date var
# 1   1 15156  12
# 2   1 15157  12
# 3   1 15158  12
# 4   1 15159  NA
# 5   1 15160  NA
# 6   2 15156   5
# 7   2 15157   5
# 8   2 15158  NA
# 9   3 15156   9
# 10  3 15157   9
# 11  3 15158   9
# 12  3 15159   9
# 13  3 15160   9
# 14  3 15161   9
# 15  3 15162   9
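
One caveat about the snippet above: left_join keeps only the keys present in df1, so it matches the requested full join here only because df2 has no ID/Date pair that is missing from df1. dplyr's counterpart of merge(all = TRUE) is full_join. Below is a minimal sketch, assuming dplyr is installed and rebuilding df1 and df2 as in the question. The result name `full` is illustrative, and `by` is spelled out rather than relying on automatic key detection. The numeric Date values printed above probably reflect the dplyr version in use at the time; recent versions should keep Date as a Date column, though that is an assumption worth checking on your own installation.

```r
library(dplyr)

# Rebuild the example data from the question
df1 <- data.frame(ID = c(rep(1, 5), rep(2, 3), rep(3, 7)),
                  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 5),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 3),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 7)))
df2 <- data.frame(ID = c(rep(1, 3), rep(2, 2), rep(3, 7)),
                  Date = c(seq(as.Date("2011-07-01"), by = 1, length.out = 3),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 2),
                           seq(as.Date("2011-07-01"), by = 1, length.out = 7)),
                  var = c(rep(12, 3), rep(5, 2), rep(9, 7)))

# Full join on both keys: rows without a match in the other table get NA
full <- full_join(df1, df2, by = c("ID", "Date"))

# Sort explicitly by the keys for display
full <- arrange(full, ID, Date)

print(full)
```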
