I am new to DataFrames and would like to know how to perform the SQL equivalent of a left outer join on multiple columns across a series of tables.
Example:
df1:
Year  Week  Colour  Val1
2014  A     Red     50
2014  B     Red     60
2014  B     Black   70
2014  C     Red     10
2014  D     Green   20
df2:
Year  Week  Colour  Val2
2014  A     Black   30
2014  B     Black   100
2014  C     Green   50
2014  C     Red     20
2014  D     Red     40
df3:
Year  Week  Colour  Val3
2013  B     Red     60
2013  C     Black   80
2013  B     Black   10
2013  D     Green   20
2013  D     Red     50
Essentially, I want to do something like this SQL query (notice that df3 is not joined on Year):
SELECT df1.*, df2.Val2, df3.Val3
FROM df1
  LEFT OUTER JOIN df2
    ON df1.Year = df2.Year
    AND df1.Week = df2.Week
    AND df1.Colour = df2.Colour
  LEFT OUTER JOIN df3
    ON df1.Week = df3.Week
    AND df1.Colour = df3.Colour
The result should look like:
Year  Week  Colour  Val1  Val2  Val3
2014  A     Red     50    Null  Null
2014  B     Red     60    Null  60
2014  B     Black   70    100   Null
2014  C     Red     10    20    Null
2014  D     Green   20    Null  Null
I have tried using merge and join, but I can't figure out how to do it across multiple tables when several joins are involved. Could someone help me with this, please?
Thanks
2 answers
#1
63
Merge them in two steps: df1 and df2 first, and then the result of that with df3.
In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])
I dropped Year from df3 since you don't need it for the last join.
In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],
                       how='left', on=['Week', 'Colour'])
In [40]: df
Out[40]:
Year Week Colour Val1 Val2 Val3
0 2014 A Red 50 NaN NaN
1 2014 B Red 60 NaN 60
2 2014 B Black 70 100 10
3 2014 C Red 10 20 NaN
4 2014 D Green 20 NaN 20
[5 rows x 6 columns]
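For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the three frames are built by hand from the tables in the question; the `with_year` variable is only there for illustration and shows what I believe happens if Year is left in df3 for the second merge (pandas keeps both Year columns and adds its default `_x` / `_y` suffixes).

```python
import pandas as pd

# Frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({
    "Year": [2014] * 5,
    "Week": ["A", "B", "B", "C", "D"],
    "Colour": ["Red", "Red", "Black", "Red", "Green"],
    "Val1": [50, 60, 70, 10, 20],
})
df2 = pd.DataFrame({
    "Year": [2014] * 5,
    "Week": ["A", "B", "C", "C", "D"],
    "Colour": ["Black", "Black", "Green", "Red", "Red"],
    "Val2": [30, 100, 50, 20, 40],
})
df3 = pd.DataFrame({
    "Year": [2013] * 5,
    "Week": ["B", "C", "B", "D", "D"],
    "Colour": ["Red", "Black", "Black", "Green", "Red"],
    "Val3": [60, 80, 10, 20, 50],
})

# Step 1: left join df1 to df2 on all three key columns.
s1 = pd.merge(df1, df2, how="left", on=["Year", "Week", "Colour"])

# Step 2: left join the intermediate result to df3 on Week and Colour only.
# Year is dropped from df3 first so it does not clash with s1's Year column.
df = pd.merge(s1, df3[["Week", "Colour", "Val3"]],
              how="left", on=["Week", "Colour"])
print(df)

# Illustration only: keeping Year in df3 yields two suffixed Year columns.
with_year = pd.merge(s1, df3, how="left", on=["Week", "Colour"])
print(list(with_year.columns))
```

Because Val2 and Val3 contain missing values after the join, pandas stores them as floats, so recent versions will likely print 100.0 rather than 100.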
#2
6
One can also do this with a compact version of @TomAugspurger's answer, like so:
df = df1.merge(df2, how='left', on=['Year', 'Week', 'Colour']).merge(df3[['Week', 'Colour', 'Val3']], how='left', on=['Week', 'Colour'])
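Since the question mentions a series of tables, the chained form can be generalized a little. The sketch below is one possible approach, not a pandas feature: `left_join_all` is a hypothetical helper name, and it simply folds a list of (frame, key columns) pairs onto the base frame with `functools.reduce`. The frames are rebuilt by hand from the question so the block runs on its own.

```python
from functools import reduce

import pandas as pd

keys = ["Year", "Week", "Colour"]
df1 = pd.DataFrame(
    [(2014, "A", "Red", 50), (2014, "B", "Red", 60), (2014, "B", "Black", 70),
     (2014, "C", "Red", 10), (2014, "D", "Green", 20)],
    columns=keys + ["Val1"],
)
df2 = pd.DataFrame(
    [(2014, "A", "Black", 30), (2014, "B", "Black", 100), (2014, "C", "Green", 50),
     (2014, "C", "Red", 20), (2014, "D", "Red", 40)],
    columns=keys + ["Val2"],
)
df3 = pd.DataFrame(
    [(2013, "B", "Red", 60), (2013, "C", "Black", 80), (2013, "B", "Black", 10),
     (2013, "D", "Green", 20), (2013, "D", "Red", 50)],
    columns=keys + ["Val3"],
)


def left_join_all(base, joins):
    """Left-join each (frame, key_columns) pair onto base, in order.

    Hypothetical helper for illustration; not part of pandas.
    """
    return reduce(
        lambda acc, spec: acc.merge(spec[0], how="left", on=spec[1]),
        joins,
        base,
    )


# Each entry: the frame to join (trimmed to the columns needed) and its keys.
joins = [
    (df2, ["Year", "Week", "Colour"]),
    (df3[["Week", "Colour", "Val3"]], ["Week", "Colour"]),
]
result = left_join_all(df1, joins)
print(result)
```

Each right-hand frame is trimmed to its key columns plus the value column before it goes into the list, for the same reason Year is dropped from df3 in the first answer.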