I am new to pandas and can't seem to get this to work with merge function:
我是pandas的新手,似乎无法使用merge函数:
>>> left
   a  b   c
0  1  4   9
1  2  5  10
2  3  6  11
3  4  7  12

>>> right
   a  c   d
0  1  7  13
1  2  8  14
2  3  9  15
With a left join on column a, I would like to update the common columns BY THE JOINED KEYS, i.e. take the value from the right table wherever a matches. Note that the last value in column c comes from the LEFT table, since there is no match for it.
>>> final
   a  b   c    d
0  1  4   7   13
1  2  5   8   14
2  3  6   9   15
3  4  7  12  NaN
How should I do this with Pandas merge function? Thank you.
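To run the answers below, the two input frames can be rebuilt from the tables above. This is a minimal sketch. `expected` is an illustrative name for the desired result and is not part of the question's code.

```python
import numpy as np
import pandas as pd

# Input frames, copied from the question
left = pd.DataFrame({'a': [1, 2, 3, 4],
                     'b': [4, 5, 6, 7],
                     'c': [9, 10, 11, 12]})
right = pd.DataFrame({'a': [1, 2, 3],
                      'c': [7, 8, 9],
                      'd': [13, 14, 15]})

# Desired result (illustrative): c comes from right where 'a' matches,
# otherwise from left; d is NaN for the unmatched key a=4
expected = pd.DataFrame({'a': [1, 2, 3, 4],
                         'b': [4, 5, 6, 7],
                         'c': [7, 8, 9, 12],
                         'd': [13, 14, 15, np.nan]})
print(expected)
```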
3 answers
#1
One way to do this is to set column a as the index and use update:
In [11]: left_a = left.set_index('a')
In [12]: right_a = right.set_index('a')
Note: update only does a left join (it does not merge in new columns), so as well as calling set_index you also need to add the columns of right_a that are not present in left_a:
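A quick check of that note, as a sketch that builds left_a and right_a directly with a as the index (equivalent to the set_index calls above). Calling update on the original frame changes c but silently ignores d, while the reindexed frame picks d up.

```python
import pandas as pd

# Same data as the question, already indexed by the join key 'a'
left_a = pd.DataFrame({'b': [4, 5, 6, 7], 'c': [9, 10, 11, 12]},
                      index=pd.Index([1, 2, 3, 4], name='a'))
right_a = pd.DataFrame({'c': [7, 8, 9], 'd': [13, 14, 15]},
                       index=pd.Index([1, 2, 3], name='a'))

# update() works in place and only touches columns left_a already has
plain = left_a.copy()
plain.update(right_a)
print(plain.columns.tolist())   # ['b', 'c']: d was not added
print(plain['c'].tolist())      # c was updated for a = 1, 2, 3

# Adding right-only columns first (as NaN) lets update() fill them too
res = left_a.reindex(columns=left_a.columns.union(right_a.columns))
res.update(right_a)
print(res.columns.tolist())     # ['b', 'c', 'd']
```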
In [13]: res = left_a.reindex(columns=left_a.columns.union(right_a.columns))
In [14]: res.update(right_a)
In [15]: res.reset_index(inplace=True)
In [16]: res
Out[16]:
   a  b   c    d
0  1  4   7   13
1  2  5   8   14
2  3  6   9   15
3  4  7  12  NaN
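Steps 11 to 16 can be wrapped into a reusable function. `update_left_join` is a hypothetical name. The sketch assumes the key values in right are unique (update aligns right on its index) and that NaN values in right should never overwrite left, which is how update behaves.

```python
import pandas as pd

def update_left_join(left, right, key):
    """Hypothetical helper wrapping answer #1: left-join `right` onto `left`
    by `key`, letting right's non-NaN values overwrite shared columns."""
    left_k = left.set_index(key)
    right_k = right.set_index(key)
    # update() never adds columns, so add right-only columns (as NaN) first
    res = left_k.reindex(columns=left_k.columns.union(right_k.columns))
    res.update(right_k)   # in place; aligns rows on the index (the join key)
    return res.reset_index()

left = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [9, 10, 11, 12]})
right = pd.DataFrame({'a': [1, 2, 3], 'c': [7, 8, 9], 'd': [13, 14, 15]})

final = update_left_join(left, right, 'a')
print(final)
```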
#2
You can use merge() on left and right with how='left' on column 'a'.
In [74]: final = left.merge(right, on='a', how='left')
In [75]: final
Out[75]:
   a  b  c_x  c_y    d
0  1  4    9    7   13
1  2  5   10    8   14
2  3  6   11    9   15
3  4  7   12  NaN  NaN
Replace the NaN values in c_y with the corresponding c_x values:
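Why this works: fillna with a Series argument fills each NaN from the value at the same index label, not the same position. The sketch below uses stand-alone copies of the two columns; combine_first is shown as an equivalent alternative for this case.

```python
import numpy as np
import pandas as pd

# c_y / c_x as they come out of the left merge (default index 0..3)
c_y = pd.Series([7, 8, 9, np.nan])
c_x = pd.Series([9, 10, 11, 12])

# Each NaN in c_y is replaced by c_x's value at the same index label
filled = c_y.fillna(c_x)
print(filled.tolist())

# Labels decide the match: reversing c_x's row order gives the same result
by_label = c_y.fillna(c_x[::-1])
print(by_label.tolist())

# combine_first produces the same values here
combined = c_y.combine_first(c_x)
print(combined.tolist())
```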
In [76]: final['c'] = final['c_y'].fillna(final['c_x'])
In [77]: final
Out[77]:
   a  b  c_x  c_y    d   c
0  1  4    9    7   13   7
1  2  5   10    8   14   8
2  3  6   11    9   15   9
3  4  7   12  NaN  NaN  12
Drop the unwanted columns and you have the result:
In [79]: final.drop(['c_x', 'c_y'], axis=1)
Out[79]:
   a  b    d   c
0  1  4   13   7
1  2  5   14   8
2  3  6   15   9
3  4  7  NaN  12
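The merge route as a single script, a sketch using the same data. The last step reorders the columns to a, b, c, d, since dropping c_x and c_y leaves c at the end; drop(columns=...) is equivalent to drop(..., axis=1).

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [9, 10, 11, 12]})
right = pd.DataFrame({'a': [1, 2, 3], 'c': [7, 8, 9], 'd': [13, 14, 15]})

# Left join on 'a'; the clashing column c becomes c_x (left) and c_y (right)
final = left.merge(right, on='a', how='left')

# Take right's c where the key matched, otherwise fall back to left's c
final['c'] = final['c_y'].fillna(final['c_x'])

# Drop the helper columns and restore the column order asked for
final = final.drop(columns=['c_x', 'c_y'])[['a', 'b', 'c', 'd']]
print(final)
```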
#3
Here's a way to do it with join:
In [632]: t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')
In [633]: t
Out[633]:
   b   c  c_right    d
a
1  4   9        7   13
2  5  10        8   14
3  6  11        9   15
4  7  12      NaN  NaN
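The rsuffix argument is needed because both frames contain a c column. This sketch shows that join without any suffix raises a ValueError about the overlapping columns, and that rsuffix renames only the right-hand copy.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [9, 10, 11, 12]})
right = pd.DataFrame({'a': [1, 2, 3], 'c': [7, 8, 9], 'd': [13, 14, 15]})
left_a = left.set_index('a')
right_a = right.set_index('a')

# Both frames have a 'c' column, so join() refuses to guess new names
try:
    left_a.join(right_a)
    raised = False
except ValueError as err:
    raised = True
    print('join without a suffix failed:', err)

# With rsuffix, only the right-hand copy of the clashing column is renamed
t = left_a.join(right_a, rsuffix='_right')
print(t.columns.tolist())
```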
Now we want to fill the null values of c_right (which comes from the right dataframe) with the values from column c of the left dataframe. The step below was updated with a method taken from @John Galt's answer:
In [657]: t['c_right'] = t['c_right'].fillna(t['c'])
In [658]: t
Out[658]:
   b   c  c_right    d
a
1  4   9        7   13
2  5  10        8   14
3  6  11        9   15
4  7  12       12  NaN
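Finally, drop the original c column and rename c_right to c, so that the filled values are the ones that remain:
In [659]: t.drop('c', axis=1).rename(columns={'c_right': 'c'})
Out[659]:
   b   c    d
a
1  4   7   13
2  5   8   14
3  6   9   15
4  7  12  NaN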
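The join route end to end, as a sketch with the same data. It keeps the filled values by dropping the original c and renaming c_right to c, then calls reset_index so that a becomes an ordinary column again, matching the desired final table.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [9, 10, 11, 12]})
right = pd.DataFrame({'a': [1, 2, 3], 'c': [7, 8, 9], 'd': [13, 14, 15]})

# Join on the key as index; right's c becomes c_right
t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')

# Fill unmatched rows of c_right from left's c
t['c_right'] = t['c_right'].fillna(t['c'])

# Keep the filled column under the name c and turn 'a' back into a column
final = (t.drop(columns='c')
          .rename(columns={'c_right': 'c'})
          .reset_index())
print(final)
```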