pandas离开了join并更新了现有的列

时间:2021-07-14 22:58:08

I am new to pandas and can't seem to get this to work with merge function:

我是pandas的新手,似乎无法使用merge函数:

>>> left       >>> right
   a  b   c       a  c   d 
0  1  4   9    0  1  7  13
1  2  5  10    1  2  8  14
2  3  6  11    2  3  9  15
3  4  7  12    

With a left join on column a, I would like to update common columns BY THE JOINED KEYS. Note last value in column c is from LEFT table since there is no match.

在a列的左连接中,我想通过JOINED KEYS更新常用列。注意列c中的最后一个值来自LEFT表,因为没有匹配项。

>>> final       
   a  b   c   d 
0  1  4   7   13
1  2  5   8   14
2  3  6   9   15
3  4  7   12  NAN 

How should I do this with Pandas merge function? Thank you.

我应该如何使用Pandas合并功能?谢谢。

3 个解决方案

#1


One way to do this is to set the a column as the index and update:

一种方法是将a列设置为索引并更新:

In [11]: left_a = left.set_index('a')

In [12]: right_a = right.set_index('a')

Note: update only does a left join (not merges), so as well as set_index you also need to include the additional columns not present in left_a.

注意:更新只执行左连接(不合并),因此set_index还需要包含left_a中不存在的其他列。

In [13]: res = left_a.reindex(columns=left_a.columns.union(right_a.columns))

In [14]: res.update(right_a)

In [15]: res.reset_index(inplace=True)

In [16]: res
Out[16]:
   a   b   c   d
0  1   4   7  13
1  2   5   8  14
2  3   6   9  15
3  4   7  12 NaN

#2


You can use merge() between left and right with how='left' on 'a' column.

您可以在左侧和右侧之间使用merge(),在'a'列上使用how ='left'。

In [74]: final = left.merge(right, on='a', how='left')

In [75]: final
Out[75]:
   a  b  c_x  c_y   d
0  1  4    9    7  13
1  2  5   10    8  14
2  3  6   11    9  15
3  4  7   12  NaN NaN

Replace NaN value from c_y with c_x value

将c_y中的NaN值替换为c_x值

In [76]: final['c'] = final['c_y'].fillna(final['c_x'])

In [77]: final
Out[77]:
   a  b  c_x  c_y   d   c
0  1  4    9    7  13   7
1  2  5   10    8  14   8
2  3  6   11    9  15   9
3  4  7   12  NaN NaN  12

Drop unwanted columns, and you have the result

删除不需要的列,您就得到了结果

In [79]: final.drop(['c_x', 'c_y'], axis=1)
Out[79]:
   a  b   d   c
0  1  4  13   7
1  2  5  14   8
2  3  6  15   9
3  4  7 NaN  12

#3


Here's a way to do it with join:

这是一种使用join的方法:

In [632]: t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')

In [633]: t
Out[633]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12      NaN NaN

Now, we want to set null values of c_right (which is from the right dataframe) with values from c column from the left dataframe. Updated the below process with a method taking from @John Galt's answer

现在,我们要设置c_right的空值(来自右侧数据帧),其值来自左侧数据帧中的c列。使用@John Galt的答案更新了以下过程

In [657]: t['c_right'] = t['c_right'].fillna(t['c'])

In [658]: t
Out[658]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12       12 NaN

In [659]: t.drop('c_right', axis=1)
Out[659]: 
   b   c   d
a           
1  4   9  13
2  5  10  14
3  6  11  15
4  7  12 NaN

#1


One way to do this is to set the a column as the index and update:

一种方法是将a列设置为索引并更新:

In [11]: left_a = left.set_index('a')

In [12]: right_a = right.set_index('a')

Note: update only does a left join (not merges), so as well as set_index you also need to include the additional columns not present in left_a.

注意:更新只执行左连接(不合并),因此set_index还需要包含left_a中不存在的其他列。

In [13]: res = left_a.reindex(columns=left_a.columns.union(right_a.columns))

In [14]: res.update(right_a)

In [15]: res.reset_index(inplace=True)

In [16]: res
Out[16]:
   a   b   c   d
0  1   4   7  13
1  2   5   8  14
2  3   6   9  15
3  4   7  12 NaN

#2


You can use merge() between left and right with how='left' on 'a' column.

您可以在左侧和右侧之间使用merge(),在'a'列上使用how ='left'。

In [74]: final = left.merge(right, on='a', how='left')

In [75]: final
Out[75]:
   a  b  c_x  c_y   d
0  1  4    9    7  13
1  2  5   10    8  14
2  3  6   11    9  15
3  4  7   12  NaN NaN

Replace NaN value from c_y with c_x value

将c_y中的NaN值替换为c_x值

In [76]: final['c'] = final['c_y'].fillna(final['c_x'])

In [77]: final
Out[77]:
   a  b  c_x  c_y   d   c
0  1  4    9    7  13   7
1  2  5   10    8  14   8
2  3  6   11    9  15   9
3  4  7   12  NaN NaN  12

Drop unwanted columns, and you have the result

删除不需要的列,您就得到了结果

In [79]: final.drop(['c_x', 'c_y'], axis=1)
Out[79]:
   a  b   d   c
0  1  4  13   7
1  2  5  14   8
2  3  6  15   9
3  4  7 NaN  12

#3


Here's a way to do it with join:

这是一种使用join的方法:

In [632]: t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')

In [633]: t
Out[633]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12      NaN NaN

Now, we want to set null values of c_right (which is from the right dataframe) with values from c column from the left dataframe. Updated the below process with a method taking from @John Galt's answer

现在,我们要设置c_right的空值(来自右侧数据帧),其值来自左侧数据帧中的c列。使用@John Galt的答案更新了以下过程

In [657]: t['c_right'] = t['c_right'].fillna(t['c'])

In [658]: t
Out[658]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12       12 NaN

In [659]: t.drop('c_right', axis=1)
Out[659]: 
   b   c   d
a           
1  4   9  13
2  5  10  14
3  6  11  15
4  7  12 NaN