I have two DataFrames, df1 and df2, and would like to merge them into df3 by matching column b of df1 against the index of df2, with y split into two columns according to x, as shown below.
How do I accomplish this?
df1:
+---+----+
| | b |
+---+----+
| 1 | 3 |
| 2 | 4 |
| 3 | 7 |
| 4 | 8 |
| 5 | 10 |
+---+----+
df2:
+---+-------+-----+
| | x | y |
+---+-------+-----+
| 3 | True | 5.4 |
| 3 | False | 6.9 |
| 4 | True | 9.8 |
| 7 | True | 7.8 |
| 8 | False | 5.6 |
+---+-------+-----+
df3:
+---+---+--------+-----+
| | b | y_notx | y_x |
+---+---+--------+-----+
| 1 | 3 | 6.9 | 5.4 |
| 2 | 4 | NaN | 9.8 |
| 3 | 7 | NaN | 7.8 |
| 4 | 8 | 5.6 | NaN |
+---+---+--------+-----+
The code:
import numpy as np
import pandas as pd

t1 = {'b': [3, 4, 7, 8, 10]}
df1 = pd.DataFrame(t1, index=[1, 2, 3, 4, 5])
t2 = {'x': [True, False, True, True, False],
      'y': [5.4, 6.9, 9.8, 7.8, 5.6]}
df2 = pd.DataFrame(t2, index=[3, 3, 4, 7, 8])
# Desired result
t3 = {'b': [3, 4, 7, 8],
      'y_x': [5.4, 9.8, 7.8, np.nan],
      'y_notx': [6.9, np.nan, np.nan, 5.6]}
df3 = pd.DataFrame(t3, index=[1, 2, 3, 4])
1 solution
I think you need:
df4 = (df2.reset_index()
          .pivot(index='index', columns='x', values='y')
          .rename_axis('b')
          .reset_index()
          .merge(df1, on='b')
          .rename(columns={True: 'y_x', False: 'y_notx'}))
print(df4)
   b  y_notx  y_x
0  3     6.9  5.4
1  4     NaN  9.8
2  7     NaN  7.8
3  8     5.6  NaN
Explanation:
- First pivot the second DataFrame
- merge by inner join (the default)
- rename the boolean columns
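The three steps in the explanation can also be written one at a time, which makes the intermediate frame easier to inspect. This is only a sketch of the same idea: the names wide and result are illustrative and not part of the original answer, and the columns are renamed before the merge rather than after it.

```python
import pandas as pd

df1 = pd.DataFrame({'b': [3, 4, 7, 8, 10]}, index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'x': [True, False, True, True, False],
                    'y': [5.4, 6.9, 9.8, 7.8, 5.6]},
                   index=[3, 3, 4, 7, 8])

# Step 1: move df2's index into a column, then pivot so that each value of
# x (False / True) becomes its own column holding the matching y.
wide = df2.reset_index().pivot(index='index', columns='x', values='y')

# Step 2: rename the boolean column labels and expose the old index as 'b'.
wide = wide.rename(columns={True: 'y_x', False: 'y_notx'})
wide.columns.name = None  # drop the leftover 'x' label on the columns axis
wide = wide.rename_axis('b').reset_index()

# Step 3: inner join (the default) keeps only b values found in both frames,
# so the df1 row with b == 10 is dropped.
result = df1.merge(wide, on='b')
print(result)
```

If the rows of df1 without a match should be kept (here b == 10), passing how='left' to merge would retain them with NaN in both y columns.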
EDIT:
Solution for multiple columns:
import numpy as np
import pandas as pd

t1 = {'b': [3, 4, 7, 8, 10], 'c': range(5)}
df1 = pd.DataFrame(t1, index=[1, 2, 3, 4, 5])
t2 = {'x': [True, False, True, True, False],
      'y': [5.4, 6.9, 9.8, 7.8, 5.6],
      'v': np.arange(5) + 4.8,
      'w': np.arange(5) - 2.75,
      'Z': np.arange(5) * 0.75}
df2 = pd.DataFrame(t2, index=[3, 3, 4, 7, 8])
t3 = {'b': [3, 4, 7, 8],
      'y_x': [5.4, 9.8, 7.8, np.nan],
      'y_notx': [6.9, np.nan, np.nan, 5.6]}
df3 = pd.DataFrame(t3, index=[1, 2, 3, 4])
df4 = (df2.set_index('x', append=True)
          .unstack()
          .rename(columns={True: '_x', False: '_notx'})
          .rename_axis('b'))
# Flatten the two-level column labels into single strings
df4.columns = df4.columns.map('_'.join)
df4 = df4.reset_index().merge(df1, on='b')
print(df4)
   b  Z__notx  Z__x  v__notx  v__x  w__notx  w__x  y__notx  y__x  c
0  3     0.75  0.00      5.8   4.8    -1.75 -2.75      6.9   5.4  0
1  4      NaN  1.50      NaN   6.8      NaN -0.75      NaN   9.8  1
2  7      NaN  2.25      NaN   7.8      NaN  0.25      NaN   7.8  2
3  8     3.00   NaN      8.8   NaN     1.25   NaN      5.6   NaN  3
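The joined names above contain a double underscore because the renamed labels already start with '_'. If single-underscore names such as y_x and y_notx from the question are preferred, one possible variant builds the flat names directly from the (name, flag) tuples. This is a sketch under that assumption, using a shortened df2 with only y and v; the names wide and result are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'b': [3, 4, 7, 8, 10], 'c': range(5)},
                   index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'x': [True, False, True, True, False],
                    'y': [5.4, 6.9, 9.8, 7.8, 5.6],
                    'v': np.arange(5) + 4.8},
                   index=[3, 3, 4, 7, 8])

# Add x as a second index level, then move it into the columns.
wide = df2.set_index('x', append=True).unstack()

# Each column label is now a (name, flag) tuple; build 'name_x' or
# 'name_notx' from it so that only one underscore is inserted.
wide.columns = [f"{name}_{'x' if flag else 'notx'}"
                for name, flag in wide.columns]

# Expose the index as column 'b' and inner-join with df1 as before.
result = wide.rename_axis('b').reset_index().merge(df1, on='b')
print(result)
```

The order of the resulting columns follows the column order of df2 and may differ between pandas versions, so selecting columns by name is safer than relying on position.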