Merging pandas DataFrames based on values in rows

Time: 2021-02-07 22:55:38

I have two DataFrames, df1 and df2, and would like to merge them into df3 by matching the values in column b of df1 against the index of df2, as shown below.

How can I accomplish this?

df1:

+---+----+
|   | b  |
+---+----+
| 1 |  3 |
| 2 |  4 |
| 3 |  7 |
| 4 |  8 |
| 5 | 10 |
+---+----+

df2:

+---+-------+-----+
|   |   x   |  y  |
+---+-------+-----+
| 3 | True  | 5.4 |
| 3 | False | 6.9 |
| 4 | True  | 9.8 |
| 7 | True  | 7.8 |
| 8 | False | 5.6 |
+---+-------+-----+

df3:

+---+---+--------+-----+
|   | b | y_notx | y_x |
+---+---+--------+-----+
| 1 | 3 | 6.9    | 5.4 |
| 2 | 4 | NaN    | 9.8 |
| 3 | 7 | NaN    | 7.8 |
| 4 | 8 | 5.6    | NaN |
+---+---+--------+-----+

The code:

import numpy as np
import pandas as pd

t1 = {'b': [3, 4, 7, 8, 10]}

df1 = pd.DataFrame(t1, index=[1, 2, 3, 4, 5])


t2 = {'x': [True, False, True, True, False],
      'y': [5.4, 6.9, 9.8, 7.8, 5.6]}

df2 = pd.DataFrame(t2, index=[3, 3, 4, 7, 8])

# Desired result (np.nan replaces pd.np.nan, which newer pandas no longer provides)
t3 = {'b': [3, 4, 7, 8],
      'y_x': [5.4, 9.8, 7.8, np.nan],
      'y_notx': [6.9, np.nan, np.nan, 5.6]}

df3 = pd.DataFrame(t3, index=[1, 2, 3, 4])

1 Answer

#1


3  

I think you need:

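# Pivot df2 so that each index value becomes one row and the
# values of x (False/True) become the columns, then merge with df1
df4 = (df2.reset_index().pivot(index='index', columns='x', values='y')
         .rename_axis('b')
         .reset_index()
         .merge(df1, on='b')
         .rename(columns={True:'y_x', False:'y_notx'}))
print(df4)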
   b  y_notx  y_x
0  3     6.9  5.4
1  4     NaN  9.8
2  7     NaN  7.8
3  8     5.6  NaN

Explanation:

  1. First, pivot the second DataFrame
  2. Merge by inner join (the default)
  3. Rename the boolean columns
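
The three steps above can also be run one at a time to inspect the intermediate results. This is only a sketch using the data from the question; the names `pivoted`, `merged`, `result` and `left_result` are made up for illustration, and the `how='left'` variant is an optional extra in case the unmatched row of df1 (b = 10) should be kept.

```python
import pandas as pd

df1 = pd.DataFrame({'b': [3, 4, 7, 8, 10]}, index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'x': [True, False, True, True, False],
                    'y': [5.4, 6.9, 9.8, 7.8, 5.6]},
                   index=[3, 3, 4, 7, 8])

# Step 1: pivot df2. The old index becomes a column named 'b', then
# each distinct b becomes one row and the values of x become columns.
pivoted = (df2.rename_axis('b')
              .reset_index()
              .pivot(index='b', columns='x', values='y'))

# Step 2: inner join (the default) on 'b'; b = 10 has no match and is dropped.
merged = pivoted.reset_index().merge(df1, on='b')

# Step 3: rename the boolean column labels.
names = {True: 'y_x', False: 'y_notx'}
result = merged.rename(columns=names)
print(result)

# Optional variant: a left join from df1 keeps b = 10, filled with NaN.
left_result = (df1.merge(pivoted.reset_index(), on='b', how='left')
                  .rename(columns=names))
print(left_result)
```

Keyword arguments are passed to `pivot` here because recent pandas releases no longer accept them positionally.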

EDIT:

Solution for multiple columns:

import numpy as np
import pandas as pd

t1 = {'b': [3, 4, 7, 8, 10], 'c': range(5)}

df1 = pd.DataFrame(t1, index=[1, 2, 3, 4, 5])


t2 = {'x': [True, False, True, True, False],
      'y': [5.4, 6.9, 9.8, 7.8, 5.6],
      'v': np.arange(5) + 4.8,
      'w': np.arange(5) - 2.75,
      'Z': np.arange(5) * 0.75}

df2 = pd.DataFrame(t2, index=[3, 3, 4, 7, 8])

t3 = {'b': [3, 4, 7, 8],
      'y_x': [5.4, 9.8, 7.8, np.nan],
      'y_notx': [6.9, np.nan, np.nan, 5.6]}

df3 = pd.DataFrame(t3, index=[1, 2, 3, 4])

# Move x into the index, then unstack it so that every value column
# is split into one column for False and one for True
df4 = (df2.set_index('x', append=True)
          .unstack()
          .rename(columns={True:'_x', False:'_notx'})
          .rename_axis('b'))
# Flatten the MultiIndex columns into plain strings
df4.columns = df4.columns.map('_'.join)
df4 = df4.reset_index().merge(df1, on='b')

print (df4)

   b  Z__notx  Z__x  v__notx  v__x  w__notx  w__x  y__notx  y__x  c
0  3     0.75  0.00      5.8   4.8    -1.75 -2.75      6.9   5.4  0
1  4      NaN  1.50      NaN   6.8      NaN -0.75      NaN   9.8  1
2  7      NaN  2.25      NaN   7.8      NaN  0.25      NaN   7.8  2
3  8     3.00   NaN      8.8   NaN     1.25   NaN      5.6   NaN  3
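
The column names in this output carry a double underscore (y__x) because the rename step already puts '_' in front of x and notx before '_'.join adds another one. If single-underscore names like those in the requested df3 are wanted, one possible way is to build the flat names directly from the MultiIndex tuples. This is only a sketch on a reduced df2 (columns y and v); `wide` and `suffix` are names made up for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({'x': [True, False, True, True, False],
                    'y': [5.4, 6.9, 9.8, 7.8, 5.6],
                    'v': [4.8, 5.8, 6.8, 7.8, 8.8]},
                   index=[3, 3, 4, 7, 8])

# Unstacking x gives MultiIndex columns of (value column, x) pairs,
# for example ('y', True).
wide = df2.set_index('x', append=True).unstack()

# Build one flat name per pair with a single underscore in between.
suffix = {True: 'x', False: 'notx'}
wide.columns = [f'{col}_{suffix[flag]}' for col, flag in wide.columns]

# Turn the remaining index into a regular column named 'b'.
wide = wide.rename_axis('b').reset_index()
print(wide)
```

The resulting frame can then be merged with df1 on 'b' exactly as in the answer.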
