Combining dataframes and adding values of common elements

Time: 2021-04-08 13:08:40

I have multiple data sets like this. Data set 1:

index | name | val |
1     | a    | 1   |
2     | b    | 0   |
3     | c    | 3   |

Data set 2:

index | name | val |
1     | g    | 4   |
2     | a    | 2   |
3     | k    | 3   |
4     | l    | 2   |

I want to combine these data sets so that if both contain a row with the same name (in this example, "a"), the combined data set has only a single row for it, whose val is the sum of the two values. Here the combined row for "a" would have a val of 3 (2 + 1). The index numbers do not matter. Is there an effective way to do this in Excel itself? I'm new to querying data, but I'm trying to learn. If I can do this in pandas (which I'm trying to get familiar with) or SQL, I will do so. My data sets are of different sizes.

2 solutions

#1



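Use:

# Strip stray whitespace from the headers (e.g. ' val' -> 'val') so the columns of both frames align
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()
# Sum per name within each frame, then add the results; fill_value=0 keeps names found in only one frame
df3 = df1.groupby('name').sum().add(df2.groupby('name').sum(), fill_value=0).reset_index()
print(df3)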

Output:

    name   index   val
0    a     3.0     3.0 
1    b     2.0     0.0 
2    c     3.0     3.0 
3    g     1.0     4.0 
4    k     3.0     3.0 
5    l     4.0     2.0 
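
For anyone who wants to run this approach end to end, here is a minimal sketch that rebuilds the two sample tables from the question. The frame names df1 and df2 follow the answer; the names s1, s2 and combined are made up for illustration, and restricting the sum to the val column (so the meaningless index totals drop out) is my assumption rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; the index column is omitted
# because the question says it does not matter.
df1 = pd.DataFrame({"name": ["a", "b", "c"], "val": [1, 0, 3]})
df2 = pd.DataFrame({"name": ["g", "a", "k", "l"], "val": [4, 2, 3, 2]})

# Per-name totals within each frame (hypothetical helper names s1, s2).
s1 = df1.groupby("name")["val"].sum()
s2 = df2.groupby("name")["val"].sum()

# Align the two Series on name and add them; fill_value=0 treats a name
# that is missing from one frame as contributing 0 instead of NaN.
combined = s1.add(s2, fill_value=0).astype(int).reset_index()
print(combined)
```

The cast back to int is only cosmetic: aligning two Series with different labels upcasts the result to float, which is why the output above shows 3.0 rather than 3.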

#2



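In SQL, you can try the query below. The index column is left out because it is not needed for the result and index is a reserved word in several SQL dialects:

select name, sum(val) as val
from
(select name, val from dataset1
union all
select name, val from dataset2) tmp
group by name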
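
To try the query without setting up a database server, the sketch below runs it against an in-memory SQLite database from Python. The table names dataset1 and dataset2 come from the answer; the column types, the total alias and the order by clause are assumptions added for the demonstration, and the index column is not created.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("create table dataset1 (name text, val integer)")
conn.execute("create table dataset2 (name text, val integer)")

# Sample rows copied from the question.
conn.executemany("insert into dataset1 values (?, ?)",
                 [("a", 1), ("b", 0), ("c", 3)])
conn.executemany("insert into dataset2 values (?, ?)",
                 [("g", 4), ("a", 2), ("k", 3), ("l", 2)])

# union all keeps every row from both tables (plain union would drop
# duplicate rows before summing), then group by collapses equal names.
query = """
select name, sum(val) as total
from (select name, val from dataset1
      union all
      select name, val from dataset2) tmp
group by name
order by name
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```
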

In Pandas:

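import pandas as pd
# Stack both frames on top of each other, then sum val for each name
df3 = pd.concat([df1, df2], ignore_index=True)
df3 = df3.groupby('name', as_index=False)['val'].sum()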
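
Since the question mentions multiple data sets, the same idea can be wrapped so it accepts any number of frames. This is only a sketch: combine_by_name is a hypothetical helper name, not a pandas function, and the sample frames are rebuilt from the question without the index column.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"name": ["a", "b", "c"], "val": [1, 0, 3]})
df2 = pd.DataFrame({"name": ["g", "a", "k", "l"], "val": [4, 2, 3, 2]})


def combine_by_name(*frames):
    """Stack any number of frames and return one row per name with val summed."""
    stacked = pd.concat(list(frames), ignore_index=True)
    # as_index=False keeps name as an ordinary column in the result.
    return stacked.groupby("name", as_index=False)["val"].sum()


totals = combine_by_name(df1, df2)
print(totals)
```

Because the rows are stacked before summing, the values stay integers and no fill value is needed for names that appear in only one frame.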
