I have multiple data sets like these:

data set 1
index | name | val
1     | a    | 1
2     | b    | 0
3     | c    | 3

data set 2
index | name | val
1     | g    | 4
2     | a    | 2
3     | k    | 3
4     | l    | 2
I want to combine these data sets so that if both data sets have a row with the same name (in this example, "a"), the combined data set has only a single row for that name, whose val is the sum of the two values. Here the combined row a would have a val of 3 (2 + 1). The index numbers do not matter. Is there an effective way to do this in Excel itself? I'm new to querying data, but I'm trying to learn. If I can do this in pandas (which I'm trying to get familiar with) or SQL, I will do so. My data sets are of different sizes.
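If the tables are stored as pipe-delimited text, reading them with a plain `sep='|'` keeps the spaces next to each pipe, so column names come out as `' name '` or `' val'` and names as `' a '`. A minimal sketch for loading both sets into pandas with clean names, assuming the data is exactly the text shown above (the `RAW1`/`RAW2` strings and the `read_pipe_table` helper are hypothetical):

```python
import io

import pandas as pd

# Hypothetical raw text: the two sample tables from the question, pipe-delimited
RAW1 = """index| name | val|
1 | a | 1 |
2 | b | 0 |
3 | c | 3 |
"""
RAW2 = """index| name | val|
1 | g | 4 |
2 | a | 2 |
3 | k | 3 |
4 | l | 2 |
"""


def read_pipe_table(text: str) -> pd.DataFrame:
    """Parse a '|'-delimited table, trimming the spaces around each pipe (hypothetical helper)."""
    # A regex separator requires the Python engine; it also absorbs the spaces next to each '|'
    df = pd.read_csv(io.StringIO(text), sep=r'\s*\|\s*', engine='python')
    # The trailing '|' on every line produces an extra, all-empty column: drop it
    return df.dropna(axis=1, how='all')


df1 = read_pipe_table(RAW1)
df2 = read_pipe_table(RAW2)
print(df1)
print(df2)
```

With the columns named `index`, `name` and `val`, the answers below can refer to `'val'` directly instead of working around `' val'`.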
2 answers
#1
Use:
import pandas as pd
# Strip stray spaces from column names (e.g. ' val'), then sum 'val' per name in each set;
# fill_value=0 keeps names that appear in only one data set
df1.columns, df2.columns = df1.columns.str.strip(), df2.columns.str.strip()
df3 = df1.groupby('name')['val'].sum().add(df2.groupby('name')['val'].sum(), fill_value=0).reset_index()
print(df3)
Output:
  name  val
0    a  3.0
1    b  0.0
2    c  3.0
3    g  4.0
4    k  3.0
5    l  2.0
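The `fill_value=0` argument in this answer is what keeps names that occur in only one data set. A small sketch on the question's sample data (the per-name totals are typed in directly as Series, an assumption for brevity) compares plain `+` with `Series.add(..., fill_value=0)`:

```python
import pandas as pd

# Per-name totals for the two sample data sets from the question
s1 = pd.Series({'a': 1, 'b': 0, 'c': 3})
s2 = pd.Series({'g': 4, 'a': 2, 'k': 3, 'l': 2})

# Plain '+' aligns on the index and gives NaN for names present in only one Series
plain = s1 + s2
# add(..., fill_value=0) treats a missing name as 0, so every name keeps a total
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Only `a` gets a number from plain `+`; with `fill_value=0` every name survives, which is the behaviour the question asks for. The result is float because alignment introduces NaN before the fill, which is why the output above shows `3.0` rather than `3`.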
#2
In SQL you can try the query below:
select name, sum(val) as val
from
(select name, val from dataset1
union all
select name, val from dataset2) tmp
group by name
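To try the query without a database server, a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The table names `dataset1` and `dataset2` follow the query; their two-column layout (no `index` column) is an assumption, since the index does not matter here:

```python
import sqlite3

# In-memory database holding the question's sample data (table layout is assumed)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE dataset1 (name TEXT, val INTEGER)')
conn.execute('CREATE TABLE dataset2 (name TEXT, val INTEGER)')
conn.executemany('INSERT INTO dataset1 VALUES (?, ?)', [('a', 1), ('b', 0), ('c', 3)])
conn.executemany('INSERT INTO dataset2 VALUES (?, ?)', [('g', 4), ('a', 2), ('k', 3), ('l', 2)])

# UNION ALL keeps every row (plain UNION would collapse identical (name, val) pairs),
# then GROUP BY sums val per name; ORDER BY only makes the output predictable
rows = conn.execute("""
    SELECT name, SUM(val) AS val
    FROM (SELECT name, val FROM dataset1
          UNION ALL
          SELECT name, val FROM dataset2) AS tmp
    GROUP BY name
    ORDER BY name
""").fetchall()
conn.close()
print(rows)  # [('a', 3), ('b', 0), ('c', 3), ('g', 4), ('k', 3), ('l', 2)]
```

Using `UNION ALL` rather than `UNION` matters: if both data sets contained the row `a | 2`, `UNION` would keep only one copy and the sum would come out too small.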
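In pandas:
import pandas as pd
# Stack the rows of both data sets, then sum 'val' for each name
df3 = pd.concat([df1, df2], ignore_index=True)
df3 = df3.groupby('name', as_index=False)['val'].sum()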
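Since the question mentions multiple data sets of different sizes, the same `concat` + `groupby` pattern extends to any number of frames: put them in a list and concatenate once. A minimal sketch; the third data set is made up purely for illustration:

```python
import pandas as pd

# Data sets of different sizes; further sets can simply be appended to the list
frames = [
    pd.DataFrame({'name': ['a', 'b', 'c'], 'val': [1, 0, 3]}),
    pd.DataFrame({'name': ['g', 'a', 'k', 'l'], 'val': [4, 2, 3, 2]}),
    pd.DataFrame({'name': ['a', 'l'], 'val': [5, 1]}),  # hypothetical third set
]

# Stack every set, then sum 'val' per name; as_index=False keeps 'name' as a column
combined = pd.concat(frames, ignore_index=True).groupby('name', as_index=False)['val'].sum()
print(combined)
```

Here `a` appears in all three sets and sums to 8, while `l` sums to 3. Unlike chaining `add(..., fill_value=0)`, this stays integer-typed because no alignment step introduces NaN.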