With the DataFrame below as an example,
以下面的DataFrame为例,
In [83]:df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})dfOut[83]: A B values0 1 1 101 1 2 152 2 1 203 2 2 25
What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?
生成包含其中一列的数据聚合的新列的简单方法是什么?
For example, if I sum values
over items in A
例如,如果我对A中的项目的值进行求和
In [84]:df.groupby('A').sum()['values']Out[84]:A1 252 45Name: values
How can I get
我怎样才能得到
A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
4 个解决方案
#1
25
In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})In [21]: dfOut[21]: A B values0 1 1 101 1 2 152 2 1 203 2 2 25In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)In [23]: dfOut[23]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
#2
3
I found a way using join
:
我找到了一种使用join的方法:
In [101]:aggregated = df.groupby('A').sum()['values']aggregated.name = 'sum_values_A'df.join(aggregated,on='A')Out[101]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
Anyone has a simpler way to do it?
任何人都有更简单的方法吗?
#3
3
This is not so direct but I found it very intuitive (the use of map to create new columns from another column) and can be applied to many other cases:
这不是那么直接但我发现它非常直观(使用map从另一列创建新列)并且可以应用于许多其他情况:
gb = df.groupby('A').sum()['values']def getvalue(x): return gb[x]df['sum'] = df['A'].map(getvalue)df
#4
3
In [15]: def sum_col(df, col, new_col): ....: df[new_col] = df[col].sum() ....: return dfIn [16]: df.groupby("A").apply(sum_col, 'values', 'sum_values_A')Out[16]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
#1
25
In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})In [21]: dfOut[21]: A B values0 1 1 101 1 2 152 2 1 203 2 2 25In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)In [23]: dfOut[23]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
#2
3
I found a way using join
:
我找到了一种使用join的方法:
In [101]:aggregated = df.groupby('A').sum()['values']aggregated.name = 'sum_values_A'df.join(aggregated,on='A')Out[101]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45
Anyone has a simpler way to do it?
任何人都有更简单的方法吗?
#3
3
This is not so direct but I found it very intuitive (the use of map to create new columns from another column) and can be applied to many other cases:
这不是那么直接但我发现它非常直观(使用map从另一列创建新列)并且可以应用于许多其他情况:
gb = df.groupby('A').sum()['values']def getvalue(x): return gb[x]df['sum'] = df['A'].map(getvalue)df
#4
3
In [15]: def sum_col(df, col, new_col): ....: df[new_col] = df[col].sum() ....: return dfIn [16]: df.groupby("A").apply(sum_col, 'values', 'sum_values_A')Out[16]: A B values sum_values_A0 1 1 10 251 1 2 15 252 2 1 20 453 2 2 25 45