Group by one column and apply a custom operation to the grouped records of another column in pandas

Date: 2022-12-11 22:14:24

I want to apply a custom operation to one column, grouped by the values of another column: group by a column to get the row count of each group, then divide the other column's value by that count for every record in the group.

My Data Frame:

   emp opp amount
0  a   1   10
1  b   1   10
2  c   2   30
3  b   2   30
4  d   2   30

My scenario:

  • For opp=1, two emps worked (a, b), so the amount should be shared as 10/2 = 5
  • For opp=2, three emps worked (c, b, d), so the amount should be shared as 30/3 = 10
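The arithmetic in the scenario can be sketched as follows. This is only an illustration: it rebuilds the sample frame by hand, and the names `counts`, `totals`, and `shares` are made up for the example. It also assumes, as in the sample data, that `amount` is identical for every row of the same `opp`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "emp": ["a", "b", "c", "b", "d"],
    "opp": [1, 1, 2, 2, 2],
    "amount": [10, 10, 30, 30, 30],
})

# Number of employees (rows) that worked on each opp
counts = df.groupby("opp")["emp"].count()

# Amount attached to each opp (assumed identical within a group)
totals = df.groupby("opp")["amount"].first()

# Share per employee: 10/2 for opp=1 and 30/3 for opp=2
shares = totals / counts
print(shares)
```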

Final Output DataFrame:

      emp opp amount
    0  a   1   5
    1  b   1   5
    2  c   2   10
    3  b   2   10
    4  d   2   10

What is the best way to do this?

2 Solutions

#1



df['amount'] = df.groupby('opp')['amount'].transform(lambda g: g/g.size)

df
#  emp  opp amount
# 0  a    1      5
# 1  b    1      5
# 2  c    2     10
# 3  b    2     10
# 4  d    2     10

Or:

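df['amount'] = df.groupby('opp', group_keys=False)['amount'].apply(lambda g: g/g.size)

does a similar thing (group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions).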
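As a possible variant of the same idea, the group size can be broadcast back to every row with `transform('size')` and the division done once on whole columns, which avoids calling a Python lambda per group. This is a sketch, not part of the original answer; the sample frame is rebuilt from the question and `group_size` is a name chosen for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "emp": ["a", "b", "c", "b", "d"],
    "opp": [1, 1, 2, 2, 2],
    "amount": [10, 10, 30, 30, 30],
})

# Size of each row's group, aligned with the original index
group_size = df.groupby("opp")["amount"].transform("size")

# Column-wise division; the result is a float column
df["amount"] = df["amount"] / group_size
print(df)
```

Note that the division produces floats, so the values print as 5.0 and 10.0 rather than 5 and 10.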

#2



You could try something like this:

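# Number of rows (employees) per opp
df2 = df.groupby('opp').amount.count()
# Divide each row's amount by the count for its opp (.ix was removed from pandas; use .loc)
df['calculated'] = df.apply(lambda row: row.amount / df2.loc[row.opp], axis=1)
df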

Yields:

  emp  opp  amount  calculated
0   a    1      10           5
1   b    1      10           5
2   c    2      30          10
3   b    2      30          10
4   d    2      30          10
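The row-wise `apply` above calls a Python function once per row, which can be slow on large frames. One way to get the same `calculated` column without it is to look up each row's group count with `Series.map`. This is a sketch under the same sample data, not the answerer's code; `counts` is a name chosen for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "emp": ["a", "b", "c", "b", "d"],
    "opp": [1, 1, 2, 2, 2],
    "amount": [10, 10, 30, 30, 30],
})

# Rows per opp, indexed by the opp value
counts = df["opp"].value_counts()

# Look up the count for each row's opp, then divide column-wise
df["calculated"] = df["amount"] / df["opp"].map(counts)
print(df)
```

The original `amount` column is left untouched, matching the output shown for this answer.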
