My question is related to my previous question, but it is different, so I am asking it as a new question.
In that question, see the answer by @jezrael.
import pandas as pd

df = pd.DataFrame({'col1':[1,1,1],
                   'col2':[4,4,6],
                   'col3':[7,7,9],
                   'col4':[3,3,5]})
print(df)
   col1  col2  col3  col4
0     1     4     7     3
1     1     4     7     3
2     1     6     9     5
df1 = df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'})
df1['result_col'] = df1['col3'].div(df1['col4'])
print (df1)
           col4  col3  result_col
col1 col2
1    4        1     2         2.0
     6        1     1         1.0
Now I want to count a specific value of col4. Say I also want the count of rows where col4 == 3, in the same query.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique'}) ... + count(col4=='3')
How can I do this in the same query as above? I have tried the code below, but it does not give me a solution.
df.groupby(['col1','col2']).agg({'col3':'size','col4':'nunique','col4':'x: lambda x[x == 7].count()'})
2 Solutions
#1
2
Do some preprocessing by including col4 == 3 as a column ahead of time. Then aggregate.
df.assign(result_col=df.col4.eq(3).astype(int)).groupby(
['col1', 'col2']
).agg(dict(col3='size', col4='nunique', result_col='sum'))
           col3  result_col  col4
col1 col2
1    4        2           2     1
     6        1           0     1
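To make the preprocessing step easier to follow, here is a minimal sketch that splits it in two and prints the intermediate frame. The names `flagged` and `out` are only illustrative and do not appear in the answer above; the sample data is the frame from the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Step 1: flag the rows where col4 equals 3 (1 = match, 0 = no match).
flagged = df.assign(result_col=df['col4'].eq(3).astype(int))
print(flagged)

# Step 2: summing the flags inside each group counts the matching rows.
out = flagged.groupby(['col1', 'col2']).agg(
    {'col3': 'size', 'col4': 'nunique', 'result_col': 'sum'})
print(out)
```

The column order of the result may differ between pandas versions, so select columns by name rather than by position.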
Old answers
g = df.groupby(['col1', 'col2'])
g.agg({'col3':'size','col4': 'nunique'}).assign(
result_col=g.col4.apply(lambda x: x.eq(3).sum()))
           col3  col4  result_col
col1 col2
1    4        2     1           2
     6        1     1           0
Slightly rearranged
g = df.groupby(['col1', 'col2'])
final_df = g.agg({'col3':'size','col4': 'nunique'})
final_df.insert(1, 'result_col', g.col4.apply(lambda x: x.eq(3).sum()))
final_df
           col3  result_col  col4
col1 col2
1    4        2           2     1
     6        1           0     1
#2
2
I think you need to aggregate with a list of functions in the dict for column col4.
If you need to count the values equal to 3, the simplest way is to sum the True values of x == 3:
df1 = (df.groupby(['col1','col2'])
         .agg({'col3':'size','col4': ['nunique', lambda x: (x == 3).sum()]}))
df1 = df1.rename(columns={'<lambda>':'count_3'})
df1.columns = ['{}_{}'.format(x[0], x[1]) for x in df1.columns]
print (df1)
           col4_nunique  col4_count_3  col3_size
col1 col2
1    4                1             2          2
     6                1             0          1
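As a possible alternative, and not the approach used in the answer above, pandas 0.25 and later support named aggregation, which sets the output column names directly and so avoids the rename and the column flattening. This is a sketch assuming such a pandas version; the output names are chosen here only to match the flattened names above.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1],
                   'col2': [4, 4, 6],
                   'col3': [7, 7, 9],
                   'col4': [3, 3, 5]})

# Named aggregation: output_name=(input_column, function).
df1 = df.groupby(['col1', 'col2']).agg(
    col3_size=('col3', 'size'),
    col4_nunique=('col4', 'nunique'),
    # Summing the boolean mask counts the rows where col4 == 3.
    col4_count_3=('col4', lambda x: (x == 3).sum()),
)
print(df1)
```

The result has flat column names in the order they were written, so no post-processing of a MultiIndex is needed.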