Suppose I have a Pandas DataFrame called df with columns a and b, and I want the number of distinct values of b for each a. I would do:
distcounts = df.groupby('a')['b'].nunique()
which gives the desired result, but it is a Series object rather than another DataFrame. I'd like a DataFrame instead. In regular SQL, I'd do:
SELECT a, COUNT(DISTINCT(b)) FROM df GROUP BY a
and I haven't been able to emulate this query exactly in Pandas. How can I do that?
2 Answers
#1
4
I think you need reset_index:
distcounts = df.groupby('a')['b'].nunique().reset_index()
Sample:
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 8],
                   'b': [4, 5, 6]})
print(df)
   a  b
0  7  4
1  8  5
2  8  6
distcounts = df.groupby('a')['b'].nunique().reset_index()
print(distcounts)
   a  b
0  7  1
1  8  2
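One thing the sample makes visible is that the count column keeps the name b, which can be confusing next to the original data. A minimal sketch of one way around that, assuming you are free to choose the column name (b_distinct here is a made-up name, not something from the question):

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 8],
                   'b': [4, 5, 6]})

# nunique() on the grouped column returns a Series indexed by 'a'
counts = df.groupby('a')['b'].nunique()

# reset_index(name=...) converts the Series to a DataFrame and
# names the value column; 'b_distinct' is a hypothetical name
distcounts = counts.reset_index(name='b_distinct')
print(distcounts)
```

The name argument only exists on Series.reset_index, so it has to be applied before the result becomes a DataFrame.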
#2
3
Another alternative, using GroupBy.agg instead:
df.groupby('a', as_index=False).agg({'b': 'nunique'})
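To see what this returns on the same sample data as the first answer, here is a small sketch. The second call uses named aggregation, which I believe needs pandas 0.25 or later; n_distinct_b is a hypothetical column name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 8, 8],
                   'b': [4, 5, 6]})

# as_index=False keeps 'a' as a regular column, so the result
# is already a DataFrame and no reset_index() is needed
via_agg = df.groupby('a', as_index=False).agg({'b': 'nunique'})
print(via_agg)

# Named aggregation: output_name=(input_column, function).
# 'n_distinct_b' is a hypothetical name for the count column
named = df.groupby('a', as_index=False).agg(n_distinct_b=('b', 'nunique'))
print(named)
```

The dict form keeps the column name b, while the named form lets the result read more like the SQL query with an alias on the count.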