Pandas: count distinct as a DataFrame

Time: 2021-12-21 01:54:41

Suppose I have a Pandas DataFrame called df with columns a and b, and I want the number of distinct values of b for each a. I would do:

distcounts = df.groupby('a')['b'].nunique()

which gives the desired result, but as a Series object rather than another DataFrame. I'd like a DataFrame instead. In regular SQL, I'd do:

SELECT a, COUNT(DISTINCT b) FROM df GROUP BY a

but I haven't been able to reproduce this query exactly in Pandas. How can I do it?

2 Answers

#1


4  

I think you need reset_index, which turns the group keys in the index back into a regular column:

distcounts = df.groupby('a')['b'].nunique().reset_index()

Sample:

import pandas as pd

df = pd.DataFrame({'a': [7, 8, 8],
                   'b': [4, 5, 6]})

print(df)
   a  b
0  7  4
1  8  5
2  8  6

# column b of the result holds the number of distinct b values per a
distcounts = df.groupby('a')['b'].nunique().reset_index()
print(distcounts)
   a  b
0  7  1
1  8  2

#2


3  

Another alternative is to use GroupBy.agg with as_index=False, which keeps a as a column:

df.groupby('a', as_index=False).agg({'b': 'nunique'})
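To compare the two answers side by side, here is a minimal sketch that reuses the sample data from answer #1 and runs both approaches. The column name n_distinct_b is only an illustrative choice, not something from the question; it shows one way to stop the count column from being called b.

```python
import pandas as pd

# Sample data taken from answer #1
df = pd.DataFrame({'a': [7, 8, 8],
                   'b': [4, 5, 6]})

# Answer #1: nunique() returns a Series indexed by 'a';
# reset_index() turns it into a two-column DataFrame
via_reset = df.groupby('a')['b'].nunique().reset_index()

# Answer #2: as_index=False keeps 'a' as a regular column
via_agg = df.groupby('a', as_index=False).agg({'b': 'nunique'})

print(via_reset)
print(via_agg)

# Optional: name the count column explicitly
# (n_distinct_b is a hypothetical name chosen for this example)
named = df.groupby('a')['b'].nunique().reset_index(name='n_distinct_b')
print(named)
```

Both variants should return a DataFrame with columns a and b; the reset_index(name=...) form is handy when the result is later merged back with df and a second column called b would be confusing.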
