Pandas: using groupby to get the mean for each data category

Time: 2021-02-06 15:50:04

I have a dataframe that looks like this:

>>> df[['data','category']]
Out[47]: 
          data     category
  0       4610            2
 15       4610            2
 22       5307            7
 23       5307            7
 25       5307            7
...        ...          ...

Both data and category are numeric so I'm able to do this:

>>> df[['data','category']].mean()
Out[48]: 
data        5894.677985
category      13.805886
dtype: float64

I'm trying to get the mean for each category. It looks straightforward, but when I do this:

>>> df[['data','category']].groupby('category').mean()

or

>>> df.groupby('category')['data'].mean()

It returns an error like this:

DataError: No numeric types to aggregate

There's no error if I replace both functions above with .count().

What am I doing wrong? What's the correct way to get the mean of each category?

2 solutions

#1


5  

Can you check df.dtypes? In the example below the columns are integer-typed, so the mean works fine.

    import pandas as pd

    # Group by one column
    df = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7]})
    print(df.groupby('Category').mean())


    # Group by multiple columns
    df1 = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7], 'Category2': ['A', 'B', 'A', 'B']})
    key = ['Category', 'Category2']
    print(df1.groupby(key).mean())

 Category Category2       
 2        A           4610
          B           4611
 7        A           4612
          B           4613
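
If df.dtypes reports object for the data column (for example because the numbers were read in as strings), that would explain why .count() works while .mean() raises the DataError. A minimal sketch of that case, assuming a made-up frame whose 'data' values are strings; the values here are illustrative, not the asker's actual data:

```python
import pandas as pd

# Hypothetical frame: 'data' holds strings, so its dtype is object.
df = pd.DataFrame({'data': ['4610', '4610', '5307', '5307', '5307'],
                   'category': [2, 2, 7, 7, 7]})
print(df.dtypes)

# Convert to a numeric dtype; errors='coerce' turns unparseable values into NaN.
df['data'] = pd.to_numeric(df['data'], errors='coerce')

# With a numeric column the per-category mean can be computed.
means = df.groupby('category')['data'].mean()
print(means)
```

Whether coercing bad values to NaN is acceptable depends on the data; inspecting the rows that fail to parse first is usually worthwhile.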

#2


2  

As mentioned, you don't give an example of the testTime and passing_site data, but I'm guessing that they're floating-point numbers. Grouping on floating-point values rarely makes sense, because nearly every distinct value ends up in its own group. Instead, group on integers or on categories of some kind.

Try something like:

df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()

You're grouping on 'data' and 'category', and then calculating the mean for the numerical columns 'passing_site' and 'testTime'.
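
A small runnable sketch of that call, assuming a frame with made-up values; only the column names are taken from the answer. Several columns are selected from the grouped object by passing a list inside the brackets:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
df = pd.DataFrame({
    'data': [4610, 4610, 5307, 5307],
    'category': [2, 2, 7, 7],
    'passing_site': [1.0, 3.0, 2.0, 4.0],
    'testTime': [10.0, 20.0, 30.0, 50.0],
})

# Group on the two key columns, then average the two numeric columns.
result = df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()
print(result)
```

The result is indexed by the (data, category) pairs, with one mean per selected column.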
