Using groupby in Python Pandas to group each value of one column with each value of another column

Time: 2022-03-17 11:10:27

I have a DataFrame with 3 columns and 631 rows, so I am showing only the unique values in each column.

df

Segment Type  Nature of Query     Q1

PRIME         Request             1           
BUSINESS      Complaint           2 
PRIORITY      Critical Request    3
                                  4
                                  5

Now, say that under 'Segment Type' I want to group 'PRIME' with every value of 'Nature of Query', and use 'Q1' to find the size, min, max and mean of each group.

So I tried the groupby function like this:

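 # Note: pd.np is deprecated since pandas 1.0; on current versions use .agg(['size', 'min', 'max', 'mean'])
 df.groupby(['Segment Type','Nature of Query'])['Q1'].agg([pd.np.size, 
 pd.np.min, pd.np.max, pd.np.mean])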

And I got this:

    Segment Type    Nature of Query    size     amin    amax    mean            

         BUSINESS       Request          1        4       4     4.000000
           PRIME        Complaint        1        5       5     5.000000
                      Critical Request   3        1       2     1.666667
                        Request          31       1       5     3.387097
          PRIORITY    Critical Request   1        4       4     4.000000
                        Request          3        3       5     4.000000

What I wanted as output:

       Segment Type   Nature of Query      size     amin    amax    mean
           BUSINESS       Request            1        4       4     4.000000
                          Complaint          1        5       5     5.000000
                          Critical Request   3        1       2     1.666667


            PRIME       Complaint            1        5       5     5.000000
                        Critical Request     3        1       2     1.666667
                        Request              31       1       5     3.387097

          PRIORITY      Complaint            1        5       5     5.000000
                        Critical Request     1        4       4     4.000000
                        Request              3        3       5     4.000000

Ignore the size, mean, max, etc.; they are calculated with respect to Q1. My main problem is with the values of 'Segment Type' and 'Nature of Query'.

If any solution possible, please let me know. Thanks!


2 solutions

#1



I believe you need to reindex with a MultiIndex created by MultiIndex.from_product:

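# Aggregate Q1 for each (Segment Type, Nature of Query) pair that occurs in the data
df = df.groupby(['Segment Type','Nature of Query'])['Q1'].agg(['size', 'min', 'max', 'mean'])

# Build every combination of the two index levels, then reindex; missing pairs are filled with 0
mux = pd.MultiIndex.from_product(df.index.levels, names=['Segment Type','Nature of Query'])
df = df.reindex(mux, fill_value=0).reset_index()
print(df)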
  Segment Type   Nature of Query  size  min  max  mean
0     BUSINESS         Complaint     1    2    2     2
1     BUSINESS  Critical Request     0    0    0     0
2     BUSINESS           Request     0    0    0     0
3        PRIME         Complaint     0    0    0     0
4        PRIME  Critical Request     0    0    0     0
5        PRIME           Request     1    1    1     1
6     PRIORITY         Complaint     0    0    0     0
7     PRIORITY  Critical Request     3    3    5     4
8     PRIORITY           Request     0    0    0     0
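
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is made up for illustration (the original 631-row data is not shown in the question), and the names stats, mux and full are placeholders. Unlike the snippet above, it leaves min, max and mean as NaN for pairs that never occur and only sets size to 0, on the assumption that a 0 there could be mistaken for a real Q1 value.

```python
import pandas as pd

# Hypothetical sample data; the real frame from the question is not available.
df = pd.DataFrame({
    'Segment Type': ['PRIME', 'PRIME', 'BUSINESS', 'PRIORITY', 'PRIORITY'],
    'Nature of Query': ['Request', 'Request', 'Complaint',
                        'Critical Request', 'Critical Request'],
    'Q1': [1, 5, 2, 3, 4],
})

# Statistics for the pairs that actually occur in the data.
stats = (df.groupby(['Segment Type', 'Nature of Query'])['Q1']
           .agg(['size', 'min', 'max', 'mean']))

# Every (segment, nature) pair, built from the unique values of each column.
mux = pd.MultiIndex.from_product(
    [df['Segment Type'].unique(), df['Nature of Query'].unique()],
    names=['Segment Type', 'Nature of Query'],
)

# Reindex to the full product; pairs with no rows come back as NaN.
full = stats.reindex(mux).sort_index()

# A group with no rows has size 0; min/max/mean stay NaN on purpose.
full['size'] = full['size'].fillna(0).astype(int)

print(full)
```

If zeros are preferred everywhere, passing fill_value=0 to reindex as in the snippet above does that in one step.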

#2



You could use the pivot_table function; see the tutorial here:

http://pbpython.com/pandas-pivot-table-explained.html
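
As a rough sketch of what that could look like here, assuming a made-up sample frame (the question's real data is not shown) and the placeholder names counts and means: putting 'Segment Type' on the rows and 'Nature of Query' on the columns produces a grid with one cell per combination, so pairs that never occur still show up.

```python
import pandas as pd

# Hypothetical sample data; the real frame from the question is not available.
df = pd.DataFrame({
    'Segment Type': ['PRIME', 'PRIME', 'BUSINESS', 'PRIORITY', 'PRIORITY'],
    'Nature of Query': ['Request', 'Request', 'Complaint',
                        'Critical Request', 'Critical Request'],
    'Q1': [1, 5, 2, 3, 4],
})

# Number of Q1 values per combination; combinations with no rows become 0.
counts = df.pivot_table(index='Segment Type', columns='Nature of Query',
                        values='Q1', aggfunc='count', fill_value=0)

# Mean of Q1 per combination; combinations with no rows are left as NaN.
means = df.pivot_table(index='Segment Type', columns='Nature of Query',
                       values='Q1', aggfunc='mean')

print(counts)
print(means)
```

This gives a wide table (one column per 'Nature of Query') rather than the long layout in the question, so it fits best when one statistic at a time is enough.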
