How to group a DataFrame in Pandas and keep columns

Time: 2021-01-01 21:40:23

Given a DataFrame that logs uses of some books like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

I need to get the count of each book while keeping the other columns, to get this:

Name   Type   ID    Count
Book1  ebook  1     2
Book2  paper  2     2
Book3  paper  3     1

How can this be done?

Thanks!

2 answers

#1


33  

You want the following:

In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')

Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' columns always take the same values together, so we can group on all three, count the rows in each group and then call reset_index to turn the group keys back into ordinary columns.
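For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample data. The variable names are only illustrative. It uses size() because all three columns are group keys, which leaves count() with no remaining column to count.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# size() counts the rows in each group; reset_index(name=...) turns the
# group keys back into columns and names the new count column.
counts = df.groupby(["Name", "Type", "ID"]).size().reset_index(name="Count")
print(counts)
```

Grouping on all three columns is what keeps 'Type' and 'ID' in the result; grouping on 'Name' alone would drop them from the index.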

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()

Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
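A self-contained sketch of the transform route, again built on the question's sample rows. It swaps the in-place column assignment for assign(), which is my own choice rather than part of the answer, so the original frame is left untouched.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# transform('count') returns one value per original row, so the result
# lines up with df and can be attached as a new column.
with_counts = df.assign(Count=df.groupby("Name")["ID"].transform("count"))

# Repeated rows now carry identical counts, so they are exact duplicates.
result = with_counts.drop_duplicates().reset_index(drop=True)
print(result)
```

Note that drop_duplicates returns a new DataFrame, so the result has to be assigned to a name if you want to keep it.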

#2


17  

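I think as_index=False should do the trick. Since every column is used as a group key here, call size() rather than count(), which would have no column left to count:

df.groupby(['Name','Type','ID'], as_index=False).size().rename(columns={'size': 'Count'})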
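A runnable sketch of the as_index=False variant, assuming pandas 1.1 or later, where size() on such a groupby returns a DataFrame with a column named 'size'. It uses size() because all three columns are group keys, so count() would have nothing left to count.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# as_index=False keeps the group keys as ordinary columns, so no
# reset_index is needed; the count column is then renamed to 'Count'.
result = (
    df.groupby(["Name", "Type", "ID"], as_index=False)
      .size()
      .rename(columns={"size": "Count"})
)
print(result)
```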
