I have a pandas DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'ID': [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6], 'count': [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30]})
df
    ID  count
0    2     20
1    2     43
2    2     45
3    2     50
4    2     15
5    4     65
6    4     35
7    3     15
8    3     15
9    3     14
10   6     30
I want to create a pivot table with the following output:
ID   1   2   3   4   5
2   20  43  45  50  15
4   65  35   0   0   0
3   15  15  14   0   0
6   30   0   0   0   0
I thought of using the pivot function on the dataframe (df_pivot = df.pivot(index='ID', columns=..., values='count')), but I am missing the list of column labels. I also thought of applying a lambda function to the df to generate an additional column with the missing column names, but I have 800M IDs, and the apply function on a grouped dataframe is painfully slow. Is there a quick approach you might be aware of?
1 solution
#1
2
I would define a subindex for each group as:
df['subindex'] = df.groupby('ID').cumcount() + 1
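To make the role of the subindex concrete, here is a minimal sketch using the sample data from the question. The variable name subindex_values is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6],
    "count": [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30],
})

# cumcount() numbers the rows inside each ID group from 0, in order of
# appearance; adding 1 makes the positions start at 1.
df["subindex"] = df.groupby("ID").cumcount() + 1

# Illustrative name: the per-row position within its ID group.
subindex_values = df["subindex"].tolist()
print(subindex_values)  # [1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 1]
```

Each row now carries its position within its own ID, which is exactly the column label the pivot was missing.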
Then apply the pivot method, setting the new subindex as the columns, and fill the NaN values with 0:
d = pd.pivot_table(df, index='ID', columns='subindex', values='count').fillna(0)
This returns:
subindex   1   2   3   4   5
ID
2         20  43  45  50  15
3         15  15  14   0   0
4         65  35   0   0   0
6         30   0   0   0   0
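As an alternative sketch (not the method used in this answer), the same reshaping can be done with set_index followed by unstack. Because every (ID, subindex) pair is unique, no aggregation is needed, and fill_value=0 fills the gaps directly. Whether this is actually faster at the scale of 800M IDs has not been measured here. The names wide and wide_in_input_order are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [2, 2, 2, 2, 2, 4, 4, 3, 3, 3, 6],
    "count": [20, 43, 45, 50, 15, 65, 35, 15, 15, 14, 30],
})
df["subindex"] = df.groupby("ID").cumcount() + 1

# Move ID and subindex into the index, then spread subindex across columns.
# Missing (ID, subindex) combinations are filled with 0.
wide = df.set_index(["ID", "subindex"])["count"].unstack(fill_value=0)

# unstack returns the IDs sorted; reindex by first appearance to get the
# row order requested in the question (2, 4, 3, 6).
wide_in_input_order = wide.reindex(df["ID"].unique())
print(wide_in_input_order)
```

The reindex step is only needed if the original order of the IDs matters; otherwise wide can be used as is.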
Hope that helps.