How do I decile a Python pandas DataFrame by column value and then sum each decile?

Time: 2021-01-01 21:40:05

Say a DataFrame has a single numeric column, sorted in descending order.

What I want is a new DataFrame with 10 rows, where row 1 is the sum of the smallest 10% of values and row 10 is the sum of the largest 10% of values.

I can calculate this in a non-Pythonic way, but I guess there must be a neat, Pythonic way to achieve it.

Any help?

Thanks!

1 solution

#1


7  

You can do this with pd.qcut:

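import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randn(100)})

# pd.qcut(df.A, 10) bins the values into deciles.
# Group by these deciles and take the sums in one step:
df.groupby(pd.qcut(df.A, 10))['A'].sum()
# Example output (the values differ on every run because the data is random):
# A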
# (-2.662, -1.209]   -16.436286
# (-1.209, -0.866]   -10.348697
# (-0.866, -0.612]    -7.133950
# (-0.612, -0.323]    -4.847695
# (-0.323, -0.129]    -2.187459
# (-0.129, 0.0699]    -0.678615
# (0.0699, 0.368]      2.007176
# (0.368, 0.795]       5.457153
# (0.795, 1.386]      11.551413
# (1.386, 3.664]      20.575449
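
The result above is a Series indexed by interval. To get the 10-row DataFrame the question asks for, one option is to label the deciles 1 to 10 and reset the index. This is only a sketch: the seeded sample data and the column names `decile` and `total` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(100)})

# Passing labels replaces the interval labels with decile numbers 1..10.
decile = pd.qcut(df["A"], 10, labels=list(range(1, 11)))

# Sum each decile, then turn the result into a 10-row DataFrame.
decile_sums = (
    df.groupby(decile, observed=True)["A"]
    .sum()
    .rename_axis("decile")       # illustrative index name
    .reset_index(name="total")   # illustrative column name
)
```

Here row 1 of `decile_sums` holds the sum of the smallest 10% of values and row 10 the sum of the largest 10%.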

pandas.qcut documentation
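
One caveat: if the column contains many repeated values, several decile edges can coincide and pd.qcut raises a ValueError about non-unique bin edges. A possible workaround, sketched below on made-up data, is to bin the ranks instead of the raw values, so every decile gets the same number of rows. Tied values may then land in different deciles, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical data with heavy ties: the value 0 appears 50 times.
df = pd.DataFrame({"A": [0] * 50 + list(range(1, 51))})

# pd.qcut(df["A"], 10) would fail here because several decile edges equal 0.
# rank(method="first") breaks ties by order of appearance, giving unique ranks.
ranks = df["A"].rank(method="first")

# labels=False returns integer bin codes 0..9; add 1 to number deciles 1..10.
decile = pd.qcut(ranks, 10, labels=False) + 1

decile_sums = df.groupby(decile)["A"].sum()
```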
