As shown below:
import pandas as pd
import matplotlib.pyplot as plt

path = 'F:/python/python数据分析与挖掘实战/图书配套数据、代码/chapter3/demo/data/catering_fish_congee.xls'
data = pd.read_excel(path, header=None, index_col=0)
data.index.name = '日期'
data.columns = ['销售额(元)']
xse = data['销售额(元)']
print(xse.max())
print(xse.min())
print(xse.max() - xse.min())  # range of the sales values
fanwei = list(range(0, 4500, 500))  # bin edges: 0, 500, ..., 4000
fenzu = pd.cut(xse.values, fanwei, right=False)  # left-closed bin for each value, length 91
print(fenzu.codes)  # integer bin code for each value
print(fenzu.categories)  # the bin intervals, 8 of them
pinshu = fenzu.value_counts()  # Series mapping interval -> count
print(pinshu.index)
pinshu.plot(kind='bar')
# plt.text(0, 29, str(29))
qujian = pd.cut(xse, fanwei, right=False)
data['区间'] = qujian.values
data.groupby('区间').median()  # median per bin
data.groupby('区间').mean()  # mean per bin
# to_frame names the column directly, whatever name the Series carries
pinshu_df = pinshu.to_frame(name='频数')
# divide the count column (not the whole frame) by the total count
pinshu_df['频率f'] = pinshu_df['频数'] / pinshu_df['频数'].sum()
pinshu_df['频率%'] = pinshu_df['频率f'].map(lambda x: '%.2f%%' % (x * 100))
pinshu_df['累计频率f'] = pinshu_df['频率f'].cumsum()
pinshu_df['累计频率%'] = pinshu_df['累计频率f'].map(lambda x: '%.4f%%' % (x * 100))
In [158]: pinshu_df
Out[158]:
              频数     频率f    频率%    累计频率f     累计频率%
[0, 500)      29  0.318681  31.87%  0.318681   31.8681%
[500, 1000)   20  0.219780  21.98%  0.538462   53.8462%
[1000, 1500)  12  0.131868  13.19%  0.670330   67.0330%
[1500, 2000)  12  0.131868  13.19%  0.802198   80.2198%
[2000, 2500)   8  0.087912   8.79%  0.890110   89.0110%
[2500, 3000)   3  0.032967   3.30%  0.923077   92.3077%
[3000, 3500)   4  0.043956   4.40%  0.967033   96.7033%
[3500, 4000)   3  0.032967   3.30%  1.000000  100.0000%
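
The script above depends on a local Excel file, so here is a minimal sketch of the same binning and frequency steps that runs without it. The sales figures are made up for illustration, and the column names `count`, `rel_freq` and `cum_freq` are hypothetical English stand-ins for 频数, 频率f and 累计频率f.

```python
import pandas as pd

# Hypothetical sales figures, invented purely for illustration.
sales = pd.Series([120, 480, 500, 999, 1000, 1750, 2300, 3999])

# Same bin edges as the script above: 0, 500, ..., 4000 (8 bins).
edges = list(range(0, 4500, 500))

# right=False makes every bin left-closed: [0, 500), [500, 1000), ...
bins = pd.cut(sales, edges, right=False)

# Count values per bin, then restore interval order
# (value_counts sorts by count by default).
freq = bins.value_counts().sort_index().to_frame(name="count")

# Relative and cumulative frequency.
freq["rel_freq"] = freq["count"] / freq["count"].sum()
freq["cum_freq"] = freq["rel_freq"].cumsum()

print(freq)
```

Because the bins are left-closed, a value that sits exactly on an edge, such as 500, is counted in [500, 1000) rather than [0, 500). Empty bins still appear in the table with a count of 0.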
Original link: https://blog.csdn.net/castingA3T/article/details/79075240