1.餐厅订单数据分析
1.1 针对订单order_id
- 什么菜最受欢迎
- 点菜的种类
- 点菜的数量
- 消费金额最大
- 平均消费
1.2 针对时间日期进行分析
- 时间段与点菜量关系图
- 日期与点菜量关系图
- 星期与点菜量关系图
2 技术点
2.1 pandas模块
2.1.1拼接数据
- pd.concat([列1,...], axis=0) axis值为0 向下拼接 值为1向左拼接 关键字key支持联合拼接
2.1.2 apply函数 & map函数遍历映射处理dataframe对象中单个元素
2.1.3 drop函数按照指定条件删除内容
- 根据某列某值删除行需依靠索引
- 多条件删除,每个条件需使用()分开
- &:交集;|:并集;-:非
2.1.4 根据字段聚合 groupby(by=“”).聚合函数() 聚合函数:sum()|count()|min()|max()…
2.1.5 对某字段按值进行统计次数和频次 df[‘’].value_counts()
2.2 wordcloud模块
2.2.1 利用wordcloud模块WordCloud类依据词频绘制词云图
- 使用WordCloud首先需要实例化对象 含中文字符需手动设置字体参数
- 展示需要调用plt.imshow()
2.3 matplotlib模块
2.3.1 figure对象(画布)
2.3.1.1 设置fig大小
- fig.set_figwidth()
- fig.set_figheight()
2.3.1.2 获取当前fig对象 plt.gcf()
2.3.2 axes对象(图表)
2.3.2.1 影藏边框线
- ax.spines['position'].set_visible('none')
2.3.2.2 统一画布绘制多个图表
- ax=fig.add_subplots(rows, cols, index) # 画布添加图表
- ax = plt.plot() | df.plot() # 绘制图表
2.3.2.3 y轴刻度标签与y轴标签移动至右边
- ax.yaxis.tick_right() # y轴刻度
- ax.yaxis.set_label_posion('right') # y轴标签
2.3.3 plt对象
2.3.3.1 绘制同一坐标系绘制多个图解决x坐标轴刻度统一
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = 'SimHei'
# 读取数据
data1 = pd.read_excel("meal_order_detail.xlsx", sheet_name="meal_order_detail1")
data2 = pd.read_excel("meal_order_detail.xlsx", sheet_name="meal_order_detail2")
data3 = pd.read_excel("meal_order_detail.xlsx", sheet_name="meal_order_detail3")
# 合并数据
data = pd.concat([data1, data2, data3], axis=0)
# 数据预处理
# 1.删除NA
data.dropna(axis=1, how="all",inplace=True)
#2. 处理菜名末尾转译字符\r\n
data['dishes_name'] = data['dishes_name'].apply(lambda x: x.replace("\r\n",''))
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10037 entries, 0 to 3610
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 detail_id 10037 non-null int64
1 order_id 10037 non-null int64
2 dishes_id 10037 non-null int64
3 dishes_name 10037 non-null object
4 itemis_add 10037 non-null int64
5 counts 10037 non-null int64
6 amounts 10037 non-null int64
7 place_order_time 10037 non-null datetime64[ns]
8 add_inprice 10037 non-null int64
9 picture_file 10037 non-null object
10 emp_id 10037 non-null int64
dtypes: datetime64[ns](1), int64(8), object(2)
memory usage: 941.0+ KB
# 统计卖出菜品的平均价格
mean_price1 = round(data['amounts'].mean(),2) # pandas 自带函数
mean_price2 = round(np.mean(data['amounts'])) # numpy函数处理,对数据量大处理也很快
# 根据菜品出现的频次绘制词云图
from wordcloud import WordCloud # 导入绘制词云模块
hot_dishes = data['dishes_name'].value_counts()[::1]
hot_dishes_dict = hot_dishes.to_dict()
wc = WordCloud(font_path="C:/Windows/Fonts/simfang.ttf", background_color='white')
dishes_wc = wc.generate_from_frequencies(hot_dishes_dict)
plt.imshow(dishes_wc,interpolation='bilinear')
plt.axis('off')
plt.show()
# 最畅销的前10个菜
# 删除白米饭
data.drop(data[(data.dishes_name=='白饭/大碗') | (data.dishes_name=='白饭/小碗')].index, inplace=True)
hot_dishes = data['dishes_name'].value_counts()[:10]
# 绘制柱状图
hot_dishes.plot(kind="bar", fontsize=12)
# 绘制折线图
hot_dishes.plot(kind='line', color='red')
# gca:get current axis得到当前轴
ax = plt.gca()
# 隐藏上框线和右框线
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# 添加数字标签
for x, y in enumerate(hot_dishes):
plt.text(x, y+2, y, horizontalalignment="center")
# x标签旋转显示
plt.xticks(rotation=90)
# 添加标题
plt.title("TOP-10 菜品", fontsize=16)
plt.show()
# 订单点菜的种类最多
detail_group = data['order_id'].value_counts()[:10]
detail_group.plot(kind='bar',color=['b','g','y'], label='点餐数')
plt.title("TOP-10 订单点餐数(AVG={})".format(int(np.mean(detail_group))),fontsize=16)
plt.xlabel("订单ID", fontsize=12)
plt.ylabel('点餐数量', fontsize=12)
plt.xticks(rotation=0)
# [str(i) for i in detail_group.index.tolist()] 实现与柱状图x轴统一
plt.plot([str(i) for i in detail_group.index.tolist()],[int(detail_group.mean())]*len(detail_group),
color='r',linestyle='solid', linewidth=3, label='点餐平均值')
# 无法显示在上面的画布中???
# plt.plot(detail_group.index.tolist(),[int(detail_group.mean())]*len(detail_group))
plt.legend()
plt.show()
# 订单ID点菜数量Top10(分组order_id,counts求和,排序,前十)
data['total_amounts'] = data['counts'] * data['amounts'] # 统计单道菜金额
dataGroup = data[['order_id', 'counts', 'total_amounts']].groupby(by='order_id').sum() # 分组求和
dataGroup['avg_price'] = dataGroup['total_amounts'] / dataGroup['counts']
sort_counts = dataGroup.sort_values(by='counts',ascending=False) # 根据点菜数量由高到低排序
fig = plt.gcf()# 获取当前画布get current figure
ax1 = fig.add_subplot(1,2,1)
# ax1.bar(sort_counts.index[:10], sort_counts['counts'][:10]) # 无法正常可视化,因为订单索引是数值会自动排序
ax1.bar([str(i)for i in sort_counts[:10].index.tolist()], sort_counts['counts'][:10])
plt.xticks(rotation=90)
plt.xlabel('订单ID')
plt.ylabel("点菜数量")
plt.title('TOP-10点菜数量')
# TODO
# plt.xticks([str(i)for i in sort_counts[:10].index.tolist()], rotation=90)
ax2 = fig.add_subplot(1,2,2)
sort_ttl_amounts = dataGroup.sort_values(by='total_amounts',ascending=False) # 根据点菜数量由高到低排序
ax2.bar([str(i) for i in sort_ttl_amounts.index[:10]], sort_ttl_amounts['total_amounts'][:10], color='g')
plt.xticks(rotation=90)
plt.title('TOP-10消费金额')
plt.xlabel('订单ID')
#将y轴刻度移至右边
ax2.yaxis.tick_right()
plt.ylabel("消费金额")
# 将y轴标签移至右边
ax2.yaxis.set_label_position('right')
plt.show()
# 初始化画布
fig = plt.gcf()
fig.set_figwidth(50)
fig.set_figheight(10)
# 一天中什么时间段,点菜量比较集中
data['hourcount'] = 1 # 新列 用作计数器
data['time'] = pd.to_datetime(data['place_order_time']) # 将时间转换为日期类型存储
data['hour'] = data['time'].map(lambda a: a.hour)
gp_by_hour = data.groupby(by='hour').count()['hourcount']
hour_ax = fig.add_subplot(1,3,1)
hour_ax = gp_by_hour.plot(kind='bar', fontsize=14)
plt.title("下单数与时间段关系图")
# plt.xticks(gp_by_hour.index, [f"{i}:00" for i in gp_by_hour.index.tolist()])
plt.xlabel('时间段', fontsize=14)
plt.ylabel('下单量', fontsize=14)
# 按天的点餐量
ax2 = fig.add_subplot(1, 3, 2)
data['daycount'] = 1 # 新列 用作计数器
data['day'] = data['time'].map(lambda a: a.day)
gp_by_day = data.groupby(by='day').count()['daycount']
ax2 = gp_by_day.plot(kind='bar')
plt.title("下单数与日期关系图")
plt.xlabel('日期', fontsize=14)
# 按星期的点餐量
ax3 = fig.add_subplot(1, 3, 3)
weekdayNameList = ['周一','周二','周三','周四','周五','周六','周天']
data['weekdaycount'] = 1
data['weekday'] = data['time'].map(lambda a: a.weekday())# 0代表星期一
gp_by_weekday = data.groupby(by='weekday').count()['weekdaycount']
ax3 = gp_by_weekday.plot(kind='bar')
plt.xticks(gp_by_weekday.index, weekdayNameList)
plt.title("下单数与星期关系图")
plt.xticks(rotation=0)
plt.xlabel('星期', fontsize=14)
plt.savefig("下单量与时间关系图.png")
plt.show()
----------------------------------END------------------------------------
需要餐厅订单数据的伙伴,可以在下方留言~