I. How the remaining-ticket query works
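A user normally checks remaining tickets in a web browser by opening the official 12306 site, entering the departure station, destination station and travel date, and clicking Query. There are two ways to do the same thing from Python: drive a real browser through the Selenium 2 automation framework, or send the HTTP requests directly with packages such as requests or urllib. This article implements the second approach.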
II. Implementing the remaining-ticket query
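When the browser queries remaining tickets, it requests a URL of the following form:
https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2017-09-30&leftTicketDTO.from_station=BJP&leftTicketDTO.to_station=SHH&purpose_codes=ADULT
From this URL we can see that the travel date (train_date) is 2017-09-30, the departure station (from_station) is Beijing (BJP), and the destination station (to_station) is Shanghai (SHH). A Python program therefore only needs to collect these key values, assemble them into the URL, and then fetch the remaining-ticket data with a package such as requests (third-party) or urllib (standard library). The last step is to parse the returned data and print it.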
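To make the parameter breakdown concrete, the sample URL can be taken apart with the standard library's urllib.parse. This is only an illustrative sketch; the variable names sample_url and params are not from the original code.

```python
from urllib.parse import urlsplit, parse_qs

# The sample URL from the text, split over several lines for readability
sample_url = (
    "https://kyfw.12306.cn/otn/leftTicket/query"
    "?leftTicketDTO.train_date=2017-09-30"
    "&leftTicketDTO.from_station=BJP"
    "&leftTicketDTO.to_station=SHH"
    "&purpose_codes=ADULT"
)

# parse_qs maps each parameter name to a list of its values
params = parse_qs(urlsplit(sample_url).query)

for name, values in params.items():
    print(name, "=", values[0])
```

Each of the four query parameters appears once, so every list holds a single value.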
1. Building the query URL
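# Base URL of the remaining-ticket query
query_url = "https://kyfw.12306.cn/otn/leftTicket/query?"
# Full query URL; train_date, from_station and to_station are strings
# such as "2017-09-30", "BJP" and "SHH"
url = (query_url
       + "leftTicketDTO.train_date=" + train_date
       + "&leftTicketDTO.from_station=" + from_station
       + "&leftTicketDTO.to_station=" + to_station
       + "&purpose_codes=ADULT")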
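As an alternative to string concatenation, the same URL could be assembled with urllib.parse.urlencode, which also escapes any special characters in the values. The following is a minimal sketch; the function name build_query_url and the constant QUERY_URL are hypothetical and do not appear in the original code.

```python
from urllib.parse import urlencode

# Base URL from the article, without the trailing "?"
QUERY_URL = "https://kyfw.12306.cn/otn/leftTicket/query"


def build_query_url(train_date, from_station, to_station, purpose_codes="ADULT"):
    """Return the full query URL for the given date and station codes."""
    # Dicts keep insertion order, so the parameters are emitted in this order
    params = {
        "leftTicketDTO.train_date": train_date,
        "leftTicketDTO.from_station": from_station,
        "leftTicketDTO.to_station": to_station,
        "purpose_codes": purpose_codes,
    }
    return QUERY_URL + "?" + urlencode(params)


print(build_query_url("2017-09-30", "BJP", "SHH"))
```

For the sample values this produces the same URL as the one shown at the start of this section.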
2. Sending the request and reading the response
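import requests
from requests.adapters import HTTPAdapter

# Too many unclosed HTTP connections can cause "Max retries exceeded with url ...",
# so mount an adapter that retries failed connections up to 5 times.
s = requests.Session()
s.mount("https://", HTTPAdapter(max_retries=5))
# requests is built on urllib3, whose HTTP connections are keep-alive by default;
# ask the server to close the connection after each response.
s.headers["Connection"] = "close"
# Send the query through the session and fetch the remaining-ticket data.
# verify=False skips TLS certificate verification, as in the original code.
r = s.get(url, allow_redirects=True, verify=False, timeout=10)
if r.status_code == 200:
    # station_dict = r.json()['data']['map']
    traindatas = r.json()['data']['result']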
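The lookup r.json()['data']['result'] raises an exception if the service returns a payload without those keys. One possible way to guard against that is sketched below. The payload shape (a data object holding a result list) is taken from the code above; the function name extract_results and the sample payload are made up for illustration.

```python
def extract_results(payload):
    """Return the list of raw train records, or [] if the payload lacks them."""
    if not isinstance(payload, dict):
        return []
    data = payload.get("data")
    if not isinstance(data, dict):
        return []
    result = data.get("result")
    return result if isinstance(result, list) else []


# Made-up payload with the same nesting as the real response
sample_payload = {"data": {"result": ["a|b|c"], "map": {}}}

print(extract_results(sample_payload))
print(extract_results({"data": None}))
```

In the request code this could be used as traindatas = extract_results(r.json()), so that an unexpected response yields an empty list instead of a KeyError or TypeError.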
3. Parsing the response into the trainInfo data structure
import re

for data in traindatas:
    trainInfo = {}
    # Each record is a '|'-separated string; capture every field that follows a '|'
    trainRowItem = re.compile(r'\|([^|]*)').findall(data)
    trainInfo['train_no'] = trainRowItem[2]
    # Look up the station names from the station codes in the record
    trainInfo['from_station_name'] = stationDictChineseMapAbbr[trainRowItem[3]]
    trainInfo['to_station_name'] = stationDictChineseMapAbbr[trainRowItem[4]]
    # trainInfo['from_station_name'] = trainRowItem[3]
    # trainInfo['to_station_name'] = trainRowItem[4]
    trainInfo['start_time'] = trainRowItem[7]
    trainInfo['arrive_time'] = trainRowItem[8]
    trainInfo['duration'] = trainRowItem[9]
    trainInfo['swz_num'] = trainRowItem[31]
    ...
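The parsing step can be tried without a network connection by running it against a synthetic record. The sketch below reuses the regular expression and the field positions shown above. The function name parse_record, the station map, and every value in the sample record are invented for illustration. The fields the article elides with "..." are left out here as well.

```python
import re

# Same pattern as in the article: capture every field that follows a '|'
FIELD_RE = re.compile(r"\|([^|]*)")

# Field positions taken from the article's code
FIELD_INDEX = {
    "train_no": 2,
    "start_time": 7,
    "arrive_time": 8,
    "duration": 9,
    "swz_num": 31,
}


def parse_record(record, station_names):
    """Turn one '|'-separated record into a dict of named fields."""
    items = FIELD_RE.findall(record)
    info = {key: items[idx] for key, idx in FIELD_INDEX.items()}
    # Fall back to the raw code when the station is not in the map
    info["from_station_name"] = station_names.get(items[3], items[3])
    info["to_station_name"] = station_names.get(items[4], items[4])
    return info


# Synthetic record: cells[0] precedes the first '|' and is skipped by the
# regex, so cells[k + 1] ends up at index k of the parsed list.
cells = [""] * 33
cells[3], cells[4], cells[5] = "G1", "BJP", "SHH"
cells[8], cells[9], cells[10] = "09:00", "13:28", "04:28"
cells[32] = "5"
sample_record = "|".join(cells)

# Hypothetical code-to-name map standing in for stationDictChineseMapAbbr
station_names = {"BJP": "Beijing", "SHH": "Shanghai"}

info = parse_record(sample_record, station_names)
print(info)
```

Keeping the positions in one dictionary makes it easier to adjust them in a single place if the record layout changes.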
4. Printing the trainInfo data to the terminal
from prettytable import PrettyTable

# Column headers for the output table
header = ['No.', 'Train', 'From', 'To', 'Departure', 'Arrival', 'Duration',
          'Business class', 'First class', 'Second class', 'Deluxe soft sleeper',
          'Soft sleeper', 'EMU sleeper', 'Hard sleeper', 'Hard seat', 'Standing']
pt = PrettyTable()
pt.field_names = header
for i, trainInfo in enumerate(trainInfoList):
    pt.add_row([i, trainInfo['train_no'], trainInfo['from_station_name'], trainInfo['to_station_name'], trainInfo['start_time'],
                trainInfo['arrive_time'], trainInfo['duration'], trainInfo['swz_num'],
                trainInfo['zy_num'], trainInfo['ze_num'], trainInfo['gjrw_num'], trainInfo['rw_num'], trainInfo['dw_num'],
                trainInfo['yw_num'], trainInfo['yz_num'], trainInfo['wz_num']])
# Print to the terminal
print(pt)
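If PrettyTable is not installed, a similar aligned table can be produced with the standard library alone. The following is a minimal sketch and not part of the original code; the function name format_table and the sample row are hypothetical.

```python
def format_table(header, rows):
    """Return header and rows as left-aligned, space-padded text columns."""
    table = [[str(cell) for cell in header]]
    table += [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell
    widths = [max(len(line[i]) for line in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in table
    ]
    return "\n".join(lines)


# Made-up sample row, using only the first two columns
print(format_table(["No.", "Train"], [[0, "G1"]]))
```

This version pads by character count, so columns containing Chinese text may not line up exactly in a terminal; PrettyTable handles wide characters better.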