First, download Fiddler.
Baidu Netdisk
Extraction code: mrin
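Once installation finishes, open Fiddler and configure it.
From the main menu, go to Tools -> Options.
Start on the HTTPS tab, where HTTPS decryption is enabled.
Then go to the Connections tab, where remote computers are allowed to connect; 8888 is the port Fiddler listens on.
Click OK, then close Fiddler so the settings take effect.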
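Reopen Fiddler.
In the upper right of the Fiddler window there is an Online indicator. Click it to see your PC's IP address, for example 192.168.1.102 in this walkthrough.
Now open a browser and go to 192.168.1.102:8888 (8888 is the port configured above), then click the link on that page to download the certificate.
Send the certificate to the phone and tap it to install it, or go to Settings -> Security & privacy -> More security settings -> Encryption & credentials -> Install from storage.
With the certificate installed, configure the proxy on the phone.
Turn on Wi-Fi on the phone and join the same LAN as the PC, i.e. the same Wi-Fi network. In the WLAN list, long-press that network and tap Modify network.
Tap Show advanced options, set Proxy to Manual, and enter the PC's IP address and the port number (8888).
Then switch back to Fiddler on the PC,
and you can start capturing traffic.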
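If nothing shows up in Fiddler, it can help to confirm that the proxy port is actually reachable over the network. The following is a minimal sketch using only Python's standard library; `is_port_open` is a hypothetical helper name, and 192.168.1.102 and 8888 are simply the example values from this walkthrough.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Example values from this walkthrough; substitute your own IP and port.
    print(is_port_open("192.168.1.102", 8888))
```

A result of False usually points to a firewall rule on the PC or to the two devices being on different networks, rather than to the phone's proxy settings.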
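Today's example captures the traffic of ONE.
Open it, and the requests containing one-api turn out to be this mini program's API endpoints.
Next, set up a host filter in Fiddler.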
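Fiddler applies this filter in its UI. If the captured URLs are exported as a plain list, the same host-based filtering could be sketched in Python as follows. `is_api_request` and the sample URL list are illustrative assumptions; only the one-api.mssnn.cn host comes from the capture.

```python
from urllib.parse import urlsplit

API_HOST = "one-api.mssnn.cn"  # host seen in the captured sessions


def is_api_request(url: str, host: str = API_HOST) -> bool:
    """Return True if the URL's hostname is exactly the given API host."""
    return urlsplit(url).hostname == host


captured = [
    "https://one-api.mssnn.cn/api/classes/Posts",
    "https://example.com/index.html",  # made-up non-API URL for contrast
]
api_urls = [u for u in captured if is_api_request(u)]
print(api_urls)
```

Comparing the parsed hostname, rather than searching the whole URL for a substring, avoids matching unrelated hosts that merely contain the same text.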
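Looking through the filtered sessions, https://one-api.mssnn.cn/api/classes/Posts
is the request that carries the data.
Its method is POST.
A POST request normally carries its parameters in the request body,
so the raw request looks like this: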
POST https://one-api.mssnn.cn/api/classes/Posts HTTP/1.1
charset: utf-8
Accept-Encoding: gzip
referer: https://servicewechat.com/wx5bbe79dd056cb238/34/page-frame.html
content-type: application/json
User-Agent: xxx
Content-Length: 227
Host: one-api.mssnn.cn
Connection: Keep-Alive
{"where":{},"limit":20,"order":"-postAt","_method":"GET","_ApplicationId":"one_mssnn_cn","_ClientVersion":"js1.11.1","_InstallationId":"d4e87ab6-23ad-965a-5d4d-48f452e1e3c6","_SessionToken":"r:6b3f80b8c1d763e40be374d23d2f8d86"}
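The request body gives us everything we need. (In the script below, "_InstallationId":"xxx" and "_SessionToken":"r:xxx" are placeholders that I substituted; capture your own values.)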
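As a sanity check before writing the scraper, the captured body can be rebuilt in Python and its size compared with the Content-Length header. This is a minimal sketch: the field values are copied from the capture above, and the compact `separators` setting is an assumption about how the client serialized its JSON.

```python
import json

# Field values copied from the captured request body.
payload = {
    "where": {},
    "limit": 20,
    "order": "-postAt",
    "_method": "GET",
    "_ApplicationId": "one_mssnn_cn",
    "_ClientVersion": "js1.11.1",
    "_InstallationId": "d4e87ab6-23ad-965a-5d4d-48f452e1e3c6",
    "_SessionToken": "r:6b3f80b8c1d763e40be374d23d2f8d86",
}

# Compact separators: no spaces after "," or ":" (assumed client behaviour).
body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
print(len(body))  # compare with the Content-Length header in the capture
```

If the printed length equals the captured Content-Length of 227, the rebuilt body is the same size as what the mini program sent, which is a good sign that no field was missed when copying it out of Fiddler.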
import json

import requests
import xlwt

# Headers taken from the captured request; replace "xxx" with your own values.
header_dict = {
    "Host": "one-api.mssnn.cn",
    "User-Agent": "xxx",
    "Accept": "xxx",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip",
    "charset": "utf-8",
    "referer": "https://servicewechat.com/wx5bbe79dd056cb238/34/page-frame.html",
    "content-type": "application/json",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
}
url = 'https://one-api.mssnn.cn/api/classes/Posts'

ip_data = "171.41.80.247"  # proxy IP (replace with a working proxy, or set proxies = None)
port_data = "9999"  # proxy port
# requests selects the proxy by URL scheme, so an https:// URL needs an "https" entry.
proxy_url = "http://" + ip_data + ":" + port_data
proxies = {"http": proxy_url, "https": proxy_url}

workbook = xlwt.Workbook(encoding='utf-8', style_compression=0)  # create a new workbook
worksheet = workbook.add_sheet('test', cell_overwrite_ok=True)

i = 0  # "skip" offset into the result set
row = 1  # row 0 is left free for a header
while i < 2200:
    inp_strr = {"where": {}, "limit": 20, "skip": i, "order": "-postAt",
                "_method": "GET", "_ApplicationId": "one_mssnn_cn",
                "_ClientVersion": "js1.11.1", "_InstallationId": "xxx",
                "_SessionToken": "r:xxx"}
    try:
        r = requests.post(url, data=json.dumps(inp_strr), headers=header_dict,
                          proxies=proxies, timeout=10)
        json_dict = json.loads(r.content.decode('utf-8'), strict=False)
        for record in json_dict['results']:
            # Write one record per row, one field per column.
            for col, value in enumerate(record.values()):
                worksheet.write(row, col, str(value))
            row += 1
    except (requests.RequestException, ValueError, KeyError) as exc:
        print(str(i) + " failed: " + str(exc))
    print("Offset reached so far: " + str(i))
    workbook.save("data.xls")  # save after every page so progress survives a crash
    i += 20
In the end, the script scraped a little over 1,000 records.
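The loop above uses a fixed upper bound of 2200 for the offset. One alternative is to keep requesting pages until the server returns an empty results list. The sketch below shows that pattern under stated assumptions: `iter_records` and `fetch_page` are hypothetical names, and the stub data source stands in for the real `requests.post` call so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List


def iter_records(
    fetch_page: Callable[[int, int], List[Dict]],
    limit: int = 20,
) -> Iterator[Dict]:
    """Yield records page by page until a page comes back empty."""
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        if not page:
            return
        yield from page
        skip += limit


# Stub data source with 45 fake records, standing in for the real API call.
FAKE_DB = [{"id": n} for n in range(45)]


def fake_fetch(skip: int, limit: int) -> List[Dict]:
    """Return one page of the fake data set."""
    return FAKE_DB[skip:skip + limit]


records = list(iter_records(fake_fetch))
print(len(records))
```

In the real script, `fetch_page` would post the payload with "skip" and "limit" set from its arguments and return `json_dict["results"]`.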