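Weibo crawler: obtaining a Weibo cookie without an account

Date: 2024-04-09 20:38:12

This walkthrough focuses on the principle, using Postman to issue the requests, rather than on a concrete crawler implementation. There are three main steps: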

 

Step 1 (obtain the tid):

URL: https://passport.weibo.com/visitor/genvisitor
Method: POST
Parameters:
cb: gen_callback (fixed)
fp: {"os":"1","browser":"Chrome70,0,3538,25","fonts":"undefined","screenInfo":"1920*1080*24","plugins":"Portable Document Format::internal-pdf-viewer::Chromium PDF Plugin|::mhjfbmdgcfjbbpaeojofohoefgiehjai::Chromium PDF Viewer|::gbkeegbaiigmenfmjfclcdgdpimamgkj::Google文档、表格及幻灯片的Office编辑扩展程序|::internal-nacl-plugin::Native Client"} (use the values your real browser reports)
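The request above can be sketched in Python. This is a minimal sketch, assuming the endpoint accepts a form-encoded POST body (the original does not say how Postman sent the parameters). The helper names build_genvisitor_body and fetch_genvisitor are hypothetical, and the fingerprint shown is a shortened placeholder rather than a real browser value.

```python
import json
import urllib.parse
import urllib.request

GENVISITOR_URL = "https://passport.weibo.com/visitor/genvisitor"


def build_genvisitor_body(fingerprint: dict) -> bytes:
    """Encode cb and fp as a form body (assumed encoding; hypothetical helper)."""
    fp = json.dumps(fingerprint, separators=(",", ":"), ensure_ascii=False)
    form = urllib.parse.urlencode({"cb": "gen_callback", "fp": fp})
    return form.encode("ascii")


def fetch_genvisitor(fingerprint: dict, timeout: float = 10.0) -> str:
    """POST the form and return the raw JSONP text.

    Not called in this sketch because it needs network access.
    """
    request = urllib.request.Request(
        GENVISITOR_URL, data=build_genvisitor_body(fingerprint), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Shortened placeholder fingerprint; substitute what your browser really reports.
example_fp = {"os": "1", "browser": "Chrome70,0,3538,25", "screenInfo": "1920*1080*24"}
body = build_genvisitor_body(example_fp)
```

Keeping the body construction separate from the network call makes the encoding easy to inspect before anything is sent.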

Response:

window.gen_callback && gen_callback({"retcode":20000000,"msg":"succ","data":{"tid":"t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=","new_tid":true}});
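The response is JSONP, not plain JSON, so the JSON object has to be taken out of the gen_callback(...) wrapper before tid and new_tid can be read. A minimal sketch, where parse_jsonp is a hypothetical helper name and the sample string is the response shown above:

```python
import json
import re


def parse_jsonp(text: str, callback: str) -> dict:
    """Return the JSON object passed to callback(...) in a JSONP response."""
    pattern = re.escape(callback) + r"\((\{.*\})\)"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError(f"no {callback}(...) call found in response")
    return json.loads(match.group(1))


sample = (
    'window.gen_callback && gen_callback({"retcode":20000000,"msg":"succ",'
    '"data":{"tid":"t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=","new_tid":true}});'
)
payload = parse_jsonp(sample, "gen_callback")
tid = payload["data"]["tid"]
new_tid = payload["data"]["new_tid"]
```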


 

Step 2 (obtain sub and subp):

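URL: https://passport.weibo.com/visitor/visitor
Method: GET
Parameters:
a: incarnate (fixed)
t: UhIQHACePHlmNiYcsClsQk4FcWAJx8dnTtn7lSkeql8 (the tid obtained in step 1)
w: 3 (3 if new_tid from step 1 is true, otherwise 2)
c: 100 (if the data object from step 1 contains this value, use it; otherwise default to 100)
cb: cross_domain (fixed)
from: weibo (fixed)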
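The parameter rules above can be expressed as a small function. This is a sketch only: build_incarnate_url is a hypothetical name, and because the text does not name the field that carries the c value in the step 1 response, the caller passes it in explicitly (or leaves it as None to get the default of 100).

```python
import urllib.parse
from typing import Optional

VISITOR_URL = "https://passport.weibo.com/visitor/visitor"


def build_incarnate_url(tid: str, new_tid: bool, c: Optional[int] = None) -> str:
    """Build the step 2 GET URL from the step 1 results."""
    params = {
        "a": "incarnate",              # fixed
        "t": tid,                      # tid from step 1
        "w": 3 if new_tid else 2,      # 3 when new_tid is true, otherwise 2
        "c": 100 if c is None else c,  # default 100 when step 1 supplied no value
        "cb": "cross_domain",          # fixed
        "from": "weibo",               # fixed
    }
    # urlencode percent-escapes characters such as "+" and "=" inside the tid.
    return VISITOR_URL + "?" + urllib.parse.urlencode(params)


url = build_incarnate_url("t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=", True)
```

Letting urlencode handle the escaping matters here, since a tid can contain "+" and "=", which would otherwise be misread in a query string.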

Response:

window.cross_domain && cross_domain({"retcode":20000000,"msg":"succ","data":{"sub":"_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs","subp":"0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m"}});
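Reading sub and subp out of this response might look like the following. It is a sketch under two assumptions: that retcode 20000000 (the value in both sample responses) is the only success code, and that any other value should be treated as a failure. extract_visitor_tokens is a hypothetical helper name.

```python
import json

# Value seen in the sample responses; treating every other value as a
# failure is an assumption of this sketch.
SUCCESS_RETCODE = 20000000


def extract_visitor_tokens(text: str) -> tuple[str, str]:
    """Return (sub, subp) from the cross_domain(...) JSONP response."""
    marker = "cross_domain("
    start = text.index(marker) + len(marker)  # ValueError if the wrapper is missing
    end = text.rindex(")")
    payload = json.loads(text[start:end])
    if payload.get("retcode") != SUCCESS_RETCODE:
        raise RuntimeError(f"visitor request failed: {payload.get('msg')!r}")
    data = payload["data"]
    return data["sub"], data["subp"]


sample = (
    'window.cross_domain && cross_domain({"retcode":20000000,"msg":"succ","data":'
    '{"sub":"_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs",'
    '"subp":"0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m"}});'
)
sub, subp = extract_visitor_tokens(sample)
```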


 

Step 3 (combine sub and subp into a Cookie and use it to fetch data):

URL: https://d.weibo.com/1087030002_2975_2017_0
Method: GET
Header parameters:
Cookie: SUB=_2AkMr-Uitf8NxqwJRmP4Vym7lZIt2wwDEieKdpbl2JRMxHRl-yT83qhAytRB6AHlmQiE0cGNJVvYskBmcaMuDeBtcMDoK; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WW7Ds97Ql.cFbVqMIoBZMpe
(SUB and SUBP are the sub and subp values returned by the step 2 request)
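Assembling the Cookie header and attaching it to the GET request could be sketched like this. build_cookie_header and build_data_request are hypothetical helper names, the sub and subp strings are the sample values from the step 2 response, and the request is only constructed, not sent. Whether the site requires further headers is not covered by the original.

```python
import urllib.request

DATA_URL = "https://d.weibo.com/1087030002_2975_2017_0"


def build_cookie_header(sub: str, subp: str) -> str:
    """Join the two visitor tokens into a single Cookie header value."""
    return f"SUB={sub}; SUBP={subp}"


def build_data_request(sub: str, subp: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request carrying the visitor cookie."""
    return urllib.request.Request(
        DATA_URL,
        headers={"Cookie": build_cookie_header(sub, subp)},
        method="GET",
    )


request = build_data_request(
    "_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs",
    "0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m",
)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```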
