requests: compared with the urllib module used earlier, the requests module offers a more convenient API (under the hood it is a wrapper around urllib3).
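# GET requests
GET is the default HTTP request method.
* No request body
* Data travels in the URL, so keep it small (about 1 KB is a conservative rule of thumb; HTTP itself sets no fixed limit, but browsers and servers cap the URL length)
* Data sent with GET is exposed in the browser address bar
Common ways a GET request is issued:
1. Entering a URL directly in the browser address bar always sends a GET request
2. Clicking a hyperlink on a page also sends a GET request
3. Submitting a form uses GET by default, but the form can be set to POST
# POST requests
(1). Data does not appear in the address bar
(2). The protocol puts no upper limit on the data size (a server may still impose one)
(3). There is a request body
(4). Non-ASCII characters such as Chinese in a form-encoded request body are URL-encoded
# Note: requests.post() is used in the same way as requests.get(); what sets requests.post() apart is its data parameter, which holds the request body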
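The difference between the two methods can be inspected without sending anything. The sketch below builds a GET and a POST with requests.Request(...).prepare() and prints where the data ends up; the httpbin.org URLs and the sample value are only placeholders, and nothing goes over the network.

```python
import requests

# Build the requests without sending them; nothing goes over the network.
get_req = requests.Request('GET', 'http://httpbin.org/get', params={'name': '熊'}).prepare()
post_req = requests.Request('POST', 'http://httpbin.org/post', data={'name': '熊'}).prepare()

print(get_req.url)    # the data is part of the URL, URL-encoded
print(get_req.body)   # None: this GET has no request body
print(post_req.url)   # the URL carries no data
print(post_req.body)  # the form data, URL-encoded, sits in the body
print(post_req.headers['Content-Type'])
```

The Chinese character is percent-encoded in both cases; only its location (URL versus body) differs.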
Basic GET request
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
Result:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "223.71.166.246",
  "url": "http://httpbin.org/get"
}
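GET request with parameters
# Requests usually need to carry request headers; they are the key to making the client look like a browser. Commonly useful request headers:
Host
Referer     # large sites often use it to determine where a request came from (which page linked to the current one)
User-Agent  # identifies the client (browser and engine); set it to imitate a browser
Cookie      # cookies travel in the request headers, but requests has a separate cookies parameter for them, so do not put them in headers={}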
Method 1:
import requests

response = requests.get('http://httpbin.org/get?name=xiong&age=25')
print(response.text)
Result:
{
  "args": {
    "age": "25",
    "name": "xiong"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "223.71.166.246",
  "url": "http://httpbin.org/get?name=xiong&age=25"
}
Method 2:
import requests

data = {
    'name': 'xiong',
    'age': 25
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Result:
{
  "args": {
    "age": "25",
    "name": "xiong"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "223.71.166.246",
  "url": "http://httpbin.org/get?name=xiong&age=25"
}
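To check that both methods produce the same request, the URLs can be compared offline. This is a small sketch using requests.Request(...).prepare(); the third request, with a made-up key q, is an assumption added to show that params= also escapes characters that would otherwise need encoding by hand.

```python
import requests

base = 'http://httpbin.org/get'

# Method 1: query string written by hand
by_hand = requests.Request('GET', base + '?name=xiong&age=25').prepare()
# Method 2: query string built from a dict
by_params = requests.Request('GET', base, params={'name': 'xiong', 'age': 25}).prepare()

print(by_hand.url)
print(by_params.url)
print(by_hand.url == by_params.url)

# params= takes care of escaping spaces and '&' inside a value
escaped = requests.Request('GET', base, params={'q': 'a b&c'}).prepare()
print(escaped.url)
```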
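Summary of the attributes and methods of the response object returned by response = requests.get()
response = requests.get(url="http://www.baidu.com", params=None)  # signature: get(url, params=None, **kwargs)
response.text                # response body decoded to text (for a web page, its HTML)
response.content             # response body as raw bytes, e.g. for an image such as http://ww1.sinaimg.cn/large/007nuqGAly1g1yst34oyaj30ia0qedh5.jpg
response.encoding            # encoding used to decode response.text; it can be read or assigned
response.apparent_encoding   # encoding guessed from the content of the body
response.status_code         # status code of the response
response.headers             # response headers
response.cookies             # cookies, returned as an object
response.cookies.get_dict()  # cookies as a plain dict
response.url                 # URL of the response
response.history             # list of redirect responses that led here
response.close()             # close the response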
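The relationship between content, encoding and text is easiest to see on a small body. The sketch below does not contact a server: it builds a requests.Response by hand and feeds it bytes through the raw attribute, which is an assumption made only so the example runs offline.

```python
import io
import requests

# Hand-built stand-in for a server response, so the example runs offline.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO('你好'.encode('utf-8'))

print(response.content)        # raw bytes of the body

response.encoding = 'utf-8'    # tell requests how to decode the bytes
print(response.text)           # decoded text

response.encoding = 'latin-1'  # a wrong encoding yields garbled text
print(response.text)
```

With a real page whose declared encoding is missing or wrong, a common approach is to assign response.apparent_encoding to response.encoding before reading response.text.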
Parsing JSON
# Parse JSON
import json

import requests

response = requests.get('http://httpbin.org/get')
res1 = json.loads(response.text)  # works, but verbose
res2 = response.json()            # parses the JSON body directly
print(res1 == res2)               # True
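When the body is not valid JSON, response.json() raises an error that derives from ValueError. The sketch below shows both outcomes offline; fake_response is a hypothetical helper, not part of requests, that builds a stand-in response from bytes.

```python
import io
import json
import requests

def fake_response(body: bytes) -> requests.Response:
    """Hypothetical helper: offline stand-in for a server response."""
    r = requests.Response()
    r.status_code = 200
    r.encoding = 'utf-8'
    r.raw = io.BytesIO(body)
    return r

ok = fake_response(b'{"args": {"name": "xiong"}}')
print(ok.json() == json.loads(ok.text))  # both give the same dict
print(ok.json()['args']['name'])

bad = fake_response(b'<html>not json</html>')
try:
    bad.json()
except ValueError as exc:  # JSON decode errors derive from ValueError
    print('not JSON:', type(exc).__name__)
```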
Downloading a small image
import requests

response = requests.get('https://github.com/favicon.ico')
with open('img.ico', 'wb') as f:
    f.write(response.content)
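Downloading a large video file
# stream parameter: fetch the body piece by piece. If a video is 100 GB, reading it all with response.content and writing it to a file in one go is not reasonable
import requests

response = requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
                        stream=True)
with open('b.mp4', 'wb') as f:
    # iter_content() defaults to 1-byte chunks, so pass a larger chunk_size
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)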
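The chunked write can be wrapped in a small function and tried without a real download. In this sketch save_stream is a hypothetical helper, and the response is a hand-built stand-in for requests.get(url, stream=True) whose body is ten bytes, so the chunking is visible.

```python
import io
import os
import tempfile
import requests

def save_stream(response, path, chunk_size=8192):
    """Write the response body to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written

# Offline stand-in for requests.get(url, stream=True)
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b'0123456789')

target = os.path.join(tempfile.mkdtemp(), 'b.bin')
size = save_stream(fake, target, chunk_size=4)  # read as chunks of 4, 4 and 2 bytes
print(size)
```

Only one chunk is held in memory at a time, which is the point of stream=True for very large files.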
Adding headers
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'
}
response = requests.get('https://www.baidu.com/', headers=headers)
print(response.status_code)
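To see what the headers argument actually changes, the outgoing headers can be inspected before anything is sent. The sketch below uses Session.prepare_request() for that; the shortened User-Agent string is a made-up placeholder.

```python
import requests

session = requests.Session()
url = 'http://httpbin.org/get'

# Without headers=, requests identifies itself as python-requests/<version>
default = session.prepare_request(requests.Request('GET', url))
# With headers=, the given value replaces the default
custom = session.prepare_request(
    requests.Request('GET', url, headers={'User-Agent': 'Mozilla/5.0 (example)'})
)

print(default.headers['User-Agent'])
print(custom.headers['User-Agent'])
print(custom.headers['user-agent'])  # header lookup is case-insensitive
```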
Basic POST request
import requests

data = {'name': 'xiong'}
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
Result:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "xiong"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "10",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "223.71.166.246",
  "url": "http://httpbin.org/post"
}
POST request with headers
import requests

data = {'name': 'xiong'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'
}
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
print(response.json())
Result:
{'args': {}, 'data': '', 'files': {}, 'form': {'name': 'xiong'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '10', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'}, 'json': None, 'origin': '223.71.166.246', 'url': 'http://httpbin.org/post'}
Response attributes
import requests

response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
Result:
<class 'int'> 403
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Sat, 21 Apr 2018 02:16:27 GMT', 'Server': 'Tengine', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'X-Via': '1.1 PSbjwjBGP2oc238:9 (Cdn Cache Server V2.0), 1.1 PSgxnnwt6jp78:4 (Cdn Cache Server V2.0), 1.1 PSbjhkwlwa80:0 (Cdn Cache Server V2.0)', 'Connection': 'close'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [301]>]
File upload
import requests

# Open the file in a with block so it is closed after the upload
with open('img.ico', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
Result:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,..."
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "6661",
    "Content-Type": "multipart/form-data; boundary=4ba9cec7ffee4873b4a00164473f792f",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "223.71.166.246",
  "url": "http://httpbin.org/post"
}
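The multipart/form-data Content-Type in the result above is produced by the files argument. As a rough offline illustration, the sketch below prepares an upload without sending it; the four bytes standing in for the file content are made up, and the (filename, content) tuple form is used so that no file on disk is needed.

```python
import requests

# (filename, content) tuple; the bytes are a made-up stand-in for a real file
files = {'file': ('img.ico', b'\x00\x00\x01\x00')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])     # multipart/form-data with a generated boundary
print(prepared.headers['Content-Length'])   # length of the encoded body
print(b'filename="img.ico"' in prepared.body)
```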
Getting cookies
import requests

response = requests.get('https://www.baidu.com')
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
Result:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
Session persistence
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456')  # the server sets a cookie
response = s.get('http://httpbin.org/cookies')         # the same session sends it back
print(response.text)
Result:
{
  "cookies": {
    "number": "123456"
  }
}
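What the session adds is visible in the headers of the second request. The sketch below simulates it offline: the cookie is placed in the session's jar by hand (standing in for what the first response would have stored), and the request is prepared but never sent.

```python
import requests

s = requests.Session()
s.cookies.set('number', '123456')  # stands in for the cookie set by the first response

# A request prepared through the session carries the stored cookie
with_session = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(with_session.headers.get('Cookie'))

# A request prepared on its own has no cookie to send
standalone = requests.Request('GET', 'http://httpbin.org/cookies').prepare()
print(standalone.headers.get('Cookie'))

print(s.cookies.get_dict())
```

This is why two separate requests.get() calls would not reproduce the result above: each standalone call starts with an empty cookie jar.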
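Certificate verification
1. If the site's certificate cannot be verified, the request raises an error (requests.exceptions.SSLError)
import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)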
2. Disable certificate verification: the request returns 200, but a warning is still printed
import requests

response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
3. Suppress the warning
import requests
import urllib3

urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
4. Use a local client certificate
import requests

# cert takes a (certificate file, private key file) pair
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)
Proxy settings
import requests

proxies = {
    'http': 'http://127.0.0.1:9743',
    'https': 'https://127.0.0.1:9743'
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
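Proxy with a username and password
import requests

# Proxies are chosen by the scheme of the target URL, so an https:// target needs an 'https' entry
proxies = {
    'http': 'http://user:password@127.0.0.1:9743',
    'https': 'https://user:password@127.0.0.1:9743',
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)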
To use a SOCKS proxy, an extra module has to be installed
pip install "requests[socks]"
Then use the proxy
import requests

proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742',
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
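Which entry of the proxies dict applies depends on the scheme of the target URL. A quick offline way to check this is requests.utils.select_proxy, which returns the proxy chosen for a given URL; the addresses and credentials below are the same placeholders used in this section.

```python
from requests.utils import select_proxy

# Only an 'http' entry: https:// targets find no matching proxy in this dict
proxies = {'http': 'http://user:password@127.0.0.1:9743'}
print(select_proxy('http://httpbin.org/get', proxies))
print(select_proxy('https://www.taobao.com', proxies))  # None

# Adding an 'https' entry covers https:// targets as well
proxies['https'] = 'https://user:password@127.0.0.1:9743'
print(select_proxy('https://www.taobao.com', proxies))
```

Note that this only inspects the dict passed in; proxies configured through environment variables are a separate mechanism.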
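Timeout settings
import requests

# timeout is in seconds; 0.01 is deliberately too short, so this request raises a timeout exception
response = requests.get('https://www.taobao.com', timeout=0.01)
print(response.status_code)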
Error handling
import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://www.taobao.com', timeout=0.01)
    print(response.status_code)
except Timeout:  # parent class of both ConnectTimeout and ReadTimeout
    print('time out')
Authentication settings
Method 1:
import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://127.0.0.1:8080', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
Method 2:
import requests

# A (username, password) tuple is shorthand for HTTPBasicAuth
r = requests.get('http://127.0.0.1:8080', auth=('user', '123'))
print(r.status_code)
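Both methods should put the same Authorization header on the request. The sketch below checks this offline by preparing the two requests without sending them, then decodes the header to show what it contains.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = 'http://127.0.0.1:8080'
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('user', '123')).prepare()
shorthand = requests.Request('GET', url, auth=('user', '123')).prepare()

print(explicit.headers['Authorization'])
print(explicit.headers['Authorization'] == shorthand.headers['Authorization'])

# The value after "Basic " is just base64 of "user:password"
token = explicit.headers['Authorization'].split(' ', 1)[1]
print(base64.b64decode(token))
```

Since base64 is an encoding rather than encryption, basic authentication is usually paired with HTTPS.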
Exception handling
import requests
from requests.exceptions import Timeout, HTTPError, RequestException

try:
    response = requests.get('http://httpbin.org/get', timeout=0.01)
    response.raise_for_status()  # HTTPError is only raised here, for 4xx/5xx responses
    print(response.status_code)
except Timeout:  # covers ConnectTimeout and ReadTimeout
    print('Timeout')
except HTTPError:
    print('HTTP error')
except RequestException:  # base class, so it must come last
    print('Error')
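The order of the except clauses follows the exception hierarchy: specific classes first, the RequestException base class last. The sketch below checks that hierarchy and triggers an HTTPError offline; the 404 response is built by hand as a stand-in for a real server reply.

```python
import requests
from requests.exceptions import (
    ConnectTimeout, HTTPError, ReadTimeout, RequestException, Timeout,
)

# Both timeout flavours derive from Timeout
print(issubclass(ConnectTimeout, Timeout), issubclass(ReadTimeout, Timeout))
# Everything derives from RequestException
print(issubclass(Timeout, RequestException), issubclass(HTTPError, RequestException))

# Offline stand-in for a 404 response
response = requests.Response()
response.status_code = 404

caught_status = None
try:
    response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
except HTTPError as exc:
    caught_status = exc.response.status_code
    print('HTTP error:', caught_status)
```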
References:
http://www.cnblogs.com/0bug/p/8899841.html
Official documentation: http://www.python-requests.org/en/master/_modules/requests/exceptions/#RequestException