Python web scraping: how to use the requests library
A summary of get, post, and the other common methods…
import requests

# One helper function per HTTP verb:
req = requests.get("/?tn=15007414_8_dg")
req = requests.post("/?tn=15007414_8_dg")
req = requests.put("/?tn=15007414_8_dg")
req = requests.delete("/?tn=15007414_8_dg")
req = requests.head("/?tn=15007414_8_dg")
req = requests.options("/?tn=15007414_8_dg")
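Each of the helpers above simply issues a request with the matching HTTP verb. A minimal offline sketch (the `example.com` host below is a placeholder, not from the original post) shows this by preparing requests without sending them:

```python
import requests

# Each helper (get, post, put, ...) maps to one HTTP verb. Preparing a
# Request object lets us inspect the verb without touching the network.
for verb in ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"]:
    prepared = requests.Request(verb, "http://example.com/").prepare()
    print(prepared.method, prepared.url)
```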
GET requests
import requests

url = "/s"
params = {"wd": "篮球"}
response = requests.get(url, params=params)  # pass a dict; no need to format it yourself
print(response.url)
response.encoding = "utf-8"
html = response.text
print(html)
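The point of passing a dict to `params=` is that requests URL-encodes it for you. A quick offline sketch (placeholder host; only the `/s` path and the `wd` parameter come from the example above):

```python
import requests

# Preparing the request shows how the params dict is merged into the URL
# as a percent-encoded query string, with no manual formatting needed.
req = requests.Request("GET", "http://example.com/s",
                       params={"wd": "篮球"}).prepare()
print(req.url)  # "篮球" is automatically percent-encoded
```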
POST requests: the parameters go in a dict; you can also send JSON parameters
import requests
from fake_useragent import UserAgent

url = "/hsjw/cas/"
headers = {
    "User-Agent": UserAgent().firefox
}
formdata = {
    "user": "******",
    "password": "******"
}
response = requests.post(url, data=formdata, headers=headers)
response.encoding = "utf-8"
html = response.text
print(html)
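The difference between form parameters and JSON parameters comes down to which keyword you use: `data=` sends form-encoded fields, while `json=` serializes the dict to a JSON body and sets the Content-Type for you. An offline sketch (placeholder URL and field names):

```python
import requests

# data= produces a form-encoded body; json= produces a JSON body, and
# each sets the matching Content-Type header automatically.
form_req = requests.Request("POST", "http://example.com/login",
                            data={"user": "alice"}).prepare()
json_req = requests.Request("POST", "http://example.com/login",
                            json={"user": "alice"}).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(json_req.headers["Content-Type"])  # application/json
print(form_req.body)
print(json_req.body)
```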
Customizing the request headers. Spoofing the request header is a common scraping technique; we can use it to disguise who is really making the request.
import requests
from fake_useragent import UserAgent

headers = {"User-Agent": UserAgent().firefox}
r = requests.get("", headers=headers)
print(r.request.headers["User-Agent"])
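Why bother overriding the User-Agent at all? By default, requests advertises itself as `python-requests/<version>`, which many sites recognize and block. A small sketch without `fake_useragent` (the Mozilla string below is just an illustrative hand-written value):

```python
import requests

# A Session carries default headers; the stock User-Agent gives away that
# we are a script, so scrapers usually replace it.
s = requests.Session()
print(s.headers["User-Agent"])  # e.g. python-requests/2.x
s.headers.update({"User-Agent": "Mozilla/5.0"})  # illustrative browser-style UA
print(s.headers["User-Agent"])
```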
Setting a timeout
import requests

# A timeout this short will raise requests.exceptions.Timeout
requests.get("/?tn=monline_3_dg", timeout=0.001)
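In practice you wrap the call in a try/except so a slow server does not crash the scraper. A sketch against a placeholder URL (the 0.001s timeout is deliberately too short; in a restricted environment the connection may fail for other reasons, hence the broader catch):

```python
import requests

# timeout= raises requests.exceptions.Timeout when the limit is exceeded;
# RequestException is the base class for all requests errors.
try:
    requests.get("http://example.com/", timeout=0.001)
    print("response arrived in time")
except requests.exceptions.Timeout:
    print("request timed out")
except requests.exceptions.RequestException as exc:
    print("request failed:", type(exc).__name__)
```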
Accessing through a proxy
import requests

proxies = {
    "http": "http://122.9.101.6:8888",
    "https": "https://61.157.206.174:37259",
    # if the proxy needs a username and password:
    # "http": "http://user:password@122.9.101.6:8888",
}
requests.get("/", proxies=proxies)
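Which entry of the proxies dict applies is decided by the scheme of the URL being fetched. This can be checked offline with `requests.utils.select_proxy` (the proxy addresses below are the ones from the example above, not verified live servers):

```python
from requests.utils import select_proxy

# requests picks the proxy whose key matches the URL scheme.
proxies = {
    "http": "http://122.9.101.6:8888",
    "https": "https://61.157.206.174:37259",
}
print(select_proxy("http://example.com/", proxies))
print(select_proxy("https://example.com/", proxies))
```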
A Session automatically saves cookies. The idea of a session is to keep a conversation going, e.g. so you can keep operating after logging in (your identity information is remembered); with plain requests calls, identity information is not carried over between requests.
import requests

s = requests.Session()
# Issue a GET request from the session object; this sets a cookie on it
print(s.get("/cookies/set/sessioncookie/123456789"))
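The cookie jar on the session is what persists between requests. We can see it offline by setting a cookie on the jar directly, using the same name and value as the example above:

```python
import requests

# The session's cookie jar keeps cookies across requests; here we set one
# by hand instead of fetching it from a server.
s = requests.Session()
s.cookies.set("sessioncookie", "123456789")
print(s.cookies.get("sessioncookie"))  # available to every later request on s
```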
SSL verification: disabling the insecure-request warning
import requests
from fake_useragent import UserAgent

headers = {
    "User-Agent": UserAgent().firefox
}
url = "/?tn=monline_3_dg"
requests.packages.urllib3.disable_warnings()  # silence the insecure-request warning
response = requests.get(url, verify=False, headers=headers)
print(response)
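`requests.packages.urllib3` is just an alias for the urllib3 library, so the same thing can be written against urllib3 directly, and scoped to only the warning that `verify=False` triggers:

```python
import urllib3

# Silence only the InsecureRequestWarning raised by unverified HTTPS
# requests, leaving other urllib3 warnings enabled.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```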
Getting information from the response
“”""
() 获取相应内容以json为例
获取响应内容以字符串形式
获取响应内容(以字节的形式)
获取响应头内容
获取访问地址
获取网页编码
resp.request.headers 请求头内容
获取cookie
“”"