Python Web Scraping: the requests Module

Date: 2021-06-24 22:22:11

1. Login Examples

a. Scrape Autohome news: grab each article's title, link, and image, and save the images to disk

import requests
from bs4 import BeautifulSoup
import uuid

response = requests.get('http://www.autohome.com.cn/news/')
response.encoding = 'gbk'

soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML into a tree of objects
tag = soup.find(id='auto-channel-lazyload-article')
li_list = tag.find_all('li')

for i in li_list:
    a = i.find('a')
    if a:
        print(a.attrs.get('href'))
        txt = a.find('h3').text
        print(txt)
        img_url = a.find('img').attrs.get('src')
        print(img_url)

        # download the image and save it under a random file name
        img_response = requests.get(url=img_url)
        file_name = str(uuid.uuid4()) + '.jpg'
        with open(file_name, 'wb') as f:
            f.write(img_response.content)
This example uses the BeautifulSoup module to locate tags; a smaller self-contained sketch of the same calls follows.
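A minimal sketch of the BeautifulSoup calls used above (find, find_all, attrs, .text), run against a hand-written HTML snippet rather than the live site, so it can be tried offline:

from bs4 import BeautifulSoup

html = '<ul><li><a href="/news/1"><h3>Title</h3><img src="/1.jpg"></a></li></ul>'
soup = BeautifulSoup(html, 'html.parser')

for li in soup.find_all('li'):          # every <li> in the snippet
    a = li.find('a')                    # first <a> inside it
    print(a.attrs.get('href'))          # /news/1
    print(a.find('h3').text)            # Title
    print(a.find('img').attrs.get('src'))  # /1.jpg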

b. Upvoting on Chouti: both the initial page request and the login response return a gpsd cookie; the upvote must carry the gpsd from the initial page request, not the one from the login.

import requests

# 1. fetch the page first; an anonymous gpsd cookie is issued here
r1 = requests.get('http://dig.chouti.com/')
r1_cookies = r1.cookies.get_dict()

# 2. log in, carrying the cookies from the first request;
#    the server authorizes the gpsd issued in step 1
post_dict = {
    "phone": "8615131255089",
    "password": "woshiniba",
    "oneMonth": "1"
}
r2 = requests.post(
    url="http://dig.chouti.com/login",
    data=post_dict,
    cookies=r1_cookies
)
r2_cookies = r2.cookies.get_dict()

# 3. visit another page (the vote endpoint), using the gpsd from step 1,
#    not the one returned by the login
r3 = requests.post(
    url="http://dig.chouti.com/link/vote?linksId=13921091",
    cookies={'gpsd': r1_cookies['gpsd']}
)
print(r3.text)
(Screenshot: the gpsd cookie issued by the Chouti page.)

c. Logging in to GitHub: carry the cookies through the login

import requests
from bs4 import BeautifulSoup

r1 = requests.get('https://github.com/login')
s1 = BeautifulSoup(r1.text, 'html.parser')

# extract the CSRF token from the login form
token = s1.find(name='input', attrs={'name': "authenticity_token"}).get('value')
r1_cookie_dict = r1.cookies.get_dict()

# POST the username, password, and token to the server
r2 = requests.post(
    'https://github.com/session',
    data={
        'commit': 'Sign in',
        'utf8': '',
        'authenticity_token': token,
        'login': '317828332@qq.com',
        'password': 'alex3714'
    },
    cookies=r1_cookie_dict
)

# grab the post-login cookies
r2_cookie_dict = r2.cookies.get_dict()

# merge the pre-login and post-login cookies
cookie_dict = {}
cookie_dict.update(r1_cookie_dict)
cookie_dict.update(r2_cookie_dict)

# access a page that requires login
r3 = requests.get(
    url='https://github.com/settings/emails',
    cookies=cookie_dict
)
print(r3.text)

2. requests Parameters

- method: the HTTP method
- url: the request URL
- params: parameters passed in the URL (the GET query string)
- data: data sent in the request body (form-encoded)
- json: data sent in the request body as JSON
- headers: request headers
- cookies: cookies
- files: file upload
- auth: basic authentication (the username and password are Base64-encoded into the request headers)
- timeout: timeout for the request and the response
- allow_redirects: whether to follow redirects
- proxies: proxies
- verify: whether to verify the SSL certificate (False ignores it)
- cert: client certificate file
- stream: download a large response in chunks instead of all at once
- session: keeps cookies and other client-side state across requests (a usage sketch follows this list)
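A minimal sketch exercising a few of these parameters; httpbin.org is a public echo service used only for illustration, and the header and parameter values are arbitrary:

import requests

response = requests.get(
    'http://httpbin.org/get',
    params={'k1': 'v1'},                  # appended to the URL: ?k1=v1
    headers={'User-Agent': 'my-spider'},  # custom request header
    timeout=(5, 10),                      # (connect, read) timeouts in seconds
    allow_redirects=True,
)
print(response.url)          # http://httpbin.org/get?k1=v1
print(response.status_code)  # 200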

a. files: sending files

import requests

requests.post(
    url='xxx',
    files={
        'name1': open('a.txt', 'rb'),              # field name mapped to a file object
        'name2': ('bbb.txt', open('b.txt', 'rb'))  # uploaded to the server under the name bbb.txt
    }
)

b. auth: authentication

When you configure a router by visiting 192.168.0.1, the browser pops up a small login dialog. Entering a username and password there is not a form submission; it is a basic auth prompt, and the browser Base64-encodes the credentials and sends them in the request headers.
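A minimal sketch of basic auth with requests; the URL and the admin/password credentials are placeholders for the router scenario above, and HTTPBasicAuth is the helper shipped with requests:

import requests
from requests.auth import HTTPBasicAuth

ret = requests.get(
    'http://192.168.0.1/',                   # the router admin page from the note above
    auth=HTTPBasicAuth('admin', 'password')  # Base64-encoded into the Authorization header
)
print(ret.status_code)

The shorthand auth=('admin', 'password') is equivalent; requests wraps a plain tuple in HTTPBasicAuth for you.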

c. stream: streaming downloads

# if the file on the server is large, download it in a loop

def param_stream():
    ret = requests.get('http://127.0.0.1:8000/test/', stream=True)
    print(ret.content)
    ret.close()

    # from contextlib import closing
    # with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    #     # handle the response here
    #     for i in r.iter_content():
    #         print(i)
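A sketch of the streaming pattern that actually avoids loading the whole body into memory, written to mirror the commented-out closing() idiom above; the URL (httpbin's /bytes endpoint returns random bytes) and the output file name are placeholders:

import requests
from contextlib import closing

# stream=True defers the body download; iter_content reads it in chunks,
# so the whole file never has to fit in memory
with closing(requests.get('http://httpbin.org/bytes/102400', stream=True)) as r:
    with open('out.bin', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)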

d. session (not the same as Django's session); example: the Chouti upvote, simplified

import requests

session = requests.Session()

### 1. visit any page first to obtain the cookie
i1 = session.get(url="http://dig.chouti.com/help/service")

### 2. log in, carrying the previous cookie; the backend authorizes the gpsd in it
i2 = session.post(
    url="http://dig.chouti.com/login",
    data={
        'phone': "8615131255089",
        'password': "xxxxxx",
        'oneMonth': ""
    }
)

### 3. vote; the session sends the authorized cookie automatically
i3 = session.post(
    url="http://dig.chouti.com/link/vote?linksId=8589623",
)
print(i3.text)
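A minimal sketch of the cookie persistence that makes this work; httpbin's /cookies endpoints set and echo cookies, and k1/v1 are arbitrary values:

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/k1/v1')  # the server sets cookie k1=v1
r = s.get('http://httpbin.org/cookies')        # the Session resends it automatically
print(r.text)                                  # {"cookies": {"k1": "v1"}}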