When writing a web crawler, you usually need to send a browser User-Agent (UA) string.

Here is what I learned today about crawling with a dynamic UA.

1. Fetching a page the simple way

from urllib.request import urlopen
# Target URL
url = "https://www.baidu.com"
# Send the request
response = urlopen(url)
# Read the body (bytes)
info = response.read()
# Decode to str and print
print(info.decode())
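
The snippet above calls decode() with its default of UTF-8 and never closes the response. A minimal sketch of a slightly more defensive version follows; the helper names decode_body and fetch_text are hypothetical and not part of the original post, and the 10-second timeout is an arbitrary choice.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Decode raw bytes with the declared charset, defaulting to UTF-8."""
    # errors="replace" keeps going if a few bytes are malformed.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return its body as text (hypothetical helper)."""
    # The context manager closes the connection when the block exits.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset taken from the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)


# Usage (performs a real network request):
# print(fetch_text("https://www.baidu.com"))
```

Keeping the decoding step in its own function makes it easy to check without touching the network.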

2. Fetching the Baidu page with a Request object

from urllib.request import urlopen
from urllib.request import Request
from random import choice
# Target URL
url = "https://www.baidu.com"
# Pool of browser UA strings; one is picked at random below
user_agents = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)The World 2.x",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
]
headers = {
    "User-Agent": choice(user_agents)
}

# Build the request with the custom headers
request = Request(url, headers=headers)
# Send the request
response = urlopen(request)
# Read the body (bytes)
info = response.read()
# Decode to str and print
print(info.decode())
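
To confirm which UA was actually attached before sending anything, the Request object can be inspected directly. The sketch below wraps the random pick in a hypothetical build_request helper (not part of the original post) and makes no network call.

```python
from random import choice
from urllib.request import Request

# UA strings taken from the example above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
]


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request carrying a randomly chosen User-Agent header."""
    headers = {"User-Agent": choice(user_agents)}
    return Request(url, headers=headers)


request = build_request("https://www.baidu.com")
# Request stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

Calling build_request once per request gives each request its own random UA, instead of one UA chosen at import time.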

3. Using fake_useragent for a dynamic UA

from urllib.request import urlopen
from urllib.request import Request
from fake_useragent import UserAgent
# Target URL
url = "https://www.baidu.com"
# Get a random dynamic UA; ua.chrome and ua.firefox both work
ua = UserAgent()
headers = {
    "User-Agent": ua.chrome
}
# Build the request
request = Request(url, headers=headers)
# urlopen() returns a response object; read() gives bytes, which decode() converts to str
response = urlopen(request)
# Read the body (bytes)
info = response.read()
# Decode to str and print
print(info.decode())
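
fake_useragent is a third-party package (pip install fake-useragent), so it may be missing or fail to load its data on some machines. One possible way to keep the crawler running is to fall back to a fixed list, as in this sketch; pick_user_agent and FALLBACK_USER_AGENTS are hypothetical names, not part of the library.

```python
from random import choice

# Static UA strings used only when fake_useragent is unavailable.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
]


def pick_user_agent(browser: str = "chrome") -> str:
    """Return a UA string from fake_useragent, or a static fallback."""
    try:
        from fake_useragent import UserAgent
        # ua.chrome / ua.firefox / ua.random are attribute lookups.
        return str(getattr(UserAgent(), browser))
    except Exception:
        # Covers ImportError and any error raised by the library itself.
        return choice(FALLBACK_USER_AGENTS)


headers = {"User-Agent": pick_user_agent()}
print(headers["User-Agent"])
```

The broad except is deliberate here: any failure in the optional dependency should degrade to the static list rather than stop the crawl.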

秒客网

Python Study Notes 03: Using a Dynamic UA
