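1. Selenium automation
Selenium can drive a browser and perform actions on a page such as clicking, typing, and sliding.
Reverse engineering takes a different approach from Selenium automation. In essence, it means:
- Analyzing the requests, e.g. request method, request parameters, and encryption scheme.
- Simulating those requests in code to achieve the same result.
Reverse engineering vs. Selenium automation
- Selenium. [Pros] Simple, with no reverse engineering required: you only control the browser to carry out preset actions. [Cons] Poor performance; not well suited to batch jobs.
- Reverse engineering. [Pros] Once the algorithm has been reversed, performance is good and batch jobs are easy. [Cons] JS encryption algorithms with convoluted code are hard to reverse.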
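To make the reverse-engineering side of the comparison concrete, here is a minimal sketch of replaying a signed request in plain Python once the signing rule is known. Everything specific in it is an assumption made up for illustration: the `sign_params` helper, the `SECRET` salt, and the rule "MD5 of the sorted query string plus a salt". A real site's algorithm has to be recovered from its own JavaScript.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical salt; a real one would be recovered from the site's JavaScript.
SECRET = "demo-salt"


def sign_params(params: dict) -> dict:
    """Return a copy of params with a 'sign' field added.

    Assumed rule (illustration only): sign = md5(sorted query string + SECRET).
    """
    query = urlencode(sorted(params.items()))
    digest = hashlib.md5((query + SECRET).encode("utf-8")).hexdigest()
    return {**params, "sign": digest}


signed = sign_params({"page": 1, "keyword": "selenium"})
print(urlencode(signed))
```

Because the parameters are sorted before hashing, the signature does not depend on the order in which they were supplied, which is a common property of such schemes.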
2. Essential operations
2.1 Module & driver
- Install the module
pip install selenium
- Download the driver
To control Chrome, Firefox, IE, Edge, or another browser, Selenium must go through the matching driver.
[Selenium] -> [driver] -> [browser]
[Selenium] -> [Firefox driver] -> [Firefox]
[Selenium] -> [Chrome driver] -> [Chrome]
Downloading the Chrome driver:
Version 114 and earlier: http://chromedriver.storage.googleapis.com/index.html
Versions 117/118/119: https://googlechromelabs.github.io/chrome-for-testing/
Finding your browser version: open chrome://version/ in Chrome, e.g. 119.0.6045.200 (Official Build) (64-bit) (cohort: Stable)
- Quick start
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://passport.bilibili.com/login')
time.sleep(5)
driver.close()
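The rule for choosing a download page from the browser version can be sketched as a small helper. The function names are hypothetical, and treating every version above 114 as a Chrome for Testing download is an assumption that goes slightly beyond the versions listed in the driver step.

```python
import re

LEGACY_INDEX = "http://chromedriver.storage.googleapis.com/index.html"
CFT_INDEX = "https://googlechromelabs.github.io/chrome-for-testing/"


def chrome_major(version_text: str) -> int:
    """Extract the major version from text such as '119.0.6045.200 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no Chrome version found in {version_text!r}")
    return int(match.group(1))


def driver_download_page(version_text: str) -> str:
    # Assumption: everything newer than 114 is served by Chrome for Testing.
    return LEGACY_INDEX if chrome_major(version_text) <= 114 else CFT_INDEX


print(driver_download_page("119.0.6045.200 (Official Build) (64-bit)"))
```

The input is the first line shown on chrome://version/; only the major version matters for picking the page, although the driver you download should match the full version as closely as possible.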
2.2 Finding elements
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('URL to open')  # placeholder: replace with a real URL
# find_element returns the first match and raises NoSuchElementException if there is none;
# find_elements returns a list, which is empty if nothing matches.
# The locators below are illustrative shapes, not tied to a particular page.
tag = driver.find_element(By.ID, "user")
tag = driver.find_element(By.CLASS_NAME, "c1")
tag = driver.find_element(By.TAG_NAME, "div")
tag = driver.find_element(By.XPATH, "/html/body/div[1]/div/div[2]/div[3]/div[3]/div/div/div/div[1]/span[2]")
tag = driver.find_element(By.XPATH, '//*[@id="geetest-wrap"]//input[@name="tel"]')
tag_list = driver.find_elements(By.XPATH, "/html/body/div/div[2]/div/div[2]/div/div[2]/div[2]/div/div/div/div/div[2]/a")
for tag in tag_list:
    print(tag)
time.sleep(5)
driver.close()
Example: 5xclass.cn
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://www.5xclass.cn/')
# Find by ID
tag = driver.find_element(By.ID, "bs-example-navbar-collapse-1")
print(tag.text)
print(10 * "-")
# Find by class name
tags = driver.find_elements(By.CLASS_NAME, "panel-heading")
for tag in tags:
    print(tag.text)
print(10 * "-")
# Find by tag name
tags = driver.find_elements(By.TAG_NAME, "li")
for tag in tags:
    print(tag.text)
print(10 * "-")
# Find by XPath (absolute path)
tag = driver.find_element(By.XPATH, "/html/body/div/div[2]/div/div[2]/div/div[2]/div[1]")
print(tag.text)
print(10 * "-")
# Find by XPath (relative to an element with a known id)
tag = driver.find_element(By.XPATH, '//*[@id="bs-example-navbar-collapse-1"]/ul[1]/li[1]/a')
print(tag.text)
print(10 * "-")
# Find multiple elements by XPath
tags = driver.find_elements(By.XPATH, '/html/body/div/div[2]/div/div[2]/div/div[2]/div[2]/div/div/div/div/div[2]/a')
for tag in tags:
    print(tag.text)
print(10 * "-")
# Nested lookup: find the parent first, then search within it
parent = driver.find_element(By.XPATH, '/html/body/div/div[2]/div/div[2]/div/div[2]/div[2]/div/div/div/div')
tags = parent.find_elements(By.XPATH, "div[@class='course']/a")
for tag in tags:
    print(tag.text)
time.sleep(5)
driver.close()
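Since find_element raises when nothing matches while find_elements returns an empty list, optional elements are often easier to handle through the plural form. The sketch below shows one way to wrap that. `find_or_none` and `FakeRoot` are hypothetical names; `FakeRoot` only stands in for a driver so the helper can be tried without a browser, and the string "id" is the value of `By.ID`.

```python
def find_or_none(root, by, value):
    """Return the first matching element, or None when nothing matches.

    `root` can be a driver or an element: both expose find_elements(),
    which returns an empty list instead of raising when nothing is found.
    """
    matches = root.find_elements(by, value)
    return matches[0] if matches else None


class FakeRoot:
    """Stand-in for a driver so the helper can be tried without a browser."""

    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        return list(self._elements.get((by, value), []))


page = FakeRoot({("id", "user"): ["<user input>"]})
print(find_or_none(page, "id", "user"))
print(find_or_none(page, "id", "missing"))
```

With a real driver the call would look like `find_or_none(driver, By.ID, "user")`, and the same helper works for nested lookups by passing a parent element as `root`.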
2.3 Performing actions
Common actions: clicking and typing.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://passport.bilibili.com/login')
# 1. Click "SMS login"
time.sleep(3)
sms_btn = driver.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
)
sms_btn.click()  # click
# 2. Enter the account
phone_txt = driver.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[2]/div[1]/div[1]/input'
)
phone_txt.send_keys("18630087660")  # type text
time.sleep(55)
driver.close()
2.4 Executing JavaScript
If the "find element, then perform action" route is too cumbersome, you can also run JS code directly in the page to get the same result.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://passport.bilibili.com/login')
# ############# 1. Click "SMS login" #############
time.sleep(3)
sms_btn = driver.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
)
sms_btn.click()
# ############# 2. Enter the account #############
phone_txt = driver.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[2]/div[1]/div[1]/input'
)
phone_txt.send_keys("18630087660")
# ############# 3. Select the country #############
time.sleep(2)
driver.execute_script('document.querySelector(".area-code-select").children[18].click()')
# ############# 4. Read cookies via JavaScript #############
data_string = driver.execute_script('return document.cookie;')  # or: return document.title;
print(data_string)
# ############# 5. Read cookies via the WebDriver API #############
cookie_list = driver.get_cookies()
print(cookie_list)
time.sleep(2550)
driver.close()
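The two cookie reads above return different shapes: `document.cookie` yields one `name=value; name=value` string, while `driver.get_cookies()` yields a list of dicts with `name` and `value` keys (and, unlike `document.cookie`, it also includes HttpOnly cookies). A minimal sketch of normalizing both into a plain dict, with hypothetical helper names and made-up sample values:

```python
def parse_cookie_string(cookie_string: str) -> dict:
    """Turn 'a=1; b=2' (the document.cookie format) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def cookie_list_to_dict(cookie_list: list) -> dict:
    """Flatten the list of dicts returned by driver.get_cookies()."""
    return {item["name"]: item["value"] for item in cookie_list}


print(parse_cookie_string("sid=abc123; theme=dark"))
print(cookie_list_to_dict([{"name": "sid", "value": "abc123", "httpOnly": True}]))
```

A dict in this form is convenient when the cookies collected by the browser are later reused by an HTTP client in code.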
2.5 Waiting
If the page loads slowly, you need to wait until a given element has loaded before acting on it.
Example 1: lambda expression
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://passport.bilibili.com/login')
# ############# Option 1: click "SMS login" after a fixed sleep #############
time.sleep(3)
sms_btn = driver.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
)
sms_btn.click()
# ############# Option 2: click "SMS login" with an explicit wait (recommended) #############
# Polls every 0.5s for up to 30s and returns as soon as the element is found
sms_btn = WebDriverWait(driver, 30, 0.5).until(lambda dv: dv.find_element(
    By.XPATH,
    '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
))
sms_btn.click()
Example 2: custom function
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get('https://passport.bilibili.com/login')
def func(dv):
    print("If nothing is returned, this function is called again every 0.5s; a returned value is assigned to sms_btn")
    # <div xxx="123" id="uuu"></div>
    # <img src="..."/>
    tag = dv.find_element(
        By.XPATH,
        '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
    )
    # "xxx" is a placeholder attribute name; wait until the attribute has a value
    img_src = tag.get_attribute("xxx")
    if img_src:
        return tag
    return

sms_btn = WebDriverWait(driver, 30, 0.5).until(func)
sms_btn.click()
time.sleep(250)
driver.close()
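The polling behaviour described in the custom-function example can be illustrated without a browser. The sketch below is a simplified stand-in, not Selenium's implementation: `wait_until` is a hypothetical helper, and the real `WebDriverWait` additionally ignores `NoSuchElementException` by default and raises its own `TimeoutException`.

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)


attempts = []


def ready_on_third_call():
    """Pretend the element only becomes available on the third poll."""
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


print(wait_until(ready_on_third_call, timeout=2, poll=0.01))
print(len(attempts))
```

The key point is that a falsy return value means "not ready yet, try again", while any truthy value ends the wait and becomes the result.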
Example 3: global configuration (implicit wait)
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
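# From now on, when an element is not found, keep retrying for up to 30 seconds (continue as soon as it is found)
driver.implicitly_wait(30)
driver.get('https://passport.bilibili.com/login')
# The active XPath uses an id that presumably does not exist, to demonstrate the wait:
# the lookup retries for 30 seconds and then raises NoSuchElementException.
# Swap in the commented-out XPath to see the lookup succeed.
sms_btn = driver.find_element(
    By.XPATH,
    # '//*[@id="app"]/div[2]/div[2]/div[3]/div[1]/div[3]'
    '//*[@id="xxxxxxxxxapp"]/div[2]/div[2]/div[3]/div[1]/div[3]'
)
sms_btn.click()
print("Found it")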
time.sleep(250)
driver.close()
2.6 Getting values
Once an element has been found, you often want to read the values it holds.
Example 1: text and attributes
e.g. <a id='x1' class="info mine" href="5xclass.cn">武沛齐</a>
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.implicitly_wait(10)
driver.get('https://www.5xclass.cn')
tag = driver.find_element(
    By.XPATH,
    '/html/body/div/div[2]/div/div[2]/div/div[2]/div[2]/div/div/div/div/div[2]/a[1]'
)
print(tag.text)
print(tag.get_attribute("target"))
print(tag.get_attribute("data-toggle"))
driver.close()
Example 2: getting values
e.g. <input type='text' value="?" placeholder="?" />
e.g. <select><option value='1'>Beijing</option><option value='2'>Shanghai</option></select>, reading the value attribute of the select tag
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.implicitly_wait(10)
driver.get('https://www.bilibili.com/')
time.sleep(10)
tag = driver.find_element(
    By.XPATH,
    '//*[@id="nav-searchform"]/div[1]/input'
)
print(tag)
print(tag.text)
print(tag.get_attribute("placeholder"))
print(tag.get_attribute("value"))
time.sleep(1000)
driver.close()
Example 3: checked state
<input type="radio" name="findcar" value="1" checked="">New car
<input type="radio" name="findcar" value="2">Used car
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.implicitly_wait(10)
driver.get('https://www.autohome.com.cn/beijing/')
# ############### 1. Look up each one individually ###############
tag = driver.find_element(
    By.XPATH,
    '/html/body/div[1]/div[11]/div[2]/div[1]/div[1]/label[1]/span/input'
)
print(tag.get_property("checked"))  # True
tag = driver.find_element(
    By.XPATH,
    '/html/body/div[1]/div[11]/div[2]/div[1]/div[1]/label[2]/span/input'
)
print(tag.get_property("checked"))  # False
# ############### 2. Loop over all of them ###############
parent = driver.find_element(
    By.XPATH,
    '/html/body/div[1]/div[11]/div[2]/div[1]/div[1]'
)
tag_list = parent.find_elements(
    By.XPATH,
    'label/span/input'
)
for tag in tag_list:
    print(tag.get_property("checked"), tag.get_attribute("value"))
driver.close()
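2.7 Page source + bs4
After opening a page, if elements are hard to locate with Selenium alone, you can hand the page source to bs4 and search there instead.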
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.implicitly_wait(10)
driver.get('https://car.yiche.com/')
html_string = driver.page_source
soup = BeautifulSoup(html_string, features="html.parser")
tag_list = soup.find_all(name="div", attrs={"class": "item-brand"})
for tag in tag_list:
    child = tag.find(name='div', attrs={"class": "brand-name"})
    print(child.text)
driver.close()
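If bs4 is not installed, a similar extraction can be approximated with the standard library's `html.parser`. This is a swapped-in alternative rather than the BeautifulSoup approach used above, and the HTML string is invented sample data that only mirrors the `item-brand` / `brand-name` class names; in practice the input would be `driver.page_source`.

```python
from html.parser import HTMLParser


class BrandNameParser(HTMLParser):
    """Collect the text inside every <div class="brand-name">."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._depth = 0   # > 0 while inside a brand-name div
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "div":
                self._depth += 1  # track nested divs so the right </div> closes us
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "brand-name" in classes:
                self._depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                self.names.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


# Invented sample markup standing in for driver.page_source.
sample_html = (
    '<div class="item-brand"><div class="brand-name"> Audi </div></div>'
    '<div class="item-brand"><div class="brand-name">BMW</div></div>'
)
parser = BrandNameParser()
parser.feed(sample_html)
print(parser.names)
```

For anything beyond simple cases like this one, BeautifulSoup remains the more convenient tool; the parser above only handles well-formed nesting of `div` tags.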
2.8 Sending cookies
driver.add_cookie({'name': 'foo', 'value': 'bar'})
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service("driver/chromedriver.exe")
driver = webdriver.Chrome(service=service)
# Note: visit the site first, otherwise the cookie cannot be set
driver.get('https://dig.chouti.com/about')
# Add the cookie
driver.add_cookie({
'name': 'token',
'value': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJqaWQiOiJjZHVfNDU3OTI2NDUxNTUiLCJleHBpcmUiOiIxNzA0MzI5NDY5OTMyIn0.8n_tWcEHXsBSXWIY9rBoGWwaLPF8iWIruryhKTe5_ks'
})
# Visit again, this time with the cookie attached
driver.get('https://dig.chouti.com/')
time.sleep(2000)
driver.close()
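The order of operations is what matters here: load a page on the target domain, add the cookies, then load the page you actually want. A minimal sketch of that sequence as a reusable helper follows. `open_with_cookies` and `RecordingDriver` are hypothetical names; `RecordingDriver` only records calls so the order can be checked without a browser, and a real `webdriver.Chrome` instance exposes the same `get` and `add_cookie` methods.

```python
def open_with_cookies(driver, warmup_url, target_url, cookies):
    """Visit warmup_url, add each cookie, then open target_url."""
    driver.get(warmup_url)  # the domain must be loaded before cookies can be added
    for name, value in cookies.items():
        driver.add_cookie({"name": name, "value": value})
    driver.get(target_url)  # this request now carries the cookies


class RecordingDriver:
    """Stand-in that records calls, so the order can be checked offline."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))

    def add_cookie(self, cookie):
        self.calls.append(("add_cookie", cookie["name"]))


fake = RecordingDriver()
open_with_cookies(
    fake,
    "https://dig.chouti.com/about",
    "https://dig.chouti.com/",
    {"token": "<your token>"},  # placeholder value
)
print(fake.calls)
```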
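2.9 IP detection and proxies
If a site limits access per IP, for example allowing each IP only 5 operations per day, you can buy proxy IPs and attach one to your requests. The steps:
- Buy proxy IPs
- Log in to the provider's dashboard and add your own IP to the whitelist
- Pass the proxy in code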
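For the last step, one common way to route Chrome through a proxy is the browser's `--proxy-server` command-line argument, set through `ChromeOptions`. The sketch below is written under assumptions: `proxy_argument` and `build_driver` are hypothetical helper names, and `127.0.0.1:8888` is a placeholder for the IP and port issued by your provider.

```python
def proxy_argument(host: str, port: int, scheme: str = "http") -> str:
    """Build Chrome's --proxy-server command-line argument."""
    return f"--proxy-server={scheme}://{host}:{port}"


def build_driver(host: str, port: int):
    """Start Chrome through the given proxy (requires selenium and a driver)."""
    # Imported here so proxy_argument() can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    service = Service("driver/chromedriver.exe")
    return webdriver.Chrome(service=service, options=options)


# 127.0.0.1:8888 is a placeholder; substitute the IP and port from your provider.
print(proxy_argument("127.0.0.1", 8888))
```

Calling `build_driver("127.0.0.1", 8888)` would then return a driver whose traffic goes through that proxy, which can be used exactly like the `driver` objects in the earlier sections.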
import time
import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Replace with the proxy you generated yourself
res