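selenium is a third-party Python library, so it has to be installed before use. If you are on Anaconda you may be able to skip this step, since it may already be installed there.
Install command: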
pip install selenium
(1) Use selenium to open a given website, with Taobao as the example.
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 25 10:12:39 2018
@author: brave_man
email: 1979887709@qq.com
"""
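from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

# The find_element_by_* helpers were removed in Selenium 4.3; use find_element(By.X, ...)
b = webdriver.Chrome()
b.get("http://www.taobao.com")
# Type a keyword into the search box (id="q")
elem = b.find_element(By.ID, 'q')
elem.send_keys('iphone')
sleep(3)
# Clear the box and search for something else
elem.clear()
elem.send_keys("ipad")
button = b.find_element(By.CLASS_NAME, "btn-search")
button.click()
sleep(5)
b.quit()  # close the browser and end the driver session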
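If any step in a script like this raises (say the search box is not found), the browser window is left open. Below is a minimal sketch of one way to guard against that. The helper name `search` is hypothetical, the selectors are the same Taobao ones used above, and the locator strings "id" and "class name" are simply the values behind `By.ID` and `By.CLASS_NAME`, written out so the helper itself needs no imports.

```python
# Locator tuples; "id" and "class name" are the string values of
# selenium's By.ID and By.CLASS_NAME.
SEARCH_BOX = ("id", "q")
SEARCH_BUTTON = ("class name", "btn-search")


def search(driver, keyword):
    """Type keyword into the search box and click search (hypothetical helper).

    The driver is always shut down, even if an element lookup fails.
    """
    try:
        driver.get("http://www.taobao.com")
        box = driver.find_element(*SEARCH_BOX)
        box.clear()
        box.send_keys(keyword)
        driver.find_element(*SEARCH_BUTTON).click()
        return driver.title
    finally:
        # Runs on success and on failure alike
        driver.quit()
```

It would be called as search(webdriver.Chrome(), "ipad"). The try/finally is the point here; everything inside it can be swapped for whatever steps the script needs.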
(2) A simple drag-and-drop action (useful for slider-style captchas)
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 25 15:00:10 2018
@author: brave_man
email: 1979887709@qq.com
"""
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

b = webdriver.Chrome()
url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
b.get(url)
# The demo is rendered inside an iframe, so switch into it first
b.switch_to.frame('iframeResult')
sou = b.find_element(By.CSS_SELECTOR, '#draggable')
tar = b.find_element(By.CSS_SELECTOR, '#droppable')
# Queue the drag from source to target, then run it
actions = ActionChains(b)
actions.drag_and_drop(sou, tar)
actions.perform()
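drag_and_drop needs a target element. A slider captcha often gives you only a handle and a distance, so one alternative is to hold the handle and move it by pixel offsets. A rough sketch, with assumptions: `split_offsets` and `drag_by` are hypothetical helpers, and the distance and number of steps are values the caller has to work out; `click_and_hold`, `move_by_offset` and `release` are regular ActionChains methods.

```python
def split_offsets(total, steps):
    """Split a drag distance in pixels into `steps` integer moves that sum to `total`."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    base, extra = divmod(total, steps)
    # The first `extra` moves get one extra pixel so the sum is exact.
    return [base + 1 if i < extra else base for i in range(steps)]


def drag_by(driver, handle, distance, steps=5):
    """Hold `handle` and drag it `distance` pixels to the right in several moves."""
    # Imported here so split_offsets can be used without selenium installed.
    from selenium.webdriver import ActionChains

    actions = ActionChains(driver).click_and_hold(handle)
    for dx in split_offsets(distance, steps):
        actions.move_by_offset(dx, 0)
    actions.release().perform()
```

For example, split_offsets(10, 3) gives [4, 3, 3], and drag_by(b, sou, 120) would drag the element found above 120 pixels to the right in five moves.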
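(3) In a crawler, slow networks and other outside factors can make element lookups fail. Here are two ways to wait.
1. Implicit wait: if webdriver cannot find the element in the DOM right away, it keeps retrying for up to the given time and only then raises an element-not-found exception (NoSuchElementException). Handy when the network is very slow.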
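from selenium import webdriver
from selenium.webdriver.common.by import By

b = webdriver.Chrome()
# Every find_element call now retries for up to 10 seconds before failing
b.implicitly_wait(10)
b.get("https://www.zhihu.com/explore")
elem = b.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(elem)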
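2. Explicit wait: wait up to a timeout for a specific condition on a specific element, for example that it is present in the DOM or that it is clickable.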
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

b = webdriver.Chrome()
b.get("https://taobao.com/")
# Wait up to 10 seconds for each condition below
b_wait = WebDriverWait(b, 10)
# presence_of_element_located returns one element (the *_all_* variant returns a list)
elem = b_wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = b_wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(elem, button)
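To make the idea concrete: an explicit wait boils down to calling a condition over and over until it returns something truthy or the timeout runs out. The following is a simplified sketch of that loop, not Selenium's actual implementation. The name `wait_until` and the use of the built-in TimeoutError are choices made here for illustration (WebDriverWait raises its own TimeoutException); the 0.5 second poll interval mirrors WebDriverWait's default.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Hand back whatever the condition produced, e.g. the element
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Seen this way, the difference from sleep(5) is that the wait ends as soon as the condition holds, and a failure is reported as a timeout instead of as a missing element somewhere later in the script.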
(4) Going back and forward
from selenium import webdriver
from time import sleep

b = webdriver.Chrome()
b.get("http://www.baidu.com")
sleep(1)
b.get("http://www.sina.com.cn")
sleep(1)
b.back()     # back to baidu
sleep(3)
b.forward()  # forward to sina again
sleep(3)
b.close()
For more, see the documentation: http://selenium-python-zh.readthedocs.io/en/latest/index.html