Web Crawlers: Using Proxies

Date: 2021-12-24 14:46:54

Using a proxy IP

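1. Using a proxy with requests

  To use a proxy with requests, build a dictionary that maps each URL scheme to a proxy URL and pass it through the proxies parameter.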

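import requests

proxy = '60.186.9.233'  # proxy address; in practice usually host:port
proxies = {
    # Both values are the URL of the proxy itself. A plain HTTP proxy is
    # reached over http:// even when the target URL is https://.
    'http': 'http://' + proxy,
    'https': 'http://' + proxy
}
try:
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    # Raised when the proxy is unreachable or refuses the connection
    print('error', e.args)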

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

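  The origin field in the output is the proxy's IP, which shows that the proxy took effect. If the proxy requires authentication, prepend the username and password to the proxy address: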

proxy = 'username:password@60.186.9.233'
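
  The two cases above (with and without authentication) can be folded into one small helper. This is only a sketch: build_proxies is a hypothetical name, and it assumes the proxy is a plain HTTP proxy, so the same http:// proxy URL is used for both http and https targets. The credentials are percent-encoded so that characters such as '@' or ':' in a password do not break the URL.

```python
from urllib.parse import quote


def build_proxies(host, username=None, password=None):
    """Return a requests-style proxies dict for an HTTP proxy at `host`.

    `host` may be a bare address or host:port.
    """
    if username is not None and password is not None:
        # Percent-encode credentials so '@' or ':' cannot break the URL
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        auth = ''
    # Assumption: the proxy itself is reached over plain HTTP
    proxy_url = f'http://{auth}{host}'
    return {'http': proxy_url, 'https': proxy_url}


print(build_proxies('60.186.9.233', 'username', 'password'))
```

  The returned dictionary can be passed directly as the proxies argument of requests.get.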

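2. Using a proxy with Selenium

  Selenium can also be configured to use a proxy. Two cases are covered here: a browser with a GUI, using Chrome as the example, and a headless browser, using PhantomJS as the example.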

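Chrome setup

  Set the proxy on the chrome_options object and pass that object when creating the Chrome object. Running the code opens a Chrome window; once the URL has loaded, the page shows the following result.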

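# Chrome proxy setup
from selenium import webdriver

proxy = '60.186.9.233'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + proxy)
# Current Selenium releases take the options object via `options`;
# the older `chrome_options` keyword is deprecated.
browser = webdriver.Chrome(options=chrome_options)
# get() returns None; the response is rendered in the browser window
browser.get('http://httpbin.org/get')

Result:
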
{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}
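
  If several proxies or proxy schemes need to be tried, the --proxy-server switch can be built by a small function. The sketch below is illustrative only: chrome_proxy_argument and make_chrome are hypothetical helper names, and the socks5 scheme is shown on the assumption that the proxy in question actually speaks SOCKS5. Selenium is imported inside make_chrome so the string builder can be used and checked without a browser installed.

```python
def chrome_proxy_argument(proxy, scheme='http'):
    """Build the --proxy-server switch for a proxy given as host or host:port."""
    return f'--proxy-server={scheme}://{proxy}'


def make_chrome(proxy, scheme='http'):
    """Start Chrome routed through `proxy` (hypothetical helper)."""
    # Imported lazily so the argument builder above works without Selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(chrome_proxy_argument(proxy, scheme))
    return webdriver.Chrome(options=options)


print(chrome_proxy_argument('60.186.9.233'))
```

  Calling make_chrome('60.186.9.233') would then return a driver equivalent to the one created above.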

 

PhantomJS setup

  Define the command-line arguments as a list and pass it to PhantomJS through the service_args parameter at initialization.

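# PhantomJS proxy setup
# Note: webdriver.PhantomJS was deprecated in Selenium 3.8 and removed in
# Selenium 4, so this snippet needs Selenium 3.x and a PhantomJS binary.
from selenium import webdriver

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http'
]
browser = webdriver.PhantomJS(service_args=service_args)
browser.get('http://httpbin.org/get')
print(browser.page_source)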

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}
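
In each output, the origin field is what shows whether the proxy took effect. Rather than reading it by eye, the check can be automated. The following is a minimal sketch, assuming the JSON body returned by httpbin is already available as a string; proxy_in_effect is a hypothetical name, and the comma split is a precaution in case origin lists more than one address.

```python
import json


def proxy_in_effect(response_text, proxy_ip):
    """Return True if the origin reported by httpbin includes `proxy_ip`."""
    data = json.loads(response_text)
    # Split on commas in case several addresses are reported
    origins = [ip.strip() for ip in data.get('origin', '').split(',')]
    return proxy_ip in origins


sample = '{"args": {}, "origin": "60.186.9.233", "url": "https://httpbin.org/get"}'
print(proxy_in_effect(sample, '60.186.9.233'))
```

With requests, res.text can be passed in directly. With a browser driver, page_source may wrap the JSON in HTML, so the JSON text would need to be extracted first.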

If the proxy requires authentication, add the --proxy-auth option to service_args.

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http',
    '--proxy-auth=username:password'
]
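
Both service_args lists above follow the same pattern, so they can be produced by one function. This is a sketch under the assumption that only the three options shown in this post are needed; build_phantomjs_args is a hypothetical name.

```python
def build_phantomjs_args(proxy, proxy_type='http', auth=None):
    """Return a service_args list for PhantomJS.

    `auth`, if given, is a 'username:password' string.
    """
    args = [f'--proxy={proxy}', f'--proxy-type={proxy_type}']
    if auth:
        # Only add the auth option when credentials are supplied
        args.append(f'--proxy-auth={auth}')
    return args


print(build_phantomjs_args('60.186.9.233', auth='username:password'))
```

The returned list can be passed as service_args when creating the PhantomJS driver.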