Using the Scrapy framework, configure dynamic proxy IPs to handle anti-crawling measures.
# settings.py: enable the downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    'text.middlewares.TextDownloaderMiddleware': 543,
    # 'text.middlewares.RandomUserAgentMiddleware': 544,
    # 'text.middlewares.CheckUserAgentMiddleware': 545,
    'text.middlewares.ProxyMiddleware': 546,
    'text.middlewares.CheckProxyMiddleware': 547,
}
# settings.py: pool of available proxy IPs
PROXIES = [
    "http://101.231.104.82:80",
    "http://39.137.69.6:8080",
    "http://39.137.69.10:8080",
    "http://39.137.69.7:80",
    "http://39.137.77.66:8080",
    "http://117.191.11.102:80",
    "http://117.191.11.113:8080",
    "http://117.191.11.113:80",
    "http://120.210.219.103:8080",
    "http://120.210.219.104:80",
    "http://120.210.219.102:80",
    "http://119.41.236.180:8010",
    "http://117.191.11.80:8080",
]
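Free proxy lists like the one above go stale quickly, so it can help to sanity-check each entry before Scrapy tries to use it. The sketch below (an illustrative addition, not part of the original article; the `is_well_formed` helper is hypothetical) only validates the URL format with the standard library — it does not test whether the proxy is actually reachable:

```python
from urllib.parse import urlparse

# A couple of entries from the PROXIES pool above
PROXIES = [
    "http://101.231.104.82:80",
    "http://39.137.69.6:8080",
]

def is_well_formed(proxy):
    """Return True if the proxy URL has an http(s) scheme, a host and a port."""
    parsed = urlparse(proxy)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.hostname)
        and parsed.port is not None
    )

for p in PROXIES:
    print(p, is_well_formed(p))
```

A malformed entry (missing scheme or port) would simply be rejected by this check rather than causing a download error at crawl time.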
# middlewares.py: define the proxy middlewares
import random

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Pick a random proxy from the PROXIES pool in settings.py
        ip = random.choice(spider.settings.get('PROXIES'))
        print('Trying proxy IP:', ip)
        request.meta['proxy'] = ip

class CheckProxyMiddleware(object):
    def process_response(self, request, response, spider):
        # Log which proxy actually served the response
        print('Proxy IP used:', request.meta['proxy'])
        return response
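The random-selection behaviour of `ProxyMiddleware` can be exercised outside a running crawl with a few stand-in objects. This is a minimal sketch for illustration only — the `FakeSettings`, `FakeSpider`, and `FakeRequest` classes are hypothetical stubs mimicking the small slice of Scrapy's interface the middleware touches:

```python
import random

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Same logic as the middleware above: pick a random proxy
        ip = random.choice(spider.settings.get('PROXIES'))
        request.meta['proxy'] = ip

# Hypothetical stand-ins for the Scrapy objects the middleware uses
class FakeSettings(dict):
    pass  # dict.get already matches the settings.get(...) call used above

class FakeSpider:
    settings = FakeSettings(
        PROXIES=["http://101.231.104.82:80", "http://39.137.69.6:8080"]
    )

class FakeRequest:
    def __init__(self):
        self.meta = {}

request = FakeRequest()
ProxyMiddleware().process_request(request, FakeSpider())
print(request.meta['proxy'])  # one of the two proxies, chosen at random
```

Because each request gets a fresh random choice, successive requests are spread across the pool, which is what makes the rotation useful against per-IP rate limits.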
This concludes the article on implementing dynamic proxy IPs in Scrapy. For more on Scrapy and dynamic proxies, search 服务器之家's earlier articles or browse the related articles below. Thank you for your continued support of 服务器之家!
原文链接:https://blog.csdn.net/BradyCC/article/details/90759341