Scrapy framework: quick notes

- Scrapy framework
Introduction: a comprehensive, full-featured crawling framework.

Installation:
           - Windows:
               Download the Twisted wheel from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted

               pip3 install wheel
               pip3 install Twisted-18.4.0-cp36-cp36m-win_amd64.whl

               pip3 install pywin32

               pip3 install scrapy
           - Linux:
               pip3 install scrapy


Usage:
           Django (for comparison):
               # Create a project
               django-admin startproject mysite

               cd mysite

               # Create apps
               python manage.py startapp app01
               python manage.py startapp app02

               # Start the project
               python manage.py runserver

           Scrapy:
               # Create a project
               scrapy startproject xdb

               cd xdb

               # Create spiders
               scrapy genspider chouti chouti.com
               scrapy genspider cnblogs cnblogs.com

               # Run a spider
               scrapy crawl chouti




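           1. Create a project
               scrapy startproject project_name

               project_name/
                   project_name/
                       - spiders/              # spider files
                           - chouti.py
                           - cnblogs.py
                           ....
                       - items.py              # persistence (item definitions)
                       - pipelines.py          # persistence (item pipelines)
                       - middlewares.py        # middleware
                       - settings.py           # configuration (crawler)
                   scrapy.cfg                  # configuration (deployment)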

           2. Create spiders
               cd project_name

               scrapy genspider chouti chouti.com
               scrapy genspider cnblogs cnblogs.com

           3. Run a spider
               scrapy crawl chouti
               scrapy crawl chouti --nolog     # run without log output
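
The layout in step 1 marks items.py and pipelines.py as the persistence layer. A minimal sketch of what a pipeline in pipelines.py might look like is given below. The class name JsonLinesPipeline and the output file items.jl are made up for illustration; only the open_spider / process_item / close_spider method names follow Scrapy's pipeline convention.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes every scraped item to a JSON Lines file."""

    def __init__(self, path="items.jl"):
        # Output path is an assumption for this sketch, not a Scrapy default.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields; one JSON object per line.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        # Return the item so that any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()
```

For Scrapy to call it, the pipeline would also need to be registered in settings.py, for example ITEM_PIPELINES = {"xdb.pipelines.JsonLinesPipeline": 300}, assuming the project is named xdb as in the usage section.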

Summary:
           - HTML parsing: XPath
           - Issuing follow-up requests: yield a Request object
