Combining the Scrapy crawler framework with BeautifulSoup

① Install Scrapy
pip install scrapy
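System dependencies (Debian/Ubuntu package names): python-lxml, python-dev, libffi-dev
The spider below also parses pages with BeautifulSoup and the lxml parser, so install those as well:
pip install beautifulsoup4 lxml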
Create the project in the directory of your choice:
$ scrapy startproject weather
② Define the Item
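The Item declares the fields you want to save; it lives in weather/items.py, which startproject generates.
An Item is a container for scraped data. You use it much like a Python dict, but it adds a safeguard: assigning to a field that was never declared raises a KeyError, so a misspelled field name fails loudly instead of silently creating an undefined field.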

import scrapy


class WeatherItem(scrapy.Item):
    # Declare each field to be scraped with scrapy.Field().
    # The class name must match the one the spider imports (WeatherItem).
    name = scrapy.Field()

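③ Write the spider
Put the spider in the weather/spiders/ directory. Once it is written, run it from the project root with: scrapy crawl myspider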

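import scrapy
from bs4 import BeautifulSoup
from weather.items import WeatherItem


class LocalSpider(scrapy.Spider):
    name = "myspider"  # the name passed to "scrapy crawl"
    allowed_domains = ["meizitu.com"]  # bare domain: no scheme, no trailing slash
    start_urls = ["http://www.meizitu.com/"]

    def parse(self, response):
        # response.text is already decoded, so no manual .decode('utf-8') is needed
        soup = BeautifulSoup(response.text, "lxml")
        item = WeatherItem()
        tag = soup.find(id="slider_name")
        # find() returns a Tag (or None if nothing matches); store its text, not the Tag
        item["name"] = tag.get_text(strip=True) if tag else None
        yield item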
