rules = [
    # Follow only "next article" links restricted to the li.next_article
    # element whose URLs match this author's article-detail pattern; parse
    # each matched page with parse_item and keep following from there.
    Rule(SgmlLinkExtractor(allow=('/u012150179/article/details',),
                           restrict_xpaths=('//li[@class="next_article"]',)),
         callback='parse_item',
         follow=True),
]

def parse_item(self, response):
    item = CsdnblogcrawlspiderItem()
    blog_url = str(response.url)
    # The article title sits in the page header; extract() returns a list.
    blog_name = response.xpath('//div[@id="article_details"]/div/h1/span/a/text()').extract()
    item['blog_name'] = [n.encode('utf-8') for n in blog_name]
    item['blog_url'] = blog_url.encode('utf-8')
    return item
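For context, here is a minimal, self-contained sketch of the CrawlSpider these fragments belong to. The spider name, allowed_domains, seed URL, and the item definition are assumptions (the original only shows the rule and parse_item); the imports use the old scrapy.contrib paths that match SgmlLinkExtractor, which current Scrapy replaces with scrapy.spiders.CrawlSpider/Rule and scrapy.linkextractors.LinkExtractor.

# -*- coding: utf-8 -*-
# Minimal sketch, old-Scrapy (Python 2) import paths to match the
# SgmlLinkExtractor used above.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.item import Item, Field


class CsdnblogcrawlspiderItem(Item):
    # Assumed item definition: only the two fields used in parse_item.
    blog_name = Field()
    blog_url = Field()


class CSDNBlogCrawlSpider(CrawlSpider):
    name = 'CSDNBlogCrawlSpider'            # hypothetical; used by `scrapy crawl`
    allowed_domains = ['blog.csdn.net']
    start_urls = ['http://blog.csdn.net/u012150179']  # hypothetical seed page

    rules = [
        # Same rule as above: follow "next article" links restricted to the
        # li.next_article element, matching this author's article URLs.
        Rule(SgmlLinkExtractor(allow=('/u012150179/article/details',),
                               restrict_xpaths=('//li[@class="next_article"]',)),
             callback='parse_item',
             follow=True),
    ]

    def parse_item(self, response):
        item = CsdnblogcrawlspiderItem()
        item['blog_url'] = str(response.url).encode('utf-8')
        blog_name = response.xpath(
            '//div[@id="article_details"]/div/h1/span/a/text()').extract()
        item['blog_name'] = [n.encode('utf-8') for n in blog_name]
        return item

Because follow=True, CrawlSpider both yields the parsed item and keeps queueing each page's next_article link, so the spider walks the whole chain from the seed page. Run it from the project directory with scrapy crawl CSDNBlogCrawlSpider; note that with CrawlSpider the callback must not be named parse, which is why parse_item is used.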