Python crawler: scraping novels from Qidian (起点小说网)

A note up front: this can't scrape VIP chapters, only the publicly visible ones. My skills aren't up to that yet.

First, the Python modules:

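import requests  # the HTTP library that does the actual fetching
from lxml import etree  # a Python library for parsing HTML
import os  # creates the folders that the downloaded novels are saved into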

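If you haven't installed these modules yet, here's an easy way. Type import requests in PyCharm and it gets flagged with a red underline. Click the underlined error, press Alt+Enter, and pick the first option; after a moment PyCharm installs the package for you.

If you don't know which key Alt is: it's on the bottom row of the keyboard, the first key to the left of the space bar. Enter is the Return key!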
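If you aren't using PyCharm, a rough way to see which of the three modules still need installing is to ask Python itself. This is only a sketch: the helper name missing_modules is my own invention for illustration, not part of any library. Anything it reports can usually be installed with pip.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests", "lxml", "os"])
    if missing:
        print("Install these first: pip install " + " ".join(missing))
    else:
        print("All modules are available.")
```

os ships with Python, so it should never show up in the list; it's included only so that the check covers every import used in the spider.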

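Next, Qidian must not notice that a machine is fetching its data, so the requests have to be disguised as coming from a browser client.

I use Google Chrome; it's the programmer's browser, after all~

Open the official Qidian site (起点中文网).

Right-click the page and choose Inspect.

Moving on: I've blurred mine out, so use your own ID, that is, the User-Agent string that goes into the headers in the code below!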
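To double-check that the header really goes out with the request, you can build the request without sending it. Treat this as a sketch: the User-Agent value is a made-up placeholder (use the one from your own browser), and build_headers is a helper name I invented for this example.

```python
import requests


def build_headers(user_agent):
    """Wrap a browser User-Agent string in the dict that requests expects."""
    return {"User-Agent": user_agent}


# Placeholder value, not a real browser string: paste your own here
headers = build_headers("Mozilla/5.0 (replace with your own User-Agent)")

# Prepare the request locally; nothing is sent over the network
prepared = requests.Request("GET", "https://www.qidian.com/all", headers=headers).prepare()
print(prepared.headers["User-Agent"])
```

Without a custom header, requests identifies itself as python-requests followed by its version number, which is an easy giveaway that the visitor is a script.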

import requests
from lxml import etree
import os
class Spider(object):
    def __init__(self):
      
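        # The end of this string is a placeholder (mine is scrambled on purpose).
        # Replace it with your own User-Agent and keep the double quotes.
        # Leave a comment if anything is unclear; I'll reply when I see it.
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 <replace with your own User-Agent>"}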
    def start_request(self):
        response = requests.get("https://www.qidian.com/all",headers=self.headers)
        html = etree.HTML(response.content.decode())
        # print(response.content.decode())
        Bigtit_list = html.xpath('//div[@class="book-mid-info"]/h4/a/text()')
        Bighref_list = html.xpath('//div[@class="book-mid-info"]/h4/a/@href')
        for Bigtit,Bighref in zip(Bigtit_list,Bighref_list):
            # print(Bigtit,Bighref)
            # Check whether a folder for this book already exists
            if not os.path.exists(Bigtit):
                os.mkdir(Bigtit)  # create the folder
            self.file_data(Bighref,Bigtit)

    def file_data(self,Bigurl,Bigtit):
        response = requests.get("http:" + Bigurl,headers = self.headers)
        html = etree.HTML(response.content.decode())
        Littit_list = html.xpath('//ul[@class="cf"]/li/a/text()')
        Lithref_list = html.xpath('//ul[@class="cf"]/li/a/@href')
        for tit,href in zip(Littit_list,Lithref_list):
            self.finally_file(tit,href,Bigtit)

    def finally_file(self,tit,url,Bigtit):
        response = requests.get("http:" + url,headers=self.headers)
        html = etree.HTML(response.content.decode())
        text_list = html.xpath('//div[@class="read-content j_readContent"]/p/text()')
        text = "\n".join(text_list)
        # os.path.join builds the path portably instead of hard-coding "\\"
        file_name = os.path.join(Bigtit, tit + ".txt")

        print("Fetching chapter: " + file_name)

        with open(file_name, 'a', encoding="utf-8") as f:
            f.write(text)

spider = Spider()
spider.start_request()
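The XPath expressions are easier to follow on a small piece of HTML than on the live site. The snippet below is a sketch: the markup is a hand-written imitation of the structure that the XPath in start_request implies, with invented titles and links, so it runs without any network access. It also shows why the spider puts "http:" in front of each link: the href values start with //, and urljoin from the standard library is one alternative way to resolve them.

```python
from urllib.parse import urljoin

from lxml import etree

# Hand-written sample that mimics the structure the XPath expects; not real site data
SAMPLE_HTML = """
<html><body>
  <div class="book-mid-info"><h4><a href="//book.example.com/info/1">Book One</a></h4></div>
  <div class="book-mid-info"><h4><a href="//book.example.com/info/2">Book Two</a></h4></div>
</body></html>
"""

html = etree.HTML(SAMPLE_HTML)
titles = html.xpath('//div[@class="book-mid-info"]/h4/a/text()')
hrefs = html.xpath('//div[@class="book-mid-info"]/h4/a/@href')

# urljoin resolves protocol-relative links against the page they came from
base_url = "https://www.qidian.com/all"
books = [(title, urljoin(base_url, href)) for title, href in zip(titles, hrefs)]

for title, url in books:
    print(title, url)
```

If the site changes its markup, the two lists simply come back empty, so printing them like this is a quick way to find out which XPath stopped matching.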

It can't scrape VIP chapters! If this helped, please leave a good review.
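Since VIP chapters can't be fetched, one option is to skip any chapter whose extracted text comes back empty instead of writing an empty file. This is a sketch under an assumption I haven't verified for every page: that a locked chapter yields no paragraphs for the XPath used in finally_file. The names save_chapter and safe_name are made up for this example. The sketch also replaces characters that Windows doesn't allow in file names, because chapter titles are used directly as file names.

```python
import os
import re
import tempfile


def safe_name(title):
    """Replace characters that are not allowed in Windows file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def save_chapter(folder, title, paragraphs):
    """Write one chapter to folder/title.txt; return False if there is no text."""
    text = "\n".join(p.strip() for p in paragraphs if p.strip())
    if not text:
        # Assumption: an empty result means the chapter is locked (e.g. VIP)
        return False
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_name(title) + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return True


if __name__ == "__main__":
    demo_dir = os.path.join(tempfile.mkdtemp(), "Demo Book")
    print(save_chapter(demo_dir, "Chapter 1: Start?", ["First line.", "Second line."]))
    print(save_chapter(demo_dir, "Chapter 2 (locked)", []))
```

Unlike the spider above, this opens the file in "w" mode, so running the script twice overwrites a chapter rather than appending the same text again.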

Here's a screenshot of the result.
