Scraping the novel Longzu V (龙族V) with requests + BeautifulSoup

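For the past few days I have wanted to read the latest volume of Longzu, but after a lot of searching I could not find any site that offers it for download, and I only want to read it offline (writing code is already hard enough on the eyes). So I had no choice but to scrape it myself.

I am recording the script here so that the next time I want to read the novel, I can just run it to download the text.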

The novel is downloaded from http://longzu5.co. To change where the file is saved, change the value of the FILE_URL constant.
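As one possible way to avoid editing the script on every machine, the output path could be resolved at startup. The sketch below is only an illustration: the helper name resolve_output_path and the environment variable LZ_OUTPUT are hypothetical and not part of the original script, and the fallback to a file named lz.txt in the home directory is an assumption.

```python
import os
from pathlib import Path


def resolve_output_path(env=None):
    """Return the path the novel should be written to.

    LZ_OUTPUT is a hypothetical environment variable used only in this
    sketch. When it is unset or empty, fall back to ~/lz.txt (an assumed
    default, not the one used by the original script).
    """
    env = os.environ if env is None else env
    override = env.get('LZ_OUTPUT')
    if override:
        return Path(override)
    return Path.home() / 'lz.txt'


# The script expects a plain string, so convert the Path before using it.
FILE_URL = str(resolve_output_path())
```

With this in place, setting LZ_OUTPUT before running the script would redirect the download without touching the code.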

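If the script can no longer fetch anything, the site has probably added anti-scraping measures or changed the HTML elements it uses to render its pages.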
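When the markup changes, BeautifulSoup's find() returns None, and the script below would then stop with an AttributeError on the following line. A small guard can turn that into a clearer message. This is a minimal sketch: PageLayoutError and require_element are hypothetical names, not part of the original script or of bs4.

```python
class PageLayoutError(RuntimeError):
    """Raised when an expected HTML element is missing from a page."""


def require_element(node, description):
    """Return node unchanged, or raise PageLayoutError if it is None.

    node is whatever soup.find(...) returned; description is a short
    human-readable label for the element that was being looked up.
    """
    if node is None:
        raise PageLayoutError(
            "Could not find %s; the site may have changed its HTML "
            "or may be blocking the scraper." % description
        )
    return node


# Possible use inside the script (shown as a comment because it needs bs4):
# ul_soup = require_element(soup.find('ul', 'booklist'), "ul.booklist")
```

The guard does not work around anti-scraping measures; it only makes the cause of the failure easier to read.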

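# -*- coding: utf-8 -*-
# (C) rgc, 2018
# All rights reserved
# requirements list: [python3.6, requests, bs4]
import requests
from bs4 import BeautifulSoup

# Site the novel is downloaded from
URL = "http://longzu5.co"
# Output file; change this value to store the novel somewhere else.
# A raw string keeps the backslash in the Windows path literal.
FILE_URL = r'E:\lz.txt'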

def get_son_text(strs):
    # Extract the title and body text of one chapter page and append them to FILE_URL
    soup = BeautifulSoup(strs, 'html.parser')
    body_soup = soup.find('div', 'post-body')
    result = body_soup.find_all('p')
    title = soup.find('h2', 'post-title')
    title = title.text
    final_txt = title + '\n'
    for item in result:
        txt = item.text
        final_txt += txt
    final_txt += '\n\n'
    with open(FILE_URL, 'a', encoding='utf-8') as f:
        f.write(final_txt)

def get_father_text():
    """
    Fetch the chapter list from the index page, then download each chapter.
    :return: None
    """
    res = requests.get(URL + "/")
    strs = res.text
    soup = BeautifulSoup(strs, 'html.parser')
    ul_soup = soup.find('ul', 'booklist')
    links = ul_soup.find_all('a')
    section_list = []
    for item in links:
        url = URL + item.get('href')
        section_list.append(url)
    # Reverse the order in which the links appear on the page
    section_list.reverse()
    for url in section_list:
        print(url)
        section = requests.get(url)
        sec_txt = section.text
        get_son_text(sec_txt)

if __name__ == '__main__':
    get_father_text()

# If this infringes any copyright, please contact me and I will remove it promptly. My apologies for any offense.
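The script builds each chapter address with URL + item.get('href'), which assumes every href is a path that starts with a slash. If the site ever emits page-relative or absolute links, urljoin from the standard library's urllib.parse handles those cases as well. A minimal sketch, where build_chapter_urls is a hypothetical helper and the sample hrefs are made up for illustration:

```python
from urllib.parse import urljoin

BASE_URL = "http://longzu5.co"


def build_chapter_urls(hrefs, base=BASE_URL):
    """Resolve each href against the site root, skipping empty values.

    Root-relative paths ("/1.html"), page-relative paths ("2.html") and
    absolute URLs all resolve to a full address.
    """
    root = base.rstrip('/') + '/'
    return [urljoin(root, href) for href in hrefs if href]


# Made-up hrefs, only to show the three link styles side by side.
sample = ["/1.html", "2.html", "http://longzu5.co/3.html", None]
for chapter_url in build_chapter_urls(sample):
    print(chapter_url)
```

In get_father_text, the list comprehension would take the place of the loop that concatenates URL and href; the rest of the function would stay the same.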
