Scraping a Biquge novel with XPath

Date: 2022-11-20 10:57:17

1. Get the novel title and the chapter-page links

url = 'https://www.bqg99.com/book/109323/'
list_html = requests.get(url=url, headers=headers)
selector = etree.HTML(list_html.text)
lis = selector.xpath('/html/body/div[@class="listmain"]//dd/a/@href')  # extract every chapter-page link
title = selector.xpath('/html/body/div/span/text()')[0]
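
The two XPath expressions above can be tried without touching the site by running them against a small inline page. This is only a sketch: the HTML below is a made-up stand-in for the chapter index (the real markup may differ), and `parse_index` is a hypothetical helper name, not something from the original script.

```python
from lxml import etree

# Made-up stand-in for the chapter index page; the real markup may differ.
SAMPLE_INDEX = """
<html><body>
  <div class="info"><span>Sample Novel</span></div>
  <div class="listmain"><dl>
    <dd><a href="/book/109323/1.html">Chapter 1</a></dd>
    <dd><a href="/book/109323/2.html">Chapter 2</a></dd>
  </dl></div>
</body></html>
"""


def parse_index(html_text):
    """Return (title, list of chapter hrefs) from the index page HTML."""
    selector = etree.HTML(html_text)
    hrefs = selector.xpath('/html/body/div[@class="listmain"]//dd/a/@href')
    titles = selector.xpath('/html/body/div/span/text()')
    # xpath() always returns a list, so check for an empty result
    # instead of indexing [0] blindly.
    title = titles[0] if titles else None
    return title, hrefs


title, hrefs = parse_index(SAMPLE_INDEX)
print(title, hrefs)
```

Returning `None` for a missing title is a design choice of this sketch; the original script indexes `[0]` directly and would raise `IndexError` if the expression matched nothing.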

2. Build the full chapter URLs and collect them in a list

emptylist = []
for i in lis:
    href_list = "https://www.bqg99.com" + i
    # print(href_list)
    emptylist.append(href_list)
emptylist.remove(emptylist[10])  # the 11th entry is not a chapter link we want
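
Dropping the unwanted entry by position works only while it stays at index 10. As a hedged alternative, the links could be filtered by shape instead. The sketch below assumes that real chapter links are site-relative paths ending in `.html`; that rule, the `javascript:` sample entry, and the name `build_chapter_urls` are illustrative assumptions, not facts taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.bqg99.com"


def build_chapter_urls(hrefs, base_url=BASE_URL):
    """Turn relative hrefs into absolute URLs, skipping non-chapter links."""
    urls = []
    for href in hrefs:
        # Assumption: chapter links are site paths ending in .html;
        # anything else (for example a javascript: toggle) is dropped.
        if not href.startswith("/") or not href.endswith(".html"):
            continue
        urls.append(urljoin(base_url, href))
    return urls


# Made-up sample input with one non-chapter entry in the middle.
sample = ["/book/109323/1.html", "javascript:dd_show()", "/book/109323/2.html"]
chapter_urls = build_chapter_urls(sample)
print(chapter_urls)
```

`urljoin` is used in place of string concatenation so that a missing or doubled slash between the host and the path does not produce a broken URL.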

3. Visit each chapter, extract the text, and save it

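for li in emptylist:
    req = requests.get(url=li, headers=headers)
    sel = etree.HTML(req.text)
    # the attribute test inside [@ ] is missing here; fill in the attribute of the content container
    content = sel.xpath('//*[@]/text()')
    chapter = sel.xpath('//*[@]/div/span/text()')[0]
    content = '\n'.join(content)  # join the list of text nodes with newlines
    content = content.replace('请收藏本站:https://www.bqg99.com。笔趣阁手机版:https://m.bqg99.com ','')  # strip the site's ad line
    this_chapter = f'\n{chapter}\n{content}'
    with open(file=file_name, mode='a', encoding='UTF-8') as f:  # file_name is defined in the full code in section 4
        f.write(this_chapter)
    print(f'{chapter} -- download complete!')  # report progress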
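
The extract, clean, and append steps can be exercised offline on an inline page. This is a sketch under stated assumptions: the element ids `top` and `chaptercontent` and the sample HTML are invented for illustration, since the selectors in the listing above are incomplete, and `extract_chapter` and `append_chapter` are hypothetical helper names. It also filters out the whole ad line rather than calling `replace`, which is a small change from the original approach.

```python
import os
import tempfile

from lxml import etree

AD_TEXT = '请收藏本站:https://www.bqg99.com。笔趣阁手机版:https://m.bqg99.com'

# Invented chapter page; the ids "top" and "chaptercontent" are assumptions.
SAMPLE_CHAPTER = (
    '<html><body>'
    '<div id="top"><div><span>Chapter 1</span></div></div>'
    '<div id="chaptercontent">  First line.<br/>  Second line.<br/>'
    + AD_TEXT +
    '</div></body></html>'
)


def extract_chapter(html_text):
    """Return (chapter name, cleaned body text) for one chapter page."""
    sel = etree.HTML(html_text)
    name = sel.xpath('//*[@id="top"]/div/span/text()')[0]
    lines = sel.xpath('//*[@id="chaptercontent"]/text()')
    # Trim indentation, then drop empty lines and the ad line.
    cleaned = [ln.strip() for ln in lines]
    cleaned = [ln for ln in cleaned if ln and AD_TEXT not in ln]
    return name, '\n'.join(cleaned)


def append_chapter(path, name, body):
    """Append one chapter to the output file."""
    with open(path, mode='a', encoding='utf-8') as f:
        f.write(f'\n{name}\n{body}')


name, body = extract_chapter(SAMPLE_CHAPTER)
out_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
append_chapter(out_path, name, body)
```

Opening the file in append mode means a second run adds the chapters again, so the output file should be removed before re-running.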

4. Full code

import os

import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'https://www.bqg99.com/book/109323/'
list_html = requests.get(url=url, headers=headers)
selector = etree.HTML(list_html.text)
lis = selector.xpath('/html/body/div[@class="listmain"]//dd/a/@href')  # extract every chapter-page link
title = selector.xpath('/html/body/div/span/text()')[0]
os.makedirs('小说', exist_ok=True)  # open() does not create the output directory
file_name = f'小说/{title}.txt'  # local output file name

# Build the absolute chapter URLs and collect them in a list
emptylist = []
for i in lis:
    href_list = "https://www.bqg99.com" + i
    # print(href_list)
    emptylist.append(href_list)
emptylist.remove(emptylist[10])  # the 11th entry is not a chapter link we want
# print(emptylist)

# Fetch each chapter, clean the text, and append it to the file
for li in emptylist:
    req = requests.get(url=li, headers=headers)
    sel = etree.HTML(req.text)
    # the attribute test inside [@ ] is missing here; fill in the attribute of the content container
    content = sel.xpath('//*[@]/text()')
    chapter = sel.xpath('//*[@]/div/span/text()')[0]
    content = '\n'.join(content)  # join the list of text nodes with newlines
    content = content.replace('请收藏本站:https://www.bqg99.com。笔趣阁手机版:https://m.bqg99.com ','')  # strip the site's ad line
    this_chapter = f'\n{chapter}\n{content}'
    with open(file=file_name, mode='a', encoding='UTF-8') as f:
        f.write(this_chapter)
    print(f'{chapter} -- download complete!')  # report progress


5. Summary

a. The chapter-page links were not collected into a list, which caused an error.
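
One plausible way this goes wrong, assumed here rather than confirmed by the post: if the URL is only assigned inside the loop and never appended to a list, the variable ends up holding just the last URL as a string, and looping over a string yields single characters instead of URLs. The sample hrefs below are made up.

```python
hrefs = ["/book/109323/1.html", "/book/109323/2.html"]

# Without a list: the variable is overwritten on every pass,
# so only the last URL survives, as a plain string.
for href in hrefs:
    last_url = "https://www.bqg99.com" + href

# Looping over that string yields single characters, not URLs.
pieces = [item for item in last_url][:5]
print(pieces)  # ['h', 't', 't', 'p', 's']

# With a list: every URL is kept and the loop yields whole URLs.
url_list = ["https://www.bqg99.com" + href for href in hrefs]
print(url_list)
```

Passing one of those single characters to `requests.get` would fail because it is not a valid URL, which is why the links need to be gathered into a list first.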


b. Receiving the response data with etree raised an error.
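
A common cause of an error at this step, assumed here since the post does not show the traceback, is handing the response object itself to `etree.HTML` instead of its `.text`. The sketch below reproduces that without any network access; `FakeResponse` is a hypothetical stand-in for `requests.Response` that carries only a `.text` attribute.

```python
from lxml import etree


class FakeResponse:
    """Hypothetical stand-in for requests.Response, with only .text."""

    def __init__(self, text):
        self.text = text


resp = FakeResponse("<html><body><p>hello</p></body></html>")

try:
    etree.HTML(resp)  # wrong: passes the response object itself
    parsed_object = True
except (TypeError, ValueError) as exc:
    parsed_object = False
    print("parsing the object failed:", exc)

tree = etree.HTML(resp.text)  # right: pass the markup string
paragraphs = tree.xpath("//p/text()")
print(paragraphs)
```

`etree.HTML` expects the markup as a string or bytes, so `list_html.text` (or `list_html.content`) is what should be passed in the scripts above.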
