Python Web Scraping, Part 1: Scraping Jokes from Qiushibaike

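Goals

Scrape the jokes posted on Qiushibaike (糗事百科).
Show one joke each time Enter is pressed.
Ask for the number of pages to read; pressing 'Q' or 'q' quits.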
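
The last goal means the first prompt has to accept either a page count or a quit key. A minimal sketch of that check, assuming a hypothetical helper named parse_page_count (it is not part of the original script):

```python
def parse_page_count(text):
    """Return the requested page count, or None if the user asked to quit.

    parse_page_count is a hypothetical helper used only for illustration.
    """
    text = text.strip()
    if text.lower() == 'q':  # accept both 'q' and 'Q'
        return None
    pages = int(text)  # raises ValueError for non-numeric input
    if pages < 1:
        raise ValueError('page count must be at least 1')
    return pages


print(parse_page_count('3'))
print(parse_page_count(' Q '))
```

Returning None for the quit key keeps the decision to exit in the caller, which can simply skip the download when it gets None back.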

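Approach

Target site: Qiushibaike
Fetch each page with requests (see the official requests tutorial).
Parse the page with the bs4 module and extract the joke text (see the official bs4 tutorial).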
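
The parsing step can be tried without touching the network. The sketch below runs BeautifulSoup on a made-up HTML fragment; the markup in SAMPLE_HTML and the helper name extract_jokes are assumptions for illustration, and the real page layout may differ. It uses the built-in 'html.parser' backend so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption); the real page layout may differ.
SAMPLE_HTML = """
<div class="article">
  <div class="content">
    <span>First joke,
    split over two lines.</span>
  </div>
</div>
<div class="article">
  <div class="content"><span>Second joke.</span></div>
</div>
"""


def extract_jokes(html):
    """Return the text of every <div class="content"> found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    jokes = []
    for div in soup.find_all('div', class_='content'):
        # split() + join() collapses newlines and runs of spaces into single spaces.
        jokes.append(' '.join(div.get_text().split()))
    return jokes


print(extract_jokes(SAMPLE_HTML))
```

Joining on whitespace is a slightly different choice from deleting '\n' outright: it avoids gluing two words together when a line break was the only separator between them.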

Code:

import requests
from bs4 import BeautifulSoup


def get_content(pages):
    """Return a list of jokes collected from the first `pages` pages."""
    # The HTTP header name is 'User-Agent' (hyphen, not underscore).
    # Adjacent string literals avoid stray spaces inside the value.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/54.0.2840.87 Safari/537.36'}
    content_list = []
    for page in range(1, pages + 1):  # number of pages to read
        url = 'http://www.qiushibaike.com/text/page/' + str(page) + '/?s=4928950'
        response = requests.get(url, headers=headers)  # fetch the page
        html = response.text
        soup = BeautifulSoup(html, 'html5lib')  # parse the page (requires the html5lib package)
        jokes = soup.find_all('div', class_='content')
        for each in jokes:
            each_joke = each.get_text()
            joke = each_joke.replace('\n', '')  # strip newlines
            content_list.append(joke)
    return content_list  # list of jokes


if __name__ == "__main__":
    answer = input("How many pages do you want to read?\nIf you want to quit, just press 'q'.\n")
    if answer.strip().lower() != 'q':  # 'q' or 'Q' at the prompt quits right away
        number = int(answer)  # number of pages to read
        print()  # blank line for readability
        for paragraph in get_content(number):
            print(paragraph)
            user_input = input()
            if user_input.strip().lower() == 'q':  # 'q' or 'Q' quits
                break
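
The display loop at the end of the script is easier to check if it is pulled into a function that takes its input and output as parameters. The following is a sketch under that assumption; show_jokes and its read and write parameters are hypothetical names, not part of the original script.

```python
def show_jokes(jokes, read=input, write=print):
    """Show one joke per Enter press; stop early on 'q' or 'Q'.

    Returns the number of jokes shown. show_jokes is a hypothetical
    helper; read and write default to the built-in input and print.
    """
    shown = 0
    for joke in jokes:
        write(joke)
        shown += 1
        if read().strip().lower() == 'q':
            break
    return shown


# Scripted demo instead of a real keyboard: two Enter presses, then 'Q'.
answers = iter(['', '', 'Q'])
printed = []
count = show_jokes(['a', 'b', 'c', 'd'],
                   read=lambda: next(answers),
                   write=printed.append)
print(count, printed)
```

In the real script the call would be show_jokes(get_content(number)), which falls back to the keyboard and the console.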