Python Web Scraping, Part 1: Scraping Jokes from Qiushibaike

Date: 2021-10-25 18:32:47

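Goals

  • Scrape the jokes posted on Qiushibaike
  • Show one joke each time Enter is pressed
  • Take the number of pages to read as input; press 'Q' or 'q' to quit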
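The third goal mixes two kinds of input at one prompt: a page count, or a quit key. A minimal sketch of how that could be handled is shown here; `parse_page_count` is a hypothetical helper, not part of the original script, and rejecting counts below 1 is an assumption.

```python
def parse_page_count(text):
    """Return the requested page count, or None if the user asked to quit.

    Hypothetical helper for illustration; not part of the original script.
    """
    text = text.strip()
    if text.lower() == 'q':  # accept both 'q' and 'Q'
        return None
    count = int(text)  # raises ValueError for non-numeric input
    if count < 1:  # assumption: at least one page must be requested
        raise ValueError('page count must be at least 1')
    return count


print(parse_page_count('3'))
print(parse_page_count('Q'))
```

Keeping the parsing in a small function like this makes it possible to check the quit handling without touching the network.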

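Approach

Request each listing page with requests, parse the HTML with BeautifulSoup, collect the text of every div whose class is content, and then print the jokes one at a time, waiting for Enter between them.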
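The extraction step (find every div whose class is content, take its text, drop the line breaks) can be tried offline before sending any request. The sketch below is a stand-in rather than the script's own method: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without any third-party install. `ContentDivParser`, `extract_jokes`, and `SAMPLE_HTML` are hypothetical names and made-up data.

```python
from html.parser import HTMLParser


class ContentDivParser(HTMLParser):
    """Collect the text of every <div class="content"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # div nesting depth inside the current content div; 0 means outside
        self.parts = []  # text fragments of the joke being read
        self.jokes = []  # finished jokes

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth:
            self.depth += 1  # nested div inside a content div
        else:
            classes = (dict(attrs).get('class') or '').split()
            if 'content' in classes:
                self.depth = 1
                self.parts = []

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1
            if self.depth == 0:  # the content div just closed
                self.jokes.append(''.join(self.parts).replace('\n', ''))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_jokes(html):
    """Return the newline-free text of each content div in `html`."""
    parser = ContentDivParser()
    parser.feed(html)
    parser.close()
    return parser.jokes


# Made-up markup that imitates the assumed page structure.
SAMPLE_HTML = (
    '<div class="article">'
    '<div class="content"><span>\nFirst joke,\nsecond line.\n</span></div>'
    '<div class="content"><span>Second joke.</span></div>'
    '</div>'
)

jokes = extract_jokes(SAMPLE_HTML)
print(jokes)
```

Testing against a fixed snippet like this separates parsing mistakes from network problems such as blocked requests or a changed page layout.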

Code:

import requests
from bs4 import BeautifulSoup


def get_content(pages):
    """Return the jokes found on the first `pages` listing pages."""
    # The header name must be 'User-Agent'; a 'user_agent' key is sent as an
    # unrelated header, so the default requests User-Agent would still be used.
    headers = {
        'User-Agent': (
            'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
            '(KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36'
        )
    }
    content_list = []
    for page in range(1, pages + 1):  # one request per page
        url = 'http://www.qiushibaike.com/text/page/' + str(page) + '/?s=4928950'
        response = requests.get(url, headers=headers, timeout=10)  # fetch the page
        html = response.text
        soup = BeautifulSoup(html, 'html5lib')  # parse the page; needs the html5lib package
        jokes = soup.find_all('div', class_='content')
        for each in jokes:
            each_joke = each.get_text()
            joke = each_joke.replace('\n', '')  # drop line breaks
            content_list.append(joke)
    return content_list  # list of jokes


if __name__ == "__main__":
    answer = input("How many pages do you want to read?\nIf you want to quit, just press 'q'.\n").strip()
    if answer.lower() != 'q':  # 'q' or 'Q' quits before any request is made
        number = int(answer)  # number of pages to read
        print()  # blank line for readability
        for paragraph in get_content(number):
            print(paragraph)
            user_input = input()  # press Enter for the next joke
            if user_input.strip().lower() == 'q':  # 'q' or 'Q' quits
                break
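A note on writing the long User-Agent value across two source lines: a backslash line continuation placed inside the quotes keeps the next line's indentation as part of the string. The short sketch below illustrates the difference; the shortened strings are made up for the demonstration and are not the real header value.

```python
# Backslash continuation inside the quotes: the indentation of the next
# source line becomes part of the string.
broken = 'Mozilla/5.0 Apple\
    WebKit/537.36'

# Adjacent string literals inside parentheses are joined with nothing added.
fixed = (
    'Mozilla/5.0 Apple'
    'WebKit/537.36'
)

print(repr(broken))  # 'Mozilla/5.0 Apple    WebKit/537.36'
print(repr(fixed))   # 'Mozilla/5.0 AppleWebKit/537.36'
```

Parenthesized adjacent literals let the code stay indented without changing the value that is sent.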

 

Result:

[Screenshot of the program output]

 

References:

Python Web Scraping in Practice, Part 1: Scraping Qiushibaike Jokes

http://www.jianshu.com/p/19c846daccb3

Jingmi's web scraping tutorial: https://cuiqingcai.com/990.html

Joke-scraping reference: http://www.jianshu.com/p/0e7d1c80b8c3