8. Python Crawler in Practice (1): Scraping Jokes from Qiushibaike

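Hi everyone. The earlier introductory posts covered a lot of basics, so now let's take on a few hands-on projects. The first one is an example of using Python to scrape short jokes from Qiushibaike.

You have probably heard of Qiushibaike: its users post funny jokes by the handful, and this time we will try to fetch them with a crawler. The goals are: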

1. Scrape the hot jokes from Qiushibaike

2. Filter out jokes that contain images

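 # coding: utf-8
 import re
 import urllib2  # Python 2 standard library
 page = 1
 # URL of the requested page of the "hot" listing
 url = 'https://www.qiushibaike.com/hot/page/' + str(page) + '/'
 # Send a browser-like User-Agent header with the request
 user_agent = 'Mozilla/4.0 (compatible;MSIE 5.5;Windows NT)'
 headers = {'User-Agent': user_agent}
 try:
     request = urllib2.Request(url, headers=headers)
     response = urllib2.urlopen(request)
     # Groups: author (the avatar's alt text), joke text, the number that follows it
     # re.S lets '.' match newlines, so the pattern can span several lines of HTML
     qiubaiPattern = re.compile('<div.*?author.*?alt="(.*?)".*?content.*?span>(.*?)</.*?number">(.*?)<', re.S)
     infos = re.findall(qiubaiPattern, response.read().decode('utf-8'))
     for info in infos:
         for a in info:
             text = a.replace('<br/>', '\r\n')  # replace <br/> in the joke text with line breaks
             print(text.strip())  # strip leading and trailing whitespace
 except urllib2.URLError as e:
     # HTTP errors carry a status code; lower-level failures carry a reason
     if hasattr(e, 'code'):
         print(e.code)
     if hasattr(e, 'reason'):
         print(e.reason)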
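To see what the regular expression in the listing actually captures, it can help to run it offline. The following is a minimal sketch, not the site's real markup: `SAMPLE_HTML` is a made-up fragment, and `extract_jokes` is a hypothetical helper name. The pattern is the one from the listing, with a closing quote after the `alt` capture so that the author name comes back without a trailing quote.

```python
import re

# Made-up fragment for illustration only; the real page markup may differ.
SAMPLE_HTML = '''
<div class="article">
  <div class="author">
    <img src="avatar1.jpg" alt="alice">
  </div>
  <div class="content">
    <span>First line<br/>second line</span>
  </div>
  <span class="stats"><i class="number">120</i> votes</span>
</div>
<div class="article">
  <div class="author">
    <img src="avatar2.jpg" alt="bob">
  </div>
  <div class="content">
    <span>Only one line</span>
  </div>
  <span class="stats"><i class="number">45</i> votes</span>
</div>
'''

# '.*?' is non-greedy, so each piece stops at the nearest following marker;
# re.S makes '.' match newlines, which lets the pattern cross line breaks.
JOKE_PATTERN = re.compile(
    r'<div.*?author.*?alt="(.*?)".*?content.*?span>(.*?)</.*?number">(.*?)<',
    re.S)


def extract_jokes(html):
    """Return a list of (author, text, number) tuples found in html."""
    jokes = []
    for author, content, number in JOKE_PATTERN.findall(html):
        # Turn <br/> tags into real line breaks and trim surrounding whitespace
        text = content.replace('<br/>', '\n').strip()
        jokes.append((author.strip(), text, number.strip()))
    return jokes


jokes = extract_jokes(SAMPLE_HTML)
for author, text, number in jokes:
    print('%s (%s):' % (author, number))
    print(text)
```

On this fragment the helper returns one tuple per joke block, which is easier to work with than printing every captured group on its own line.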

I won't walk through this code in detail here; I'll come back and fill that in when I have time.
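For the second goal, skipping jokes that carry a picture, one possible approach is to capture the markup that sits between the joke text and the number, and drop any joke where that markup contains an `<img` tag. This is only a sketch under assumptions: the `thumb` block in `SAMPLE_HTML` is an invented stand-in for however the site embeds pictures, and `text_only_jokes` is a hypothetical helper name.

```python
import re

# Made-up fragment for illustration only; the second joke carries a picture
# in an assumed <div class="thumb"> block placed after the joke text.
SAMPLE_HTML = '''
<div class="article">
<div class="author"><img src="a1.jpg" alt="alice"></div>
<div class="content"><span>Text-only joke</span></div>
<div class="stats"><i class="number">120</i></div>
</div>
<div class="article">
<div class="author"><img src="a2.jpg" alt="bob"></div>
<div class="content"><span>Joke with a picture</span></div>
<div class="thumb"><img src="pic.jpg" alt="picture"></div>
<div class="stats"><i class="number">45</i></div>
</div>
'''

# Same idea as the pattern in the listing, with one extra group (the third)
# that captures whatever lies between the joke text and the number.
PATTERN = re.compile(
    r'<div.*?author.*?alt="(.*?)".*?content.*?span>(.*?)</(.*?)number">(.*?)<',
    re.S)


def text_only_jokes(html):
    """Return (author, text, number) for jokes with no <img in between."""
    kept = []
    for author, content, between, number in PATTERN.findall(html):
        if '<img' in between:
            continue  # assumed to be a picture joke, so skip it
        kept.append((author.strip(), content.strip(), number.strip()))
    return kept


text_only = text_only_jokes(SAMPLE_HTML)
for author, text, number in text_only:
    print('%s (%s): %s' % (author, number, text))
```

The author's avatar is also an `<img` tag, but it appears before the joke text, so it never ends up in the third group. If the real page places pictures elsewhere, the extra group would need to move accordingly.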
