Python crawler example: downloading funny animated GIFs

This article shows how to write a Python crawler that downloads funny animated GIFs. It is shared here for reference; the details follow.

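Saving animated images you like one at a time is tedious, and some sites do not even allow right-click saving. Fetching them with Python is easier, and it is a fun exercise too.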

The site crawled here is 居然搞笑网 (zbjuran.com): http://www.zbjuran.com/dongtai/list_4_1.html

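Approach:

Fetch the content of the current page.

Find the URLs of the animated images in the page.

Save the content at each URL to a local file.

To crawl several pages, wrap these steps in a loop.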
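The steps above can be sketched as a small loop. This is a minimal Python 3 sketch under stated assumptions, not the article's script: `page_url`, `crawl`, and the three callables passed to `crawl` are hypothetical names, and the URL pattern is inferred from the list-page address given above.

```python
def page_url(page_no):
    """Build the list-page URL for a 1-based page number."""
    return "http://www.zbjuran.com/dongtai/list_4_%d.html" % page_no


def crawl(last_page, fetch, extract, save):
    """Run fetch -> extract -> save for pages 1..last_page.

    fetch(url) returns the page HTML, or None on failure.
    extract(html) returns a list of image URLs.
    save(image_url, page_no) stores one image.
    Returns the number of images handed to save().
    """
    saved = 0
    for page_no in range(1, last_page + 1):
        html = fetch(page_url(page_no))
        if not html:
            continue  # skip pages that could not be downloaded
        for image_url in extract(html):
            save(image_url, page_no)
            saved += 1
    return saved
```

Passing the three steps in as functions keeps the loop independent of any particular HTTP or HTML library, so each step can be swapped or tested on its own.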

Code (Python 2):


#!/usr/bin/python
# coding: utf-8
# Python 2 script: urllib2, raw_input and the print statement do not exist in Python 3.
import os
import sys
import time
import urllib
import urllib2
import uuid

from bs4 import BeautifulSoup

# Python 2 only: make utf-8 the default for implicit str/unicode conversions
reload(sys)
sys.setdefaultencoding('utf-8')


# Fetch the page content; returns None if the request fails
def getHtml(url):
    try:
        print url
        html = urllib2.urlopen(url).read()  # raw bytes, left undecoded
    except Exception:
        return None
    return html


# Build the list of animated images (URL and title) found in the page
def getImagUrl(html):
    ImagUrlList = []
    if not html:
        print 'nothing can be found'
        return ImagUrlList  # empty list, so the caller's loop simply does nothing
    soup = BeautifulSoup(html, 'lxml')
    main = soup.find("div", {"class": "main"})
    if not main:
        print 'nothing can be found'
        return ImagUrlList
    # Get the list of items
    items = main.find_all('div', {'class': 'item'})
    for item in items:
        # Ad items have no "text" div (or no image inside it), so they are skipped
        text = item.find('div', {"class": "text"})
        img = text.find('img') if text else None
        if img:
            target = {}
            # Image URL
            target['url'] = img.get('src')
            # Title
            target['name'] = item.find('h3').text
            ImagUrlList.append(target)
    return ImagUrlList


# Download one image to the local disk
def download(author, imgurl, typename, pageNo):
    # The folder name is today's date, e.g. 2017-3-16
    x = time.localtime(time.time())
    foldername = "%d-%d-%d" % (x.tm_year, x.tm_mon, x.tm_mday)
    picpath = 'Jimy/%s/%s/%s' % (foldername, typename, str(pageNo))
    filename = author + str(uuid.uuid1())
    # Assumes a three-character extension such as "gif"
    pic_type = imgurl[-3:]
    if not os.path.exists(picpath):
        os.makedirs(picpath)
    target = picpath + "/%s.%s" % (filename, pic_type)
    print "动图存贮位置:" + target  # "GIF saved to:"
    download_img = urllib.urlretrieve(imgurl, target)  # save the image to the target path
    print "图片出处为：" + imgurl  # "Image source:"
    return download_img


# Exit the program
def myquit():
    print "Bye Bye!"
    sys.exit(0)


# Crawl one list page and download every GIF on it
def start(pageNo):
    targeturl = "http://www.zbjuran.com/dongtai/list_4_%s.html" % str(pageNo)
    html = getHtml(targeturl)
    urllist = getImagUrl(html)
    for imgurl in urllist:
        # '搞笑动图' ("funny GIFs") is used as a folder name
        download(imgurl['name'], imgurl['url'], '搞笑动图', pageNo)


if __name__ == '__main__':
    print '''
            *****************************************
            **    Welcome to Spider of GIF         **
            **      Created on 2017-3-16           **
            **      @author: Jimy                  **
            *****************************************'''
    # Note: the prompt text mentions 'quit' and a 1-100 range,
    # but the checks below accept only Q and 1-50.
    pageNo = raw_input(
        "Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
        "请输入要爬取的页面，范围为（1-100），如果退出，请输入Q>\n>")
    while not pageNo.isdigit() or int(pageNo) > 50 or int(pageNo) < 1:
        if pageNo == 'Q':
            myquit()
        print "Param is invalid , please try again."
        pageNo = raw_input("Input the page number you want to scratch >")
    print pageNo
    start(pageNo)
    # First crawl finished. Now ask for the total number of pages to crawl
    # (the check below allows 1-5000; Q quits).
    pageNo = raw_input(
        "Input the page number you want to scratch (1-50),please input 'quit' if you want to quit\n"
        "请输入总共需要爬取的页面，范围为（1-5000），如果退出，请输入Q>\n>")
    while not pageNo.isdigit() or int(pageNo) > 5000 or int(pageNo) < 1:
        if pageNo == 'Q':
            myquit()
        print "Param is invalid , please try again."
        pageNo = raw_input("Input the page number you want to scratch >")
    # Loop over the pages to crawl several of them
    for num in xrange(int(pageNo)):
        start(str(num + 1))
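The parsing step relies on BeautifulSoup with the lxml parser. Where those packages are unavailable, the same selection logic (each `item` div, an `img` inside its `text` div, the `h3` title) can be approximated with the standard library. The following is a Python 3 sketch using `html.parser`, which is a swapped-in technique rather than the article's method. `GifItemParser` and `extract_gifs` are hypothetical names, and the sketch ignores the outer `main` div.

```python
from html.parser import HTMLParser


class GifItemParser(HTMLParser):
    """Collect {"url", "name"} dicts from <div class="item"> blocks."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished items
        self._stack = []       # class list of every open <div>
        self._current = None   # item being built, or None outside an item
        self._item_depth = 0   # stack depth at which the current item opened
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self._stack.append(classes)
            if "item" in classes and self._current is None:
                self._current = {"url": None, "name": ""}
                self._item_depth = len(self._stack)
        elif self._current is not None:
            if tag == "h3":
                self._in_h3 = True
            elif tag == "img" and self._current["url"] is None:
                # Only accept images inside a "text" div within this item
                inner = self._stack[self._item_depth:]
                if any("text" in classes for classes in inner):
                    self._current["url"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "div" and self._stack:
            depth = len(self._stack)
            self._stack.pop()
            if self._current is not None and depth == self._item_depth:
                # Items without an image in a "text" div (ads) are dropped
                if self._current["url"]:
                    self.items.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._in_h3 and self._current is not None:
            self._current["name"] += data.strip()


def extract_gifs(html):
    """Return the list of {"url", "name"} dicts found in the HTML text."""
    parser = GifItemParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

`extract_gifs` expects decoded text, so bytes returned by an HTTP client need to be decoded first. BeautifulSoup remains the more forgiving choice for malformed pages.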

The output looks like this:

                        *****************************************
                        **    Welcome to Spider of GIF         **
                        **      Created on 2017-3-16           **
                        **      @author: Jimy                  **
                        *****************************************
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入要爬取的页面，范围为（1-100），如果退出，请输入Q>
>1
1
http://www.zbjuran.com/dongtai/list_4_1.html
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为：http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为：http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为：http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为：http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
动图存贮位置:Jimy/2017-3-16/搞笑动图/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
图片出处为：http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
请输入总共需要爬取的页面，范围为（1-5000），如果退出，请输入Q>
>Q
Bye Bye!

After the run, the animated GIFs are saved locally.
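The saved file names combine the item title, a UUID, and the last three characters of the image URL. That works for the `.gif` links shown in the output, but a title containing a path separator or a URL with a longer extension would cause trouble. A more defensive variant can be sketched in Python 3 with the standard library; `extension_from_url` and `safe_name` are hypothetical helpers that are not part of the original script, and the set of replaced characters is an assumption about what common file systems reject.

```python
import os
import re
from urllib.parse import urlparse


def extension_from_url(image_url, default="gif"):
    """Return the lowercase file extension of the URL path, without the dot."""
    path = urlparse(image_url).path          # drops any query string
    ext = os.path.splitext(path)[1].lstrip(".")
    return ext.lower() or default


def safe_name(title, max_length=80):
    """Replace characters that are commonly not allowed in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip()
    return cleaned[:max_length] or "untitled"
```

For example, `extension_from_url("http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif")` returns `"gif"`, and `safe_name("a/b:c?")` returns `"a_b_c_"`.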

I hope this article is helpful for your Python programming.

Original article: https://blog.csdn.net/qiqiyingse/article/details/62418857
