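1. Fetch the page source for a given URL.
2. Use a regular expression to filter the image addresses out of the source.
3. Download each image from the addresses that were filtered out.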
#coding=UTF-8
# Python 2 script (uses urllib2 and the print statement)
import urllib2
import urllib
import re

# Baidu image search results page; also sent as the Referer when fetching images
PAGE_URL = 'http://image.baidu.com/i?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1425134407244_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E6%AF%94%E5%9F%BA%E5%B0%BC%E7%BE%8E%E5%A5%B3&f=3&oq=b&rsp=-1'

# Browser-like request headers, built once instead of on every loop iteration
HEADER = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip',
    'Connection': 'close',
    'Referer': PAGE_URL,
}
TIMEOUT = 30  # seconds

def getHtml(url):
    # Step 1: fetch the page source for the given URL
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getJpg(html):
    # Step 2: collect every "largeTnImageUrl" value that ends in .jpg
    reg = r'"largeTnImageUrl":"(.+?\.jpg)",'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    x = 0
    for imgurl in imglist:
        print imgurl
        # Earlier attempt, kept for reference:
        #urllib.urlretrieve(imgurl, 'D:/test/%s.html' % x)
        # Step 3: request the image with the browser-like headers
        request = urllib2.Request(imgurl, None, HEADER)
        response = urllib2.urlopen(request, None, TIMEOUT)
        data = response.read()  # renamed from str, which shadowed the builtin
        # The matched URLs end in .jpg, so save as .jpg rather than .gif
        with open("D:/test/%s.jpg" % x, "wb") as foo:
            foo.write(data)
        x += 1

html = getHtml(PAGE_URL)
getJpg(html)  # returns nothing, so there is no result to print
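Before pointing the script at a live page, the regular expression from getJpg can be tried offline. The following is only a minimal sketch in Python 3: the helper name `extract_image_urls` and the `sample` string with its example.com addresses are made up for illustration and are not real Baidu output.

```python
import re

# Same pattern as in getJpg: lazily capture the shortest run that ends in ".jpg"
IMG_PATTERN = re.compile(r'"largeTnImageUrl":"(.+?\.jpg)",')

def extract_image_urls(html):
    """Return every largeTnImageUrl value ending in .jpg, in page order."""
    return IMG_PATTERN.findall(html)

# Hypothetical fragment shaped like the JSON embedded in the results page
sample = (
    '{"largeTnImageUrl":"http://example.com/a.jpg","other":1},'
    '{"largeTnImageUrl":"http://example.com/b.jpg","other":2}'
)

print(extract_image_urls(sample))
```

Note that in Python 3 `urlopen(...).read()` returns bytes, so a real page would need to be decoded (or the pattern written as a bytes pattern) before matching.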
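Reference blog example: http://www.cnblogs.com/fnng/p/3576154.html
While working through the example from that blog I ran into a problem: the images downloaded, but they could not be previewed. After some analysis I found that Baidu Images has anti-scraping protection in place. To get around it, I consulted another post:
Simulating a browser request to get past the server's anti-scraping check: http://www.oschina.net/question/114640_162399?sort=time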
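As a rough Python 3 sketch of the same idea, the request can carry browser-like headers, and the downloaded bytes can be checked for the JPEG signature before they are written to disk. The function names are hypothetical, the shortened `Referer` value is an assumption (the script above sends the full search URL), and whether a server accepts these particular headers is not guaranteed.

```python
import urllib.request

# Assumed header values for illustration; the script above sends a longer set
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 "
        "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
    ),
    "Referer": "http://image.baidu.com/",
}

def build_image_request(img_url, headers=BROWSER_HEADERS):
    """Build a request that carries browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(img_url, headers=headers)

def looks_like_jpeg(data):
    """JPEG data starts with the bytes FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"

def download_image(img_url, dest_path, timeout=30):
    """Download img_url to dest_path; return False if the body is not a JPEG."""
    request = build_image_request(img_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not looks_like_jpeg(data):
        # Possibly an error page or placeholder rather than the image itself
        return False
    with open(dest_path, "wb") as out:
        out.write(data)
    return True
```

Checking the first bytes is a cheap way to notice a file that will not preview: if the server answers with something other than image data, the function reports it instead of saving it under a .jpg name.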