Python Crawler Example: Scraping JD (京东) Store Information and Downloading Images

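This article shows, by example, how to use a Python crawler to scrape store information and download images. Although the title mentions JD, the example code requests a Tmall search results page (list.tmall.com). The scripts are shared here for reference.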

This script scrapes the listing information:

    from bs4 import BeautifulSoup
    import requests

    url = 'https://list.tmall.com/search_product.htm?q=%CB%AE%BA%F8+%C9%D5%CB%AE&type=p&vmarket=&spm=875.7931836%2FA.a2227oh.d100&from=mallfp..pc_1_searchbutton'
    response = requests.get(url)                  # fetch the page
    soup = BeautifulSoup(response.text, 'lxml')   # response.text is the decoded HTML; parse it with lxml

    storenames = soup.select('#J_ItemList > div > div > p.productTitle > a')        # select the store information
    prices = soup.select('#J_ItemList > div > div > p.productPrice > em')           # select the prices
    sales = soup.select('#J_ItemList > div > div > p.productStatus > span > em')    # select the sales figures

    for storename, price, sale in zip(storenames, prices, sales):
        # get_text() extracts the text inside the tag; the result contains
        # newline characters (\n), so strip() is used to remove them
        storename = storename.get_text().strip()
        price = price.get_text()
        sale = sale.get_text()
        # fixed-width fields keep the printed columns aligned
        print('Store name:%-40sPrice:%-40sSales:%s' % (storename, price, sale))
        print('----------------------------------------------------------------------------------------------')
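One thing worth checking in a loop like the one above: `zip` stops at the shortest input, so if one product has no price or sales element, the three lists fall out of step without raising an error. The sketch below shows the effect on made-up lists (not real Tmall data) and one way to detect it. The helper name `rows_aligned` is hypothetical.

```python
from itertools import zip_longest

# Made-up sample data standing in for the three select() results.
titles = ['kettle A', 'kettle B', 'kettle C']
prices = ['59.00', '89.00', '129.00']
sales = ['1200', '300']          # one product has no sales element


def rows_aligned(*columns):
    """Return True when every column has the same number of entries."""
    return len({len(col) for col in columns}) == 1


# zip silently drops the last product because `sales` is shorter.
zipped = list(zip(titles, prices, sales))

# zip_longest keeps every row and fills the gaps with a placeholder.
padded = list(zip_longest(titles, prices, sales, fillvalue='N/A'))

if __name__ == '__main__':
    print(rows_aligned(titles, prices, sales))
    print(len(zipped), len(padded))
    print(padded[-1])
```

Padding only reveals that a value is missing, not which product it belongs to. Selecting each product's container element first and reading its fields from there would avoid the mismatch altogether.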

This script downloads the images:

    from bs4 import BeautifulSoup
    import requests
    import urllib.request

    url = 'https://list.tmall.com/search_product.htm?q=%CB%AE%BA%F8+%C9%D5%CB%AE&type=p&vmarket=&spm=875.7931836%2FA.a2227oh.d100&from=mallfp..pc_1_searchbutton'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    imgs = soup.select('#J_ItemList > div > div > div.productImg-wrap > a > img')

    a = 1                          # counter used for the file names
    for i in imgs:
        src = i.get('src')
        if src is None:            # stop at the first image without a src attribute
            break
        # The src value has no scheme, so 'http:' must be prepended.
        # This took a long time to figure out.
        img = 'http:' + src
        # print(img)
        urllib.request.urlretrieve(img, '%s.jpg' % a)   # saved as 1.jpg, 2.jpg, ...
        a = a + 1
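The script prepends `http:` by hand because the `src` values come without a scheme. A somewhat more general option is to resolve each value against the page URL with `urllib.parse.urljoin` from the standard library, which also copes with site-relative and already-absolute paths. This is only a sketch: the helper name `absolute_image_url` and the sample `src` values are made up for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://list.tmall.com/search_product.htm'


def absolute_image_url(src, page_url=PAGE_URL):
    """Resolve an <img> src against the page URL; return None if src is missing."""
    if not src:
        return None
    return urljoin(page_url, src)


if __name__ == '__main__':
    # Hypothetical src values, not taken from the real page.
    for src in ['//img.example.com/a.jpg', '/static/b.jpg', None]:
        print(absolute_image_url(src))
```

A scheme-less value such as `//img.example.com/a.jpg` inherits the scheme of the page URL, so here it resolves to an `https:` address rather than `http:`.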

Notes:

1. Use CSS selectors to pick out the information.

2. Use the get_text() method to extract the text inside a tag.

3. Usage of strip, lstrip, and rstrip:

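In Python, strip removes characters from both ends of a string. Likewise, lstrip removes characters from the left end and rstrip from the right end.

All three methods accept an optional argument that specifies which characters to remove.

Note that the argument is treated as a set of characters, not as a substring. Python keeps removing characters from the end(s) of the string as long as they belong to that set, and stops at the first character that does not. For example: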

    theString = 'saaaay yes no yaaaass'
    print(theString.strip('say'))

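Characters in the set {'s', 'a', 'y'} are removed from both ends of theString until a character outside the set is reached. The space is not in the set, so stripping stops at the spaces around "yes no" and they are kept. The output therefore has one leading and one trailing space:

 yes no 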

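Simple enough. lstrip and rstrip work the same way, each on one end only.

Note: when no argument is passed, leading and trailing whitespace (spaces, tabs, and newlines) is removed by default.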
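As a small illustration of the default behavior, the sketch below prints each result with `repr` so that the remaining whitespace is visible. The sample string is made up.

```python
raw = '  \n\tkettle 1.5L \n'

stripped = raw.strip()     # removes whitespace on both ends
left = raw.lstrip()        # removes leading whitespace only
right = raw.rstrip()       # removes trailing whitespace only

if __name__ == '__main__':
    print(repr(stripped))
    print(repr(left))
    print(repr(right))
```

Whitespace inside the string (the space between "kettle" and "1.5L") is never touched; only the ends are affected.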

The following compares the three methods on the same string:

    theString = 'saaaay yes no yaaaass'
    print(theString.strip('say'))
    print(theString.strip('say '))   # note the space after "say"
    print(theString.lstrip('say'))
    print(theString.rstrip('say'))

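Output (the first and third lines begin with a space, and the first and fourth end with one, because the space is not in 'say'):

 yes no 
es no
 yes no yaaaass
saaaay yes no 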
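Because the argument is a set of characters rather than a literal prefix or suffix, these methods can remove more than intended. The sketch below contrasts `lstrip` with a small prefix-removal helper. `strip_prefix` is a hypothetical name introduced here, and the URL is made up. Python 3.9 and later also provide `str.removeprefix` for the same purpose.

```python
def strip_prefix(text, prefix):
    """Remove `prefix` once from the start of `text` if it is present."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


url = 'http://httpbin.example/get'

# lstrip removes every leading character found in the set {h, t, p, :, /},
# so it also eats the "http" at the start of the host name.
by_lstrip = url.lstrip('http://')

# The helper removes exactly the given prefix and nothing more.
by_prefix = strip_prefix(url, 'http://')

# The order of characters in the argument makes no difference.
same = 'saaaay yes no yaaaass'.strip('yas') == 'saaaay yes no yaaaass'.strip('say')

if __name__ == '__main__':
    print(by_lstrip)
    print(by_prefix)
    print(same)
```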

I hope this article is helpful for your Python programming.

Original article: https://blog.csdn.net/qq_35661436/article/details/52180399
