Preface
Lately I've been grinding away at web scraping. To me it's actually quite fun. I was pretty lost at the beginning, but by now I've become familiar with the basic crawling workflow, though I never got around to writing it up. Today's write-up covers scraping the reviews of the Apple iPhone XS on JD (京东).
First
Find the page of the item you want to scrape, right-click and choose Inspect, as shown below:
Then follow these steps:
After clicking the Network tab, scroll the product page and click into the product reviews: several new request rows appear. Needless to say, the review data must be in one of these new requests, but it's still hard to pick out which one.
We can, however, guess at the name of the review request; it usually contains "comments", so press Ctrl+F and search for it:
Click it open, and this should be exactly the data we need:
When you expand the data above:
Judging by the content, we've clearly found the right request.
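For reference, the request found this way is the productPageComments interface. Below is a minimal sketch of what it boils down to once you pull the parameters out of the query string; the productId is just this article's example item, and depending on JD's checks you may also need the cookie/referer headers from the next step:

```python
# Quick check of the discovered endpoint (assumption: same parameters as in the
# captured request; productId 100000287113 is the example item from this post).
import json
import requests

url = 'https://sclub.jd.com/comment/productPageComments.action'
params = {
    'productId': 100000287113,  # taken from the product page URL
    'score': 0,
    'sortType': 5,
    'page': 0,                  # page index starts at 0
    'pageSize': 10,             # reviews per page
}
resp = requests.get(url, params=params, headers={'user-agent': 'Mozilla/5.0'})
data = json.loads(resp.text)
print(list(data.keys()))  # should include a 'comments' key if the response is plain JSON
```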
Next (making the request and finding the cookie)
What we need is the content outlined by the red boxes.
**Finally** (I think we can go straight to the code)

```python
import requests
import json
import csv

comment_url = 'https://sclub.jd.com/comment/productPageComments.action'

# Write the CSV header once, before fetching any pages
with open('JDcontent.csv', 'a', newline='', encoding='utf-8') as f:
    csv.writer(f).writerow(('买家', '商品颜色', '版本', '评论'))

for page in range(10):  # first 10 pages of reviews
    params = {
        'productId': 100000287113,  # product id, hard-coded for now
        'score': 0,
        'sortType': 5,
        'page': page,
        'pageSize': 10,
        # 'callback': 'fetchJSON_comment98vv15262',
        # 'isShadowSku': 0,
        # 'fold': 1
    }
    headers = {
        'cookie': 'shshshfpa=4e6c0f90-587c-a46f-5880-a7debd7d4393-1544616560; __jdu=1126324296; PCSYCityID=412; user-key=44089d07-befa-4522-87fc-bcc039ec7045; pinId=qopcdCj6kcR3U84v0KTTbrV9-x-f3wj7; pin=jd_769791719e9e9; unick=jd_769791719e9e9; _tp=nc%2FbpB%2BkeSbk3jZ6p2H0FlWrdUa1gbgi16QiQ7NBXKY%3D; _pst=jd_769791719e9e9; cn=9; ipLoc-djd=1-72-2799-0; mt_xid=V2_52007VwMSUVpaUV8cQR5sUWMDEgUIUVBGGEofWhliABNUQQtQWkpVHVVXb1ZGB1lYW11LeRpdBW4fElFBW1VLH0ESXgJsAhpiX2hSahxLGFsFZwcRUG1bWlo%3D; shshshfpb=bRnqa4s886i2OeHTTR9Nq6g%3D%3D; unpl=V2_ZzNtbUZTSxJ3DURTLk0LAmJXFVlKAkdAIQ1PUXseCVIzU0UKclRCFXwURldnGlUUZwcZXERcQRdFCHZXchBYAWcCGllyBBNNIEwHDCRSBUE3XHxcFVUWF3RaTwEoSVoAYwtBDkZUFBYhW0IAKElVVTUFR21yVEMldQl2VHsaWwdkBhFVRWdzEkU4dl17HVwDYDMTbUNnAUEpAUJRfRpcSGcDEVpAVEYWfQ92VUsa; __jda=122270672.1126324296.1544405080.1545968922.1545980857.16; __jdc=122270672; ceshi3.com=000; TrackID=11EpDXYHaqwJE15W6paeMk_GMm05o3NUUeze9XyIcFs33GGxX8knxMpxWTeID75qSiUlj31s8CtKJs4hJUV-7CvKuiOEyDd8bvOCH7zzigeI; __jdv=122270672|baidu-pinzhuan|t_288551095_baidupinzhuan|cpc|0f3d30c8dba7459bb52f2eb5eba8ac7d_0_55963436def64e659d5de48416dfeaff|1545980984854; 3AB9D23F7A4B3C9B=OA3G4SO3KYLQB6H3AIX36QQAW34BF376WJN66IUPEQAG6FUA2NWGM6R6MBDL32HLDG62WL2FICMYIVMOU6ISUWHKPE; shshshfp=1ed96ad08a7585648cd5017583df22bd; _gcl_au=1.1.162218981.1545981094; JSESSIONID=305879A97D4EA21F4D5C4207BB81423F.s1; shshshsID=c8c51ee0c5b1ddada7c5544abc3eea8a_5_1545981289039; __jdb=122270672.11.1126324296|16.1545980857; thor=3A30EBABA844934A836AC9AA37D0F4B85306071BD7FC64831E361A626E76F6977EC7794D06F2A922AEABF7D3D7DC22FBE2EB6B240F81A13F5A609368D4185BA0081D7C34A93760063D2F058F5B916835B4960EC8A9122008745971D812BA9E4AE48542CCC5A42E5CD786CC93770E520E36F950614C06A7EB05C8E1DD93EEA844B2EBA9B0136063FCFB6B7C83AECA828774041A9FED7BD98496689496122822FF',
        'referer': 'https://item.jd.com/100000287113.html',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
    }
    comment_resp = requests.get(url=comment_url, params=params, headers=headers)
    print(comment_resp.status_code)
    # print(comment_resp.text)
    comment_dict = json.loads(comment_resp.text)
    comments = comment_dict['comments']
    # Append every review on this page to the CSV
    with open('JDcontent.csv', 'a', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        for comment in comments:
            user = comment['nickname']
            color = comment['productColor']
            size = comment['productSize']
            text = comment['content']
            writer.writerow((user, color, size, text))
```
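One detail worth pointing out: the callback parameter stays commented out on purpose. If you send it the way the browser does, the endpoint presumably answers with JSONP, i.e. the JSON wrapped in fetchJSON_comment98vv15262(...), which json.loads cannot parse directly. If you ever need to keep the callback, here is a hedged sketch of stripping that wrapper; parse_jsonp is just a helper name I made up, not anything from JD's API:

```python
# Hypothetical helper (not part of the script above): strip a JSONP wrapper such as
# fetchJSON_comment98vv15262({...}); so the body can be fed to json.loads.
import json

def parse_jsonp(body):
    start = body.find('(')   # first '(' after the callback name
    end = body.rfind(')')    # last ')' that closes the wrapper
    if start == -1 or end == -1:
        return json.loads(body)             # no wrapper: treat as plain JSON
    return json.loads(body[start + 1:end])  # parse only the part inside the parentheses
```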
This is the version without any function encapsulation (just an early practice run); later I'll add a cleaned-up version that wraps everything in functions and stores the results in a database.
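As a preview of that refactor, here is a minimal sketch of how the same logic could be split into functions. The names fetch_comments and save_comments are only placeholders, and the database part would simply replace save_comments with inserts into whatever table you choose:

```python
# A minimal sketch, assuming the same endpoint and CSV layout as above;
# fetch_comments / save_comments are placeholder names, not a final API.
import csv
import requests

COMMENT_URL = 'https://sclub.jd.com/comment/productPageComments.action'

def fetch_comments(product_id, page, headers):
    """Fetch one page of reviews and return the list under the 'comments' key."""
    params = {'productId': product_id, 'score': 0, 'sortType': 5,
              'page': page, 'pageSize': 10}
    resp = requests.get(COMMENT_URL, params=params, headers=headers)
    resp.raise_for_status()
    return resp.json().get('comments', [])

def save_comments(comments, path='JDcontent.csv'):
    """Append one page of reviews to the CSV file."""
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for c in comments:
            writer.writerow((c['nickname'], c['productColor'],
                             c['productSize'], c['content']))

if __name__ == '__main__':
    # Add the cookie copied from DevTools here if JD rejects anonymous requests.
    headers = {'user-agent': 'Mozilla/5.0',
               'referer': 'https://item.jd.com/100000287113.html'}
    for page in range(10):
        save_comments(fetch_comments(100000287113, page, headers))
```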