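I recently built a small search aggregator over product data from JD.com, Tmall, Taobao, Alibaba (1688), Suning, Gome, and Kaola. The stack is Java + XPath (plus related crawling techniques) + Spring Boot. I only meant it for my own casual use and perhaps a competition entry, even though I fully expected similar tools to exist online already. Known shortcomings: there is no multithreading, and plenty of minor details are left unhandled. Doing my best would have been enough, and honestly I didn't even do that.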
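- JD.com:
Result:
Problem:
JD loads its search results dynamically: it first loads about thirty items and then loads another thirty. My workaround is to add a few request parameters. The URL is: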
https://search.jd.com/Search?keyword="+question+"&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&stock=1&page="+n+"&s="+(1+(n-1)*30)+"&click=0&scrolling=y
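Here page is the page number n, and s is the index of the first item to load, computed as 1+(n-1)*30, so the second page starts at item 31. With these parameters every item in the search results gets loaded.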
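The paging arithmetic above can be sketched as a small helper. This is only an illustration: the class name `JdSearchUrl` and its methods are hypothetical, and URL-encoding the keyword is an added assumption (the URL shown above concatenates `question` as-is).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original project.
class JdSearchUrl {
    // Each request loads a batch of 30 items.
    static final int ITEMS_PER_REQUEST = 30;

    // Index of the first item on page n (1-based): 1, 31, 61, ...
    static int startIndex(int n) {
        return 1 + (n - 1) * ITEMS_PER_REQUEST;
    }

    // Builds the search URL for keyword `question` and page n.
    static String build(String question, int n) {
        String keyword;
        try {
            // Assumption: encode the keyword so spaces and non-ASCII text are safe in a URL.
            keyword = URLEncoder.encode(question, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return "https://search.jd.com/Search?keyword=" + keyword
                + "&enc=utf-8&qrst=1&rt=1&stop=1&vt=2&stock=1&page=" + n
                + "&s=" + startIndex(n) + "&click=0&scrolling=y";
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.println(build("手机", n));
        }
    }
}
```

Keeping the offset in one method makes it harder for page and s to drift apart when the URL is assembled in several places.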
Only part of the code is shown:
List<item> item = new ArrayList<>();
Document doc = Jsoup.parse(page.getHtml());
// Each product card is an element with class "gl-item"
List<Element> elements = doc.getElementsByClass("gl-item");
for (Element element : elements) {
    item item1 = new item();
    item1.setItemSellpoint(JsoupParserUtils.getXpathString(element, "//div/div[@class='p-name p-name-type-2']/a/i/text()"));
    item1.setItemName(JsoupParserUtils.getXpathString(element, "//div/div[@class='p-name p-name-type-2']/a/em/text()"));
    item1.setPrice(JsoupParserUtils.getXpathString(element, "//div/div[@class='p-price']/strong/i/text()"));
    item1.setImages("https:" + JsoupParserUtils.getXpathString(element, "//div[@class='gl-i-wrap']/div[@class='p-img']/a/img/@source-data-lazy-img"));
    item1.setShopName(element.getElementsByClass("p-shop").text());
    item1.setShopUrl("https:" + element.getElementsByClass("curr-shop").attr("href"));
    // An empty shop name is treated as JD's own store
    if (item1.getShopName().equals("")) {
        item1.setShopName("京东自营");
        item1.setShopUrl("https://www.jd.com");
    }
    item1.setEcName(dianshang);
    // MM is the month; lowercase mm would print minutes
    SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    item1.setUpdateTime(sdft.format(new Date()));
    String href = JsoupParserUtils.getXpathString(element, "//div[@class='gl-i-wrap']/div[@class='p-img']/a/@href");
    // hrefs longer than 50 characters are used as-is; shorter ones get the scheme prepended
    if (href.length() > 50) {
        item1.setItemUrl(href);
    } else {
        item1.setItemUrl("https:" + href);
    }
    item.add(item1);
}
System.out.println("jd success\n");
return item;
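- Alibaba (1688):
Result:
Problem:
At first I inspected how the desktop web page loads its data and parsed it with XPath, but their anti-crawling measures forced me to replace the cookies every other day. So I switched to the mobile web app, watched the Network panel, and found a URL that returns JSON directly. The URL is: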
https://m.p4psearch.1688.com/chord/scene.html?q=%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4%E7%BD%91%E7%AB%99%E6%89%B9%E5%8F%91&cosite=baidujj&trackid=4014000012730004&format=normal&_version=&pagesize=20&beginpage="+n+"&sortType=&scene=WuxianOfferResult&location=landing_t3&v=1&ie=utf-8&prodid=163&pid=&fcatid=&p4pid=1554724111293181203364&keywords="+question
Here question is the search keyword and n is the page number.
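- Tmall:
Result:
Tmall's anti-crawling is quite good. It inspects the request headers you send and also detects abnormal behavior: if you keep requesting the same search term from the same IP, you are redirected to the login page or shown a slider challenge that checks whether you are human. A plain crawler cannot drag the slider. It could imitate a browser to do so, but overall that costs too much.
So my solution is to set the request headers, reuse the cookies from a browser session that has already passed the check, and rotate IPs through an IP pool so the crawler looks more like human traffic.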
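The IP pool idea can be sketched as a simple round-robin rotation. This is a minimal sketch under stated assumptions: the class `ProxyPool`, its nested `Endpoint` type, and the example addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical, and the post does not say how the pool is actually filled or rotated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin proxy pool; not the author's implementation.
class ProxyPool {
    // One proxy endpoint: host name or IP plus port.
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    private final List<Endpoint> endpoints;
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyPool(List<Endpoint> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("the pool needs at least one proxy");
        }
        this.endpoints = new ArrayList<>(endpoints);
    }

    // Returns the next endpoint, wrapping around at the end of the list.
    Endpoint next() {
        int index = Math.floorMod(cursor.getAndIncrement(), endpoints.size());
        return endpoints.get(index);
    }

    public static void main(String[] args) {
        // Placeholder addresses; replace them with proxies you control.
        ProxyPool pool = new ProxyPool(Arrays.asList(
                new Endpoint("192.0.2.10", 8080),
                new Endpoint("192.0.2.11", 8080)));
        for (int i = 0; i < 4; i++) {
            Endpoint e = pool.next();
            System.out.println(e.host + ":" + e.port);
        }
    }
}
```

With Commons HttpClient 3.x, the chosen endpoint could then be applied before each request with `httpClient.getHostConfiguration().setProxy(e.host, e.port)`. Whether a cookie should stay paired with one proxy is left open here.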
Code:
package com.zz.search.crawl.page;
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
public class RequestAndResponseTool {
    public static Page sendRequstAndGetResponse(String url, String dianshang) {
        Page page = null;
        // 1. Create the HttpClient object and set its parameters
        HttpClient httpClient = new HttpClient();
        // Set the HTTP connection timeout to 5 s
        httpClient.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
        // 2. Create the GetMethod object and set its parameters
        GetMethod getMethod = new GetMethod(url);
        // Set the request headers
        // Tmall ("tm") additionally needs a cookie; paste your own cookie string here
        if (dianshang.equals("tm")) {
            getMethod.setRequestHeader("cookie", "");
        }
        // Every platform gets the same desktop browser user-agent
        getMethod.setRequestHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36");
        // Set the GET socket timeout to 5 s
        getMethod.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 5000);
        // Set the retry handler
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
        // 3. Execute the HTTP GET request
        try {
            int statusCode = httpClient.executeMethod(getMethod);
            // Check the status code
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + getMethod.getStatusLine());
            }
            // 4. Handle the HTTP response body
            byte[] responseBody = getMethod.getResponseBody(); // read as a byte array
            if (dianshang.equals("al")) {
                page = new Page(responseBody, url, null); // wrap as a Page
            } else {
                // The Content-Type header may be absent, so guard against null
                org.apache.commons.httpclient.Header header = getMethod.getResponseHeader("Content-Type");
                String contentType = (header == null) ? null : header.getValue();
                page = new Page(responseBody, url, contentType); // wrap as a Page
            }
        } catch (HttpException e) {
            // Fatal error: the protocol may be wrong or the returned content may be malformed
            System.out.println("Please check your provided http address!");
            e.printStackTrace();
        } catch (IOException e) {
            // Network error
            e.printStackTrace();
        } finally {
            // Release the connection
            getMethod.releaseConnection();
        }
        return page;
    }
}
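For the cookies, open Tmall in your own browser, run a search, open the developer tools from the right-click menu, go to the Network panel, look at the request headers for the page, and copy the cookie value into the code.
Data processing: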
Document doc = Jsoup.parse(page.getHtml());
List<item> item = new ArrayList<>();
// Select the product cards
List<Element> elements = doc.getElementsByClass("product-iWrap");
// Tmall loads dynamically: only the first five cards are fully loaded and carry the
// image URL in src; later cards have a different structure and use data-ks-lazyload
int i = 1;
for (Element element : elements) {
    item item1 = new item();
    // Image URL
    Elements imgWrap = element.getElementsByClass("productImg-wrap");
    if (!imgWrap.isEmpty()) {
        String imgAttr = (i <= 5) ? "src" : "data-ks-lazyload";
        item1.setImages("https:" + imgWrap.get(0).getElementsByTag("img").attr(imgAttr));
    }
    // Shop URL
    item1.setShopUrl("https:" + element.getElementsByClass("productShop-name").get(0).attr("href"));
    // Shop name
    item1.setShopName(element.getElementsByClass("productShop-name").get(0).text());
    // Price
    item1.setPrice(element.getElementsByClass("productPrice").get(0).text());
    // Product URL
    item1.setItemUrl("https:" + element.getElementsByClass("productTitle").get(0).getElementsByTag("a").attr("href"));
    // Product name / selling point
    item1.setItemName(element.getElementsByClass("productTitle productTitle-spu").text());
    // Monthly sales
    item1.setBuyNum(element.getElementsByClass("productStatus").text());
    // Platform name
    item1.setEcName("tm");
    // Update time (MM is the month; lowercase mm would print minutes)
    SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    item1.setUpdateTime(sdft.format(new Date()));
    i++;
    item.add(item1);
}
return item;
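I won't post the search URL here.
- Taobao:
Result:
Problem:
At first I couldn't get the content of the script tag; I solved that with a wrapper method. Likewise, if Taobao also redirects you to the login page, you can add cookies to avoid it.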
Document doc = Jsoup.parse(page.getHtml());
List<item> item = new ArrayList<>();
// The product data is embedded as JSON in the eighth <script> tag
Elements e = doc.getElementsByTag("script").eq(7);
String sc = e.html();
// Cut the script at the end of the JSON object and restore the closing braces
String[] it = sc.split("}};");
String it1 = it[0] + "}}";
// Drop the 16 characters in front of the opening brace
it1 = it1.substring(16);
try {
    JSONObject obj = new JSONObject(it1);
    JSONArray jarry = obj.getJSONObject("mods")
            .getJSONObject("itemlist")
            .getJSONObject("data")
            .getJSONArray("auctions");
    for (int i = 0; i < jarry.length(); i++) {
        JSONObject auction = jarry.getJSONObject(i);
        item item1 = new item();
        // Product name / selling point
        item1.setItemName(auction.getString("raw_title"));
        // Product image URL
        item1.setImages("https:" + auction.getString("pic_url"));
        // Product URL
        item1.setItemUrl("https:" + auction.getString("detail_url"));
        // Price
        item1.setPrice(auction.getString("view_price"));
        // Number of purchases
        item1.setBuyNum(auction.getString("view_sales"));
        // Shop name
        item1.setShopName(auction.getString("nick"));
        // Shop URL
        item1.setShopUrl("https:" + auction.getString("shopLink"));
        // Update time (MM is the month; lowercase mm would print minutes)
        SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        item1.setUpdateTime(sdft.format(new Date()));
        // Ship-from location. The original reads "shopLink" here, which looks like a
        // copy-paste slip; the right field name is not shown in this post, so it is kept
        item1.setLocal(auction.getString("shopLink"));
        // Platform name
        item1.setEcName("tb");
        item.add(item1);
    }
} catch (JSONException e1) {
    e1.printStackTrace();
}
return item;
In short, the data is obtained by parsing the JSON embedded in the script.
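Pulling a JSON object out of an inline script can also be done without fixed offsets. The sketch below scans for the first balanced `{...}` after a marker string and ignores braces inside string literals. The class `ScriptJson`, the method `extractObject`, and the marker `g_page_config` used in the example are assumptions for illustration; the post does not name the script variable.

```java
// Hypothetical helper; a sketch of brace matching, not the author's method.
class ScriptJson {
    // Returns the first balanced {...} block after `marker`, or null if none is found.
    static String extractObject(String script, String marker) {
        int markerPos = script.indexOf(marker);
        if (markerPos < 0) {
            return null;
        }
        int start = script.indexOf('{', markerPos);
        if (start < 0) {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inString) {
                // Inside a string literal: only track escapes and the closing quote.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return script.substring(start, i + 1);
            }
        }
        return null; // braces never balanced
    }

    public static void main(String[] args) {
        String script = "g_page_config = {\"mods\":{\"itemlist\":{}}};\n other();";
        System.out.println(extractObject(script, "g_page_config"));
    }
}
```

The returned string can be handed to a JSON parser directly, and a layout change in the script shows up as a null result rather than a shifted substring.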
- Suning:
Result:
Problem: the price was missing. Requesting the ordinary search URL did not return prices. After a long search I found a URL that returns JSON:
https://search.suning.com/emall/mobile/wap/clientSearch.jsonp?keyword="+question+"&cp="+(n-1)+"&ps=30&set=5&ct=-1&channelId=WAP&sp=&sg=&sc=&prune=&operate=0&isAnalysised=0&istongma=1&v=99999999&sesab=ABB&&jzq=1535&callback=success_jsonpCallback
That is the URL, where question is the search term and n is the page number.
List<item> item = new ArrayList<>();
Document doc = Jsoup.parse(page.getHtml());
String json = doc.body().text();
// Strip the 22-character JSONP prefix "success_jsonpCallback(" and the trailing ");"
json = json.substring(22);
json = json.replace(");", "");
JSONObject obj = new JSONObject(json);
JSONArray jarry = obj.getJSONArray("goods");
for (int i = 0; i < jarry.length(); i++) {
    JSONObject goods = jarry.getJSONObject(i);
    item item1 = new item();
    // Title
    item1.setItemName(goods.getString("catentdesc"));
    // Selling point
    item1.setItemSellpoint(goods.getString("auxdescription"));
    // Review count and positive-review rate
    item1.setBuyNum("评价数:" + goods.getJSONObject("extenalFileds").getString("commentShow") + " 好评率" + goods.getString("praiseRate"));
    // Price
    item1.setPrice("¥" + goods.getString("price"));
    // Image URL, with a local placeholder when the field is missing
    if (!goods.isNull("dynamicImg")) {
        item1.setImages("https:" + goods.getString("dynamicImg"));
    } else {
        item1.setImages("/images/ZO.png");
    }
    // Shop: Suning's own store ("苏宁自营") gets an empty shop URL
    if (goods.getString("salesName").equals("苏宁自营")) {
        item1.setShopUrl("");
    } else {
        item1.setShopUrl(goods.getJSONObject("extenalFileds").getString("specificUrl"));
    }
    item1.setShopName(goods.getString("salesName"));
    // Product URL
    item1.setItemUrl("https://product.suning.com/" + goods.getString("salesCode") + "/" + goods.getString("catentryId") + ".html");
    // Platform name
    item1.setEcName("sn");
    // Update time (MM is the month; lowercase mm would print minutes)
    SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    item1.setUpdateTime(sdft.format(new Date()));
    item.add(item1);
}
return item;
This is plain JSON processing.
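Because the URL asks for a JSONP response (callback=success_jsonpCallback), the JSON arrives wrapped in a function call. A minimal sketch of unwrapping it without counting characters follows; the class `Jsonp` and the method `unwrap` are hypothetical names, and the sample payload is made up.

```java
// Hypothetical helper; strips any callback wrapper of the form name( ... );
class Jsonp {
    // Returns the text between the first '(' and the last ')'.
    static String unwrap(String body) {
        int open = body.indexOf('(');
        int close = body.lastIndexOf(')');
        if (open < 0 || close <= open) {
            throw new IllegalArgumentException("not a JSONP response");
        }
        return body.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        String body = "success_jsonpCallback({\"goods\":[]});";
        System.out.println(unwrap(body));
    }
}
```

Searching for the parentheses keeps working if the callback name changes length, and a `);` sequence inside the payload is left untouched.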
- Gome:
Result:
Problem:
Gome has no pagination button, so you have to analyze the mobile web app's requests to work out the URL:
https://m.gome.com.cn/category.html?from=1&scat=2&key_word="+question+"&page="+n+"&plsj_flag=N&sort=10
The data-processing code is as follows:
List<item> item = new ArrayList<>();
Document doc = Jsoup.parse(page.getHtml());
// Each product card has class "gd_list"
List<Element> elements = doc.getElementsByClass("gd_list");
for (Element e : elements) {
    item item1 = new item();
    // Image URL
    item1.setImages("https:" + e.getElementsByTag("img").attr("src"));
    // Product URL: drop the query string, the "product-" prefix, and the ".m" mobile marker
    item1.setItemUrl("https:" + e.getElementsByClass("a-mask").attr("href").split("\\?")[0].replace("product-", "").replace(".m", ""));
    // Price
    item1.setPrice(e.getElementsByClass("price_warp").text());
    // Title: fall back to the two-line variant when the one-line variant is empty
    item1.setItemName(e.getElementsByClass("title ellipsis-one").text());
    if (item1.getItemName().equals("")) {
        item1.setItemName(e.getElementsByClass("title ellipsis_two").text());
    }
    // Comment count
    item1.setBuyNum(e.getElementsByClass("cmt").text());
    // Selling point
    item1.setItemSellpoint(e.getElementsByClass("sell-point").text());
    // Platform name
    item1.setEcName("gm");
    // Update time (MM is the month; lowercase mm would print minutes)
    SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    item1.setUpdateTime(sdft.format(new Date()));
    item.add(item1);
}
return item;
- Kaola:
URL:
https://search.kaola.com/search.html?key="+question+"&pageNo="+n+"&type=&pageSize=20&isStock=false&isSelfProduct=false
No real problems here.
Processing code:
List<item> item = new ArrayList<>();
Document doc = Jsoup.parse(page.getHtml());
List<Element> elements = doc.getElementsByClass("goods colorsku");
for (Element e : elements) {
    item item1 = new item();
    // Image URL (lazy-loaded, so it lives in data-src)
    item1.setImages("https:" + e.getElementsByTag("img").attr("data-src"));
    // Product URL
    item1.setItemUrl("https:" + e.getElementsByClass("title").attr("href"));
    // Price
    item1.setPrice(e.getElementsByClass("marketprice").text());
    // Title, taken from the image's alt text
    item1.setItemName(e.getElementsByTag("img").attr("alt"));
    // Comment count
    item1.setBuyNum(e.getElementsByClass("comments").text());
    // Location
    item1.setLocal(e.getElementsByClass("proPlace ellipsis").text());
    // Shop: Kaola's own store ("网易考拉自营") gets an empty shop URL
    Elements shop = e.getElementsByClass("selfflag");
    item1.setShopName(shop.text());
    if (shop.text().equals("网易考拉自营")) {
        item1.setShopUrl("");
    } else {
        // The original appended the shop's text to "https:"; the link's href is used instead
        item1.setShopUrl("https:" + shop.select("a").attr("href"));
    }
    // Platform name
    item1.setEcName("kl");
    // Update time (MM is the month; lowercase mm would print minutes)
    SimpleDateFormat sdft = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    item1.setUpdateTime(sdft.format(new Date()));
    item.add(item1);
}
return item;