I'm currently trying to scrape Amazon for a bunch of data. I'm using jsoup to help me do this, and everything has gone pretty smoothly, but for some reason I can't figure out how to pull the current number of sellers offering a product as new.
Here's an example of the URL I'm scraping: http://www.amazon.com/dp/B006L7KIWG
I want to extract "39 new" from the markup below:
<div id="secondaryUsedAndNew" class="mbcOlp">
    <div class="mbcOlpLink">
        <a class="buyAction" href="/gp/offer-listing/B006L7KIWG/ref=dp_olp_new_mbc?ie=UTF8&condition=new">
            39 new
        </a> from
        <span class="price">$60.00</span>
    </div>
</div>
This project is the first time I've used jsoup, so the coding may be a bit iffy, but here are some of the things I have tried:
String asinPage = "http://www.amazon.com/dp/" + getAsin();
try {
    Document document = Jsoup.connect(asinPage).timeout(timeout).get();
    // .....
    // get new sellers, try one: scan every link for a label containing "new"
    Elements links = document.select("a[href]");
    for (Element link : links) {
        // System.out.println("Span olp:" + link.text());
        String code = link.attr("abs:href");
        String label = trim(link.text(), 35);
        if (label.contains("new")) {
            System.out.println(label + " : " + code);
        }
    }
    // get new sellers, try two: select the offer-listing div directly
    Elements olpDivs = document.select("div.mbcOlpLink");
    for (Element div : olpDivs) {
        // System.out.println("Span olp:" + div.text());
    }
    // about a million other failed attempts that you'll just have to take my word on.
} catch (IOException e) {
    e.printStackTrace();
}
I've been able to scrape everything else I need from the page, but for some reason this particular element is being a pain. Any help would be GREAT! Thanks guys!
1 solution
I would use
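// text() is a method call; the replace swaps non-breaking spaces (U+00A0) for regular ones
String s = document.select("div[id=secondaryUsedAndNew] a.buyAction").text().replace("\u00a0", " ").trim();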
This should leave you with "42 new", which is what the page shows at the moment.
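If you need the seller count as a number rather than as the string "42 new", something along these lines might work. It's only a sketch: the class name SellerCount and the method parseNewSellerCount are made up for illustration, and it assumes the link text always looks like "<number> new" (possibly with a thousands separator), which I haven't checked across other product pages.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: turns a label such as "39 new" into an int.
class SellerCount {
    // Assumed label shape: digits (optionally with commas), whitespace, then the word "new".
    private static final Pattern NEW_COUNT = Pattern.compile("(\\d[\\d,]*)\\s+new\\b");

    static OptionalInt parseNewSellerCount(String label) {
        // Scraped text may contain non-breaking spaces, which \s does not match by default.
        String cleaned = label.replace("\u00a0", " ").trim();
        Matcher m = NEW_COUNT.matcher(cleaned);
        if (!m.find()) {
            // No "<number> new" label found, e.g. the product has no new offers.
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        // In the scraper you would pass in the text selected from a.buyAction instead.
        System.out.println(parseNewSellerCount("39 new"));
    }
}
```

Returning an OptionalInt instead of a bare int means the caller has to deal with the case where the element is missing or the text doesn't match, rather than getting a NumberFormatException halfway through a scrape.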
Hope this works for you!