A better way to use Jsoup

Date: 2022-10-31 09:14:56

I started playing with JSoup today. As an example, I started by scraping proxies from this site.

After a lot of experimenting, I was able to scrape the proxies, but without their port numbers, because the page fills those in with JavaScript. Can those port numbers also be scraped with JSoup? Since this was my first attempt, I would also like to know whether the approach I took is right. Here is the code that fetches the proxies.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.jsoup.safety.Whitelist;

public class ListLinks
{
    public static void main(String[] args)
    {
        try
        {
            // Fetch and parse the proxy list page.
            Document doc = Jsoup.connect("http://www.samair.ru/proxy/socks01.htm").get();
            // Select every row of the proxy table.
            Elements content = doc.select("table.tablelist tbody tr");
            for(Element com: content)
            {
                // Take the first cell of the row and print its text.
                Element fi=com.select("td").first();
                String e=fi.text();
                String safe=Jsoup.clean(e,Whitelist.basic());
                System.out.println(safe);
            }

        }
        catch(Exception e)
        {
          System.out.print("Problem");
        }
    }

}

1 solution

#1


Yes, your approach is ok.

One thing, though: there's no need for String safe=Jsoup.clean(e,Whitelist.basic());
because String e=fi.text(); already gives you a clean string. text() returns only the element's text, with the HTML tags stripped.
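
As a minimal sketch of the same loop without the clean() call, the example below parses a made-up HTML snippet instead of the live page. The class name FirstCellText, the helper firstCellTexts, and the sample addresses are hypothetical and only there for illustration. It also skips rows that have no td cell, since first() returns null for them.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class FirstCellText
{
    // Returns the plain text of the first <td> in each row of table.tablelist.
    public static List<String> firstCellTexts(String html)
    {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element row : doc.select("table.tablelist tbody tr"))
        {
            Element first = row.select("td").first();
            if (first == null)
            {
                continue; // e.g. a header row that only contains <th> cells
            }
            // text() already strips the markup, so no Jsoup.clean() is needed.
            result.add(first.text());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Hypothetical sample markup standing in for the real page.
        String html = "<table class=\"tablelist\"><tbody>"
                + "<tr><th>IP address</th><th>Type</th></tr>"
                + "<tr><td><b>10.0.0.1</b></td><td>high</td></tr>"
                + "<tr><td><span>10.0.0.2</span></td><td>low</td></tr>"
                + "</tbody></table>";
        for (String cell : firstCellTexts(html))
        {
            System.out.println(cell);
        }
    }
}
```

The b and span tags in the sample cells do not show up in the printed strings, which is why the extra cleaning step adds nothing here.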
