Using Eclipse + HttpClient + Jsoup to Read Web Page Data (Beginner)

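Over the past few days I have been learning to use the HttpClient library to read data from web pages, working through the example in the blog post http://ducaijun.iteye.com/blog/1335453: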

1. First open Eclipse and choose File -> New -> Java Project to create a Java project.

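2. Next, download the HttpClient and Jsoup jar files and add them to the project you just created (for example, right-click the project -> Build Path -> Add External Archives...). Depending on the HttpClient version, you may also need the commons-codec, commons-logging and httpcore jars.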
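Before going further, it can help to confirm that every jar really ended up on the classpath. The program below is a minimal sketch using only the JDK: `ClasspathCheck` is a hypothetical helper name, and the class names it probes (one per jar) assume HttpClient 4.3 or later and a recent Jsoup; adjust them if your versions differ.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    // Returns the names of the classes that cannot be loaded from the current classpath.
    public static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: only check that the class can be found,
                // without running its static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers e.g. NoClassDefFoundError when a dependency jar is absent
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One representative class per jar (assumption: HttpClient 4.3+ and a recent Jsoup)
        List<String> notFound = missing(
                "org.apache.http.impl.client.HttpClients", // httpclient
                "org.apache.http.HttpEntity",              // httpcore
                "org.apache.commons.logging.Log",          // commons-logging
                "org.apache.commons.codec.binary.Base64",  // commons-codec
                "org.jsoup.Jsoup");                        // jsoup
        if (notFound.isEmpty()) {
            System.out.println("All required classes found.");
        } else {
            for (String name : notFound) {
                System.out.println("Missing from classpath: " + name);
            }
        }
    }
}
```

Run it from the same Eclipse project; any line starting with "Missing from classpath" points to a jar that still needs to be added.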

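3. Right-click the project and choose New -> Class to add a class named JustTest (the same class name as in the example at http://ducaijun.iteye.com/blog/1335453), then copy the example code into the class file.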

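It might also be possible to paste the code from the web page into a text file, save it, and import it into the project directly as a ready-made class. But I'm new to this and don't know how to do that yet, so I took the clumsy route.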

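4. Because of version differences between HttpClient and the related libraries, Eclipse may show some red error markers saying that certain identifiers cannot be resolved. I have two clumsy ways of dealing with this: (1) use the latest versions of HttpClient, HttpCore and Jsoup, unpack the .jar files, and search for the class that cannot be resolved; if you find it, add an import for its package path to JustTest. (2) If that doesn't work, or if Eclipse reports that a method is deprecated, search Google for the message; a solution can usually be found.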
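Method (1) can be done without unpacking anything by hand. The program below is a sketch of that search using only the JDK's `java.util.jar.JarFile`; `FindClassInJar` is a hypothetical helper, not part of HttpClient or Jsoup. It lists the entries of one jar, keeps those whose file name matches the unresolved simple class name (nested classes such as `Outer$Inner` are ignored), and prints a ready-to-paste import line for each match.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class FindClassInJar {

    // Searches a jar for top-level classes whose simple name equals simpleName and
    // returns their fully qualified names, e.g. "org.apache.http.HttpEntity".
    public static List<String> find(String jarPath, String simpleName) throws IOException {
        List<String> matches = new ArrayList<>();
        String suffix = "/" + simpleName + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String entryName = entries.nextElement().getName();
                // Match "pkg/path/SimpleName.class", or a class in the default package
                if (entryName.endsWith(suffix) || entryName.equals(simpleName + ".class")) {
                    String path = entryName.substring(0, entryName.length() - ".class".length());
                    matches.add(path.replace('/', '.')); // turn the path into a package name
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java FindClassInJar <path-to-jar> <SimpleClassName>");
            return;
        }
        for (String name : find(args[0], args[1])) {
            System.out.println("import " + name + ";");
        }
    }
}
```

For example, running it against the httpcore jar with the argument `HttpEntity` should print `import org.apache.http.HttpEntity;`, which can be pasted into JustTest.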

5. The code is shown below:

package testLoadHttp;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.apache.http.client.methods.CloseableHttpResponse;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JustTest {

    public static String getHtmlByUrl(String url) {
        String html = null;
        HttpGet httpget = new HttpGet(url); // request the URL with GET
        // try-with-resources closes both the response and the client,
        // replacing the deprecated getConnectionManager().shutdown()
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpget)) {
            int statusCode = response.getStatusLine().getStatusCode(); // HTTP status code
            if (statusCode == HttpStatus.SC_OK) {
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    // HTML source; uses the charset from Content-Type, or UTF-8 if none is declared
                    html = EntityUtils.toString(entity, "UTF-8");
                    System.out.println(html);
                }
            }
        } catch (Exception e) {
            System.out.println("Exception while accessing [" + url + "]!");
            e.printStackTrace();
        }
        return html;
    }

    public static void main(String[] args) {
        String html = getHtmlByUrl("http://www.iteye.com/");
        if (html != null && !"".equals(html)) {
            Document doc = Jsoup.parse(html);
            {
                // The <a> links in the "recommend" list: print each link's href and text
                String str0 = "div#page>div#content.clearfix>div#local>div#recommend>ul>li>a";
                Elements linksElements = doc.select(str0);
                for (Element ele : linksElements) {
                    String href = ele.attr("href");
                    String title = ele.text();
                    System.out.println(href + "," + title);
                }
            }

            {
                // The <li> items themselves: attr() returns "" for any attribute
                // the <li> element does not carry
                String str0 = "div#page>div#content.clearfix>div#local>div#recommend>ul>li";
                Elements linksElements = doc.select(str0);
                for (Element ele : linksElements) {
                    String href = ele.attr("href");
                    String target = ele.attr("target");
                    String title = ele.attr("title");
                    System.out.println("href:" + href + ",target:" + target + ",title:" + title);
                }
            }
        }