A Brief Introduction to Fetching Web Pages with HttpClient

Version: HttpClient 3.1

1. GET method

Step 1: Create a client, much like opening a page in a browser.

HttpClient httpClient = new HttpClient();

Step 2: Create a GET method for the URL of the page you want to fetch.

GetMethod getMethod = new GetMethod("http://www.baidu.com");

Step 3: Execute the request and get the response status code; 200 means the request succeeded.

int statusCode = httpClient.executeMethod(getMethod);

Step 4: Read the page source as raw bytes.

byte[] responseBody = getMethod.getResponseBody();

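Those four steps are the core. In practice there is more to handle, such as the character encoding of the page. A complete example: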
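On the encoding point: the bytes returned in step 4 still have to be decoded with the right charset, and servers usually announce it in the Content-Type response header. The sketch below uses only the JDK and shows one way to pull the charset out of such a header value and fall back to a default when it is missing or unknown. The class and method names (CharsetSketch, charsetOf, decode) are made up for illustration, and ISO-8859-1 as the fallback is an assumption chosen to mirror the HTTP default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: picks a charset from a Content-Type value.
final class CharsetSketch {

    /** Returns the charset named in contentType, or fallback if none is usable. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive, so match "charset=" loosely.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // malformed or unsupported charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes a response body using the charset announced in contentType. */
    static String decode(byte[] body, String contentType) {
        // Assumption: fall back to ISO-8859-1 when the header gives no charset.
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] body = "\u767e\u5ea6".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=utf-8"));
    }
}
```

With HttpClient 3.1 the header value would come from the executed method's Content-Type response header. Pages that declare their encoding only in an HTML meta tag are not covered by this sketch.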

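import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

public static String spiderHtml() throws Exception {
    HttpClient client = new HttpClient();
    GetMethod method = new GetMethod("http://top.baidu.com/buzz?b=1");
    try {
        int statusCode = client.executeMethod(method);
        if (statusCode != HttpStatus.SC_OK) {
            System.err.println("Method failed: " + method.getStatusLine());
        }
        // Read the raw bytes and decode them with the page's charset (GBK for this page).
        byte[] body = method.getResponseBody();
        return new String(body, "gbk");
    } finally {
        // Always release the connection, even if the request fails.
        method.releaseConnection();
    }
}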
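getResponseBody() buffers the whole response in memory, which is fine for a small page but risky when the size is unknown. HttpClient 3.1 also offers getResponseBodyAsStream(), and the sketch below shows one way such a stream could be drained with an upper bound on the number of bytes. It uses only JDK classes; the class name BodyReaderSketch, the readAll helper, and the limits in main are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of HttpClient: reads a stream fully, but refuses oversized bodies.
final class BodyReaderSketch {

    /** Reads all bytes from in, failing once more than maxBytes have been seen. */
    static byte[] readAll(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("response body exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response stream; a real one would come from the executed method.
        InputStream fake = new ByteArrayInputStream(new byte[10000]);
        System.out.println(readAll(fake, 1 << 20).length);
    }
}
```

The cap is a design choice for crawlers that fetch pages from unknown sites: one unexpectedly large response then cannot exhaust the heap.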


2. POST method

HttpClient httpClient = new HttpClient();
// UrlPath is the target URL, defined elsewhere.
PostMethod postMethod = new PostMethod(UrlPath);
// Use the default retry handler for recoverable I/O errors.
postMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
// Form fields sent in the request body.
NameValuePair[] postData = new NameValuePair[2];
postData[0] = new NameValuePair("username", "xkey");
postData[1] = new NameValuePair("userpass", "********");
postMethod.setRequestBody(postData);
try {
    int statusCode = httpClient.executeMethod(postMethod);
    if (statusCode == HttpStatus.SC_OK) {
        byte[] responseBody = postMethod.getResponseBody();
        // Decode with the charset from the response headers, not the platform default.
        String html = new String(responseBody, postMethod.getResponseCharSet());
        System.out.println(html);
    }
} catch (Exception e) {
    System.err.println("Unable to access the page");
} finally {
    // Always release the connection.
    postMethod.releaseConnection();
}
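The name/value pairs passed to setRequestBody are sent as an application/x-www-form-urlencoded body. To make it clearer what goes over the wire, here is a rough JDK-only sketch of that encoding. It is an illustration of the format, not HttpClient's own implementation; the class name FormBodySketch and the sample password are invented.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient: builds a form-urlencoded request body.
final class FormBodySketch {

    /** Joins the fields as name=value pairs separated by '&', percent-encoding each part. */
    static String encode(Map<String, String> fields, String charset) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), charset))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("unsupported charset: " + charset, ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "xkey");
        fields.put("userpass", "p&ss w=rd"); // made-up value with characters that need escaping
        System.out.println(encode(fields, "UTF-8"));
    }
}
```

Reserved characters such as & and = inside a value are escaped, so the server can still split the body into fields unambiguously.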

Related links: http://blog.****.net/acceptedxukai/article/details/7030700

http://www.cnblogs.com/modou/articles/1325569.html
