Getting page source with Java reads only the first line

Date: 2023-01-18 23:25:48

I have never used Java before, but I am comfortable with PHP. I want to fetch the page source of a website, but I am on Appspot (GAE), where file_get_contents and cURL do not work, so I want to fetch the page source via Java instead. I learned some Java basics and found the code below, but it returns only the first line of the external page. Please point out where I am going wrong.

<?php

function get($url){

        import java.net.URL;
        import java.io.BufferedReader;
        import java.io.InputStreamReader;

        $java_url = new URL($url);
        $java_bufferreader = new BufferedReader(new InputStreamReader($java_url->openStream()));

        while (($line = $java_bufferreader->readLine()) != null) {
            $content .= $line;
        }

        return $content;
}


echo get("http://domain.com");

?>

For example, if I scrape *.com, it returns only the code below:

<!DOCTYPE html><html><head>        <title>Stack Overflow</title>    <link rel="shortcut icon" href="//cdn.sstatic.net/*/img/favicon.ico">    <link rel="apple-touch-icon image_src" href="//cdn.sstatic.net/*/img/apple-touch-icon.png">    <link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">    <meta name="twitter:card" content="summary">    <meta name="twitter:domain" content="*.com"/>    <meta name="og:type" content="website" />    <meta name="og:image" content="http://cdn.sstatic.net/*/img/apple-touch-icon@2.png?v=fde65a5a78c6"/>    <meta name="og:title" content="Stack Overflow" />    <meta name="og:description" content="Q&amp;A for professional and enthusiast programmers" />    <meta name="og:url" content="http://*.com/"/>

1 answer

#1


0  

Try the Scanner class instead.

<?php

function get($url){

        import java.net.URL;
        import java.util.Scanner;

        $java_url = new URL($url);
        $java_scanner = new Scanner($java_url->openStream());

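        // Scanner's nextLine() throws NoSuchElementException at end of input
        // instead of returning null, so check hasNextLine() before each read.
        while ($java_scanner->hasNextLine()) {
            $content .= $java_scanner->nextLine();
        }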

        return $content;
}


echo get("http://domain.com");

?>

If that does not work either, initialize $content with an empty string before the loop, just in case. :)
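
For comparison, here is a minimal sketch of the same read loop in plain Java rather than the PHP-style syntax used above. It is only an illustration, not a confirmed fix for the GAE setup in the question. The names PageSource, readAll and get are made up for this example, and UTF-8 is an assumed encoding. One thing worth knowing: readLine() strips the line terminator, so appending lines without adding "\n" back makes the whole page come out as one long line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class PageSource {

    // Reads every line from the reader and joins them with '\n'.
    // readLine() drops the line terminator, so it is re-added here.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder content = new StringBuilder(); // starts empty, never null
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Fetches a URL and returns its body as text.
    // Assumption: the page is UTF-8 encoded.
    public static String get(String url) throws IOException {
        return readAll(new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Offline demo with an in-memory string instead of a real URL.
        System.out.print(readAll(new StringReader("<html>\n<body>\n</html>")));
    }
}
```

Calling PageSource.get("http://domain.com") would perform the actual fetch. The demo in main uses a StringReader so it runs without network access.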
