How to fetch HTML in Java

Posted: 2022-12-08 11:42:43

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?


5 solutions

#1 (score: 32)

I'm currently using this:


import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
URLConnection connection = null;
try {
    connection = new URL("http://www.google.com").openConnection();
    Scanner scanner = new Scanner(connection.getInputStream());
    scanner.useDelimiter("\\Z"); // \Z matches the end of input, so next() returns the whole stream
    content = scanner.next();
    scanner.close(); // also closes the underlying connection stream
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);

But not sure if there's a better way.

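For what it's worth, on Java 11 and later there arguably is a better way in the standard library itself: java.net.http.HttpClient. A minimal sketch, assuming Java 11+ (still no external library):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("http://www.google.com")).build();
// send() throws IOException and InterruptedException; the body handler decodes the body to a String
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());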

#2 (score: 20)

This has worked well for me:


import java.io.InputStream;
import java.net.URL;

URL url = new URL(theURL);
StringBuilder buffer = new StringBuilder();
try (InputStream is = url.openStream()) { // try-with-resources closes the stream
    int ptr;
    while ((ptr = is.read()) != -1) {
        buffer.append((char) ptr); // note: casting raw bytes to chars mangles multi-byte characters
    }
}

Not sure whether the other solutions provided are any more efficient.
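For comparison, a buffered, charset-aware variant of the loop above would likely be more efficient; this is a sketch of my own, not from the original answer, assuming the page is UTF-8:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Buffered reading avoids one read() call per byte, and the reader decodes
// multi-byte characters correctly, which the char-cast loop above does not.
StringBuilder buffer = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new URL(theURL).openStream(), StandardCharsets.UTF_8))) {
    char[] chunk = new char[4096];
    int read;
    while ((read = reader.read(chunk)) != -1) {
        buffer.append(chunk, 0, read);
    }
}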

#3 (score: 2)

I just left this post in your other thread, though what you have above might work as well. I don't think either approach would be easier than the other. The Apache packages can be accessed by adding import org.apache.commons.httpclient.HttpClient at the top of your code.
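A minimal sketch of that approach, assuming Commons HttpClient 3.x; the target URL is a placeholder:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://www.google.com");
try {
    client.executeMethod(get);                       // send the GET request
    String content = get.getResponseBodyAsString();  // response body as a String
    System.out.println(content);
} finally {
    get.releaseConnection();                         // return the connection to the pool
}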

Edit: Forgot the link ;)


#4 (score: 2)

Whilst not vanilla Java, I'll offer up a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text

#5 (score: 0)

It's not a library but a tool named curl, which is generally installed on most servers; on Ubuntu you can easily install it with:

sudo apt install curl

Then fetch any HTML page and store it in a local file, for example:

curl https://www.facebook.com/ > fb.html

You will get the home page HTML. You can open the saved file in your browser as well.
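If you do need this from Java rather than a shell, one way (a sketch of my own, not part of the original answer) is to shell out to curl with ProcessBuilder:

import java.io.InputStream;
import java.nio.charset.StandardCharsets;

Process process = new ProcessBuilder("curl", "-s", "https://www.facebook.com/")
        .redirectErrorStream(true) // fold stderr into stdout
        .start();
String html;
try (InputStream out = process.getInputStream()) {
    html = new String(out.readAllBytes(), StandardCharsets.UTF_8); // readAllBytes() needs Java 9+
}
process.waitFor();
System.out.println(html);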
