Getting a website's source code in Java

Date: 2023-01-18 23:53:24

Once again I have a problem where I can't find the source code because it's hidden or something. When my Java program indexes the page, it finds everything except the info I need. I assume it's hidden for a reason, but is there any way around this?

It's just a bunch of tr/td tags that show up in Firebug but don't show up when viewing the page source or when I run the code below:

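URL url = new URL("my url");
URLConnection yc = url.openConnection();
// This reads only the HTML the server sends; no JavaScript is executed
try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        // process each line of the raw HTML here
    }
}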

I really have no idea how to get the info that I need.

4 Answers

#1


3  

This is probably because those tags are injected into the DOM dynamically by JavaScript and are not part of the initial HTML, which is all you can fetch with a URLConnection. They might even be built from data loaded via AJAX. To fetch them, you will need a JavaScript interpreter on your server.
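One rough way to confirm this diagnosis is to count tags in the raw HTML that URLConnection returns. The sketch below is only a heuristic under stated assumptions: the class name `StaticHtmlCheck`, its methods, and the sample markup (including `load.js`) are made up for illustration, and a regex count is no substitute for a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks whether table cells are present in the raw HTML.
class StaticHtmlCheck {

    // Counts opening tags such as <td> or <td class="x"> (case-insensitive).
    static int countTag(String html, String tag) {
        Pattern p = Pattern.compile("<" + Pattern.quote(tag) + "[\\s>]",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Heuristic: no cells in the raw HTML, but scripts are present,
    // so the rows are probably inserted after the page loads.
    static boolean likelyScriptGenerated(String rawHtml) {
        return countTag(rawHtml, "td") == 0 && countTag(rawHtml, "script") > 0;
    }

    public static void main(String[] args) {
        // Sample markup standing in for what URLConnection would return.
        String rawHtml = "<html><body><table id=\"data\"></table>"
                + "<script src=\"load.js\"></script></body></html>";
        System.out.println("td tags: " + countTag(rawHtml, "td"));
        System.out.println("script tags: " + countTag(rawHtml, "script"));
        System.out.println("likely script-generated: " + likelyScriptGenerated(rawHtml));
    }
}
```

If the check comes back true for the real page, the rows you see in Firebug were added after the initial HTML was delivered.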

#2


0  

If they don't show up in the page source, they're likely being added dynamically by JavaScript code. Short of running a JavaScript interpreter, which carries a fair amount of overhead, there's no way to get them from your server-side code.

The information in the tags is presumably coming from somewhere, though. Why not track that down and grab it straight from there?
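As a sketch of that idea: if the browser's network panel shows the page requesting its rows from a separate URL, you can call that URL yourself. Everything specific here is an assumption: the class `DataEndpointFetcher`, the endpoint you pass in, the `X-Requested-With` header (some servers check it, many don't), and the guess that the response is an HTML fragment of tr/td rows rather than JSON. It uses `java.net.http.HttpClient`, so it needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: requests the data URL directly instead of the page.
class DataEndpointFetcher {

    // Matches the text between <td ...> and </td>; adequate only for flat cells.
    private static final Pattern CELL = Pattern.compile(
            "<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Builds a GET request that resembles the browser's AJAX call.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    // Performs the request and returns the response body as text.
    static String fetch(String endpoint) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Pulls the trimmed cell contents out of an HTML fragment.
    static List<String> extractCells(String fragment) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(fragment);
        while (m.find()) {
            cells.add(m.group(1).trim());
        }
        return cells;
    }
}
```

You would call `extractCells(fetch(endpointUrl))` with whatever URL the network panel reveals. If the endpoint returns JSON instead, swap the regex step for a JSON parser.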

#3


0  

Try using Jsoup.

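// Jsoup.parse(URL, int) fetches and parses the page with a 10-second timeout.
// Note: Jsoup does not execute JavaScript. The URL was elided in the original post.
Document doc = Jsoup.parse(new URL("http://"), 10000);
System.out.print(doc.toString());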

#4


0  

Assuming the issue is that the "missing" content is being injected by JavaScript, the following SO question is pertinent:
