Extracting URL links from JavaScript using Java

Date: 2022-10-29 13:33:33

I am trying to get all the URLs from an HTML page. I have succeeded in getting the URLs from the page markup itself, but the page also has JavaScript that contains URLs. How do I get the URLs out of the scripts? I have been searching for a way for a while and would appreciate your help.

2 answers

#1



If the URLs are just string literals in the JavaScript code, you can extract them by matching everything that looks like a URL in the text of each "script" tag. E.g.:

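import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// m.group() returns a String, so collect Strings rather than URL objects
List<String> urls = new ArrayList<>();
Pattern p = Pattern.compile(myUrlPattern);  // myUrlPattern: the URL regex you choose
Matcher m = p.matcher(eachScriptTagText);   // text content of one script tag
while (m.find()) {
  urls.add(m.group());
}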

Regular expressions for matching URLs are easy to find on the internet. Pick one that fits the kinds of URLs you expect, since no single pattern handles every case.
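To make the snippet above concrete, here is a minimal self-contained sketch. The class name `ScriptUrlExtractor`, the method `extractUrls`, and the pattern itself are my own assumptions for illustration, not something from the original answer. The pattern is deliberately simple: it only finds absolute http/https URLs and stops at whitespace, quotes, angle brackets, or a backslash, so it will miss relative URLs and URLs that are assembled by string concatenation at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptUrlExtractor {

    // Assumption: a simple pattern for absolute http(s) URLs.
    // A match ends at whitespace, a quote, an angle bracket, or a backslash.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[^\\s\"'<>\\\\]+");

    // Returns every substring of scriptText that matches URL_PATTERN,
    // in the order it appears.
    public static List<String> extractUrls(String scriptText) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(scriptText);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical script text, standing in for one script tag's content.
        String script = "var api = \"https://example.com/api?id=1\";\n"
                + "fetch('http://example.org/data.json');";
        System.out.println(extractUrls(script));
    }
}
```

You would call `extractUrls` once per script tag and merge the results. If you need `java.net.URL` objects rather than strings, convert each match afterwards and handle the exception for strings that do not parse.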

#2



Here is a classic article by Sun on web crawling. It contains some example code that extracts URLs from HTML.

