The JPedal library for Java is usually used to convert PDF to XML or HTML. However, I need to know whether the JPedal API can also extract data from an HTML5 document and save it as XML. If not, is there an alternative?
I am also trying to parse an HTML5 document in Java and store the result as XML. Are there any good solutions for finding just specific tags and generating XML from them?
Please let me know. Thank you.
1 solution
There are a number of Java HTML parsers out there, but I recommend the HTML5 parser from validator.nu, available for download here: http://about.validator.nu/htmlparser/.
It was written to follow the HTML5 parsing algorithm by Henri Sivonen of Mozilla, one of the main protagonists of HTML5, and you won't find a more reliable HTML parser. It creates a true DOM that can be manipulated using standard XML tools and queried using XPath, for hyperlinks for example. There are examples of how to use XSLT transformations with it and how to get an XML serialization of the created DOM.
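To illustrate the "find specific tags and write them out as XML" part, here is a minimal sketch that uses only the JDK's DOM, XPath and Transformer APIs. The class name `TagExtractor`, its methods, and the `extracted`/`item` output element names are made up for this example. To stay self-contained it parses a well-formed snippet with the JDK's XML parser as a stand-in. For real HTML5 input you would build the `Document` with the validator.nu parser instead (I believe its DOM builder is `nu.validator.htmlparser.dom.HtmlDocumentBuilder`, but check the library's documentation). As far as I know that parser puts elements in the XHTML namespace, so the XPath below matches on `local-name()` to work with or without a namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagExtractor {

    // Stand-in parser: the JDK XML parser only accepts well-formed markup.
    // For real HTML5, build the Document with an HTML5 parser instead.
    public static Document parseWellFormed(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Copies every element with the given local name into a new XML document.
    // tagName is assumed to be a plain element name (no quotes).
    public static Document extract(Document source, String tagName) throws Exception {
        // local-name() matches whether or not the elements carry a namespace.
        String expr = "//*[local-name()='" + tagName + "']";
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, source, XPathConstants.NODESET);

        Document out = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = out.createElement("extracted");
        out.appendChild(root);
        for (int i = 0; i < hits.getLength(); i++) {
            Element hit = (Element) hits.item(i);
            Element item = out.createElement("item");
            item.setAttribute("tag", tagName);
            if (hit.hasAttribute("href")) {
                item.setAttribute("href", hit.getAttribute("href"));
            }
            item.setTextContent(hit.getTextContent());
            root.appendChild(item);
        }
        return out;
    }

    // Serializes a DOM to an XML string without the XML declaration.
    public static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>See "
                + "<a href=\"http://about.validator.nu/htmlparser/\">the parser</a>."
                + "</p></body></html>";
        System.out.println(toXml(extract(parseWellFormed(page), "a")));
    }
}
```

Only `parseWellFormed` depends on the input being well-formed XML. `extract` and `toXml` work on any `org.w3c.dom.Document`, so they should carry over unchanged once the document comes from an HTML5 parser.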