What is the best way to escape XML characters?

I have HTML data that I'm converting into a dom4j document.

I ran into this error:

org.dom4j.DocumentException: Error on line 1 of document  : Reference is not allowed in prolog. Nested exception: Reference is not allowed in prolog.
    at org.dom4j.io.SAXReader.read(SAXReader.java:482)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at MonTest.main(MonTest.java:21)
Nested exception: 
    org.xml.sax.SAXParseException: Reference is not allowed in prolog.

The cause was an "&" character, which I had to escape as &amp; in order to build the document.

XML defines five predefined entities for escaping characters: &gt; (>), &lt; (<), &quot; ("), &amp; (&) and &apos; (').

However, how can I escape these characters only in the text content, without also escaping the markup (the element tags and attributes)? For example:

<div id="test" class='toto'>A&A<A"A</div>

should give:

<div id="test" class='toto'>A&amp;A&lt;A&quot;A</div>

and not

&lt;div id=&quot;test&quot; class=&apos;toto&apos;&gt;A&amp;A&lt;A&quot;A&lt;/div&gt;
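As a minimal sketch in plain Java (no dom4j), the desired output can be produced by escaping only the text and attribute values while writing the tags as literal markup. The class EscapeTextOnly and its helpers escapeXml and divElement are hypothetical names for illustration; escapeXml here is hand-written, not the Commons Lang method. This assumes the text and the markup are still separate pieces when the string is assembled.

```java
// Hypothetical helper class: escape only the text and attribute values,
// and write the tags themselves as literal markup.
final class EscapeTextOnly {

    // Replaces each of the five characters that have predefined XML entities.
    // Working one character at a time means an '&' that we emit as part of
    // an entity is never escaped a second time.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical example: build the <div> from the question. The markup is
    // literal; only the caller-supplied values pass through escapeXml.
    static String divElement(String id, String cssClass, String text) {
        return "<div id=\"" + escapeXml(id) + "\" class='" + escapeXml(cssClass) + "'>"
                + escapeXml(text) + "</div>";
    }

    public static void main(String[] args) {
        // Prints: <div id="test" class='toto'>A&amp;A&lt;A&quot;A</div>
        System.out.println(divElement("test", "toto", "A&A<A\"A"));
    }
}
```

This only helps while the values are still separate from the tags; it cannot repair a string in which text and markup have already been concatenated.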

Thank you,

2 solutions

#1

Escape strings before adding them to the XML document, for example with the StringEscapeUtils.escapeXml method from Apache Commons Lang. Alternatively, use a library to build the XML, e.g. http://code.google.com/p/joox/.
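The "use a library to build the XML" route can be sketched with the JDK's built-in DOM and Transformer APIs. This is a swapped-in choice for illustration, not jOOX and not the answerer's code; BuildWithDom and buildDiv are hypothetical names. Raw, unescaped values go into the tree, and the serializer escapes them when writing it out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class using only JDK APIs (jOOX is not used here).
final class BuildWithDom {

    static String buildDiv(String id, String cssClass, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element div = doc.createElement("div");
            // Attribute and text values go in raw; no manual escaping.
            div.setAttribute("id", id);
            div.setAttribute("class", cssClass);
            div.appendChild(doc.createTextNode(text));
            doc.appendChild(div);

            // The serializer escapes '&', '<', etc. when writing the tree out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // ParserConfigurationException and TransformerException are checked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildDiv("test", "toto", "A&A<A\"A"));
    }
}
```

Attribute order, quote style, and whether a " in text content is escaped are left to the serializer, so the output may differ cosmetically from a hand-built string while still being equivalent XML.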

#2

I would look at using a lenient HTML XMLReader instead of the default XMLReader implementation, such as TagSoup or HTML Tidy.
