How do I preserve newlines in CDATA when generating XML?

Time: 2022-05-30 21:44:26

I want to write some text that contains whitespace characters such as newlines and tabs into an XML file, so I use

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

but when I read this back in using

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that no longer has any newlines.
When I look directly at the XML on disk, the newlines appear to be preserved, so the problem occurs when reading the XML file back in.

How can I preserve the newlines?

Thanks!

5 solutions

#1


5  

I don't know how you parse and write your document, but here's an enhanced code example based on yours:

import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

DOMImplementationLS impl =
    (DOMImplementationLS)registry.getDOMImplementation("LS");

LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);

// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);

// de-serializing the xml from the string
// (the serialized string declares UTF-16, so the bytes are encoded to match)
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();

// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);

Serializing with LSSerializer is the W3C way to do it (it is part of the DOM Level 3 Load and Save specification). The output is as expected, with the line separators preserved:

--- XML --- 
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text --- 
first line
second line

#2


2  

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA delimiters (<![CDATA[ and ]]>) around node.getNodeValue().
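
A minimal sketch of that check, assuming the element's children are walked by hand. The class and method names (CDataGuards, rebuildContent, rebuildFromXml) are made up for illustration, and plain text nodes are appended as-is without re-escaping:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CDataGuards {

    // Rebuilds an element's content, re-adding the CDATA delimiters
    // that getNodeValue() does not include.
    static String rebuildContent(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(n.getNodeValue()).append("]]>");
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                // Sketch only: text is not re-escaped here.
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string and rebuilds the root's content.
    static String rebuildFromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return rebuildContent(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TestElement><![CDATA[first line\nsecond line\n]]></TestElement>";
        System.out.println(rebuildFromXml(xml));
    }
}
```

This relies on the parser keeping CDATA sections as separate nodes, which is the default for DocumentBuilderFactory (coalescing is off unless setCoalescing(true) is called).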

#3


2  

You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification specifies how to encode these characters.

So, for example, if you have an element whose value contains a newline (line feed), you should encode it with

  &#xA;

Carriage return:

 &#xD;

And so forth
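
As a rough illustration of why the character references matter, the sketch below parses the same text twice: once with &#xD;&#xA; references and once with a literal CR LF pair. The class and method names (CharRefNewlines, readText) are assumptions made for this example. Per the XML end-of-line rules, a literal CR LF in the source is normalized to a single LF, while character references are passed through untouched:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefNewlines {

    // Parses the XML string and returns the root element's text content.
    static String readText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &#xD; is a carriage return, &#xA; a line feed, &#x9; a tab.
        String viaRefs = readText("<TestElement>first&#xD;&#xA;second&#x9;end</TestElement>");
        // Literal CR LF in the document text is normalized to LF by the parser.
        String literal = readText("<TestElement>first\r\nsecond</TestElement>");

        System.out.println(viaRefs.contains("\r\n"));  // true
        System.out.println(literal.contains("\r"));    // false
    }
}
```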

#4


0  

EDIT: cut all the irrelevant stuff

I'm curious to know what DOM implementation you're using, because it doesn't mirror the default behaviour of the one in a couple of JVMs I've tried (they ship with a Xerces impl). I'm also interested in what newline characters your document has.
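
Both questions can be answered with a few lines of diagnostics. This is only a sketch; ParserInfo, factoryClassName and showWhitespace are illustrative names, not part of any library:

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class ParserInfo {

    // Name of the concrete factory class that JAXP selected at runtime.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Makes CR, LF and tab visible so the kind of line ending can be checked.
    static String showWhitespace(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName());
        System.out.println(showWhitespace("first line\r\nsecond\tline\n"));
    }
}
```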

I'm not sure whether it is a given that CDATA should preserve whitespace. I suspect that there are many factors involved. Don't DTDs/schemas affect how whitespace is processed?

You could try using the xml:space="preserve" attribute.
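
To see what the attribute does in practice, a small sketch can parse the same text with and without it. It assumes the default non-validating JAXP parser and a document without a DTD; XmlSpaceCheck and rootText are illustrative names. In that setup the attribute is a hint to the application rather than an instruction to the parser, so both variants come back with their newlines and tabs intact:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlSpaceCheck {

    // Parses the XML string and returns the root element's text content.
    static String rootText(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String with = rootText(
                "<TestElement xml:space=\"preserve\">first\n\tsecond\n</TestElement>");
        String without = rootText("<TestElement>first\n\tsecond\n</TestElement>");
        // Compare what the parser handed back in each case.
        System.out.println(with.equals(without));
    }
}
```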

#5


0  

xml:space='preserve' is not it. It only applies to "all whitespace" nodes. That is, it matters if you want to keep the whitespace nodes in

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

But note that those whitespace nodes contain ONLY whitespace.

I have been struggling to get Xerces to generate events allowing isolation of CDATA content as well. I have no solution as yet.
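
One possible direction, offered as a sketch rather than as the answerer's solution: the SAX2 LexicalHandler extension reports startCDATA() and endCDATA() events, and org.xml.sax.ext.DefaultHandler2 implements it. The example assumes the parser honors the standard lexical-handler property (the JDK's built-in parser does); CDataEvents and extractCData are made-up names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class CDataEvents {

    // Collects only the characters reported between startCDATA() and endCDATA().
    static String extractCData(String xml) {
        final StringBuilder cdata = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            private boolean inCData;

            @Override
            public void startCDATA() {
                inCData = true;
            }

            @Override
            public void endCDATA() {
                inCData = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inCData) {
                    cdata.append(ch, start, length);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Register the same object as lexical handler to receive CDATA events.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return cdata.toString();
    }

    public static void main(String[] args) {
        String xml = "<TestElement>before<![CDATA[first line\nsecond line\n]]>after</TestElement>";
        System.out.print(extractCData(xml));
    }
}
```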
