XML CDATA returns "XML parsing failed: syntax error, illegal byte sequence in encoding" when using XSLT

Time: 2021-03-07 16:45:53

I searched for an answer and couldn't find one.

I have a long XML document generated by a server, and I want to display some of its nodes using XSLT.

The problem: when I open the XML in a browser, I get: XML parsing failed: syntax error, illegal byte sequence in encoding

The error is inside a CDATA section in one of the nodes, starting with <![CDATA[ and ending with ]]>

The error is: , and hundreds of characters like it.

To my understanding this shouldn't happen: if the content is in CDATA, the parser should ignore or escape it.

The xml encoding is utf8.

Thank you for your help.

1 solution

#1


The encoding

You say "the xml encoding is utf8". Your parser is telling you that you're wrong. It's finding a byte sequence that cannot occur in UTF-8; in my experience that often happens when ISO 8859-1 (ISO Latin 1) data is wrongly tagged UTF-8.

If you have examined the data in question in a hex dump or similar tool and confirmed that it is legal UTF-8, then you have a bug to report to your vendor. If you haven't, the parser is most likely right and your data is most likely not UTF-8. Find out what encoding it actually is and declare it correctly, or fix the server's configuration so that it produces a UTF-8 data stream, or use a character-encoding conversion utility to convert the server's output to UTF-8.
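
As a rough illustration of that check, here is a minimal Python sketch. It assumes the server's output is available as raw bytes and that the real encoding turns out to be ISO 8859-1, which is only a guess until the bytes are inspected. The function names `first_invalid_utf8` and `latin1_to_utf8` and the sample data are made up for this example.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, hex of nearby bytes) for the first byte sequence
    that is not legal UTF-8, or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first offending byte.
        nearby = data[max(0, err.start - 8):err.end + 8]
        return err.start, nearby.hex()
    return None


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are really ISO 8859-1 as UTF-8.

    Assumption: the input really is ISO 8859-1. Every byte value is a
    valid ISO 8859-1 character, so this never fails, even on wrong input.
    """
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample: ISO 8859-1 bytes sitting inside a CDATA section.
sample = "<a><![CDATA[café]]></a>".encode("iso-8859-1")

print(first_invalid_utf8(sample))                   # offset and hex context
print(first_invalid_utf8(latin1_to_utf8(sample)))   # None once converted
```

If the document carries an XML declaration with an `encoding` pseudo-attribute, that declaration has to agree with whatever encoding the bytes end up in after conversion.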

CDATA sections

CDATA sections occur within a character sequence being parsed as XML; they declare that their contents are character data and not XML delimiters. A CDATA section does not and cannot declare that its content is an arbitrary sequence of bits, bytes, or octets; by the time a sequence of characters is recognized as a CDATA section, the bits in the encoding of the data stream have already been converted to characters; it's too late to say "Don't parse these octets as characters!"
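
The point can be tried out with a small Python sketch using the standard library's `xml.etree.ElementTree`. The element name `note`, the helper names, and the sample payload are invented for this illustration; the browser's parser in the question is a different implementation, so take this only as a demonstration of the principle.

```python
import xml.etree.ElementTree as ET

# ISO 8859-1 bytes for "café"; the byte 0xE9 is not legal UTF-8 here.
payload = "café".encode("iso-8859-1")


def wrap(declared_encoding: str) -> bytes:
    """Build a tiny document whose CDATA section holds the payload bytes."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'
    return declaration.encode("ascii") + b"<note><![CDATA[" + payload + b"]]></note>"


def cdata_text(doc: bytes):
    """Return the element text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# Same bytes, different declarations: only the correct declaration parses.
print(cdata_text(wrap("UTF-8")))
print(cdata_text(wrap("ISO-8859-1")))

# What CDATA does protect: characters that would otherwise be XML delimiters.
print(ET.fromstring("<note><![CDATA[a < b & c]]></note>").text)
```

The first document is rejected even though the bad byte sits inside the CDATA section, because decoding bytes into characters happens before the parser can recognize any markup at all.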
