Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

Time: 2023-01-06 14:33:41

We're parsing an XML document using JAXB and get this error:

[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)

What exactly does this mean, and how can we resolve it?

We are executing the following code:

jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));

Update

The issue appears to be caused by this "funny" character in the XML file: ¿

Why would this cause such a problem?

Update 2

There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.

Update 3

Here is the full XML snippet:

<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>

Update 4

Note that there is no <?xml ...?> header.

The hex value of the special character is BF.

3 answers

#1


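So, your problem is that JAXB treats XML files without an <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF character is actually intended to mean ¿). In UTF-8, the bytes 0x80 to 0xBF are continuation bytes that can never start a character, so the parser rejects 0xBF as soon as it reaches it; that is what "Invalid byte 1 of 1-byte UTF-8 sequence" means.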
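To see the failure in isolation, here is a small sketch in plain JDK Java (no JAXB involved; `ByteBfDemo` and its methods are illustrative names, not part of any library). It rebuilds the text from the question's snippet with the single byte 0xBF in place of ¿ and checks it against a strict UTF-8 decoder and against ISO-8859-1:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ByteBfDemo {
    // "Mt. Belvieu ? Texas" as ASCII, then the '?' at index 12 is overwritten with the raw byte 0xBF
    static byte[] sample() {
        byte[] bytes = "Mt. Belvieu ? Texas".getBytes(StandardCharsets.US_ASCII);
        bytes[12] = (byte) 0xBF;
        return bytes;
    }

    // True if the bytes are legal in the charset; malformed input is reported, not replaced
    static boolean isStrictlyDecodable(byte[] bytes, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = sample();
        System.out.println("valid UTF-8?      " + isStrictlyDecodable(bytes, StandardCharsets.UTF_8));      // false
        System.out.println("valid ISO-8859-1? " + isStrictlyDecodable(bytes, StandardCharsets.ISO_8859_1)); // true
        // In ISO-8859-1 (and Windows-1252) byte 0xBF maps to U+00BF, the inverted question mark
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).charAt(12) == 0xBF);            // true
    }
}
```

ISO-8859-1 maps every one of the 256 byte values to a character, so it can never reject input; that is also why guessing it "works" even when it is not the file's real encoding.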

If you can change the producer of the file, you can add an <?xml ...?> header that declares the actual encoding, or simply write the file in UTF-8.
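Both producer-side fixes can be checked without JAXB by parsing the snippet from Update 3 with the JDK's built-in DOM parser (javax.xml.parsers), which, following the XML specification, also assumes UTF-8 when there is neither a declaration nor a BOM. This is a sketch under that assumption; `DeclarationDemo` and its methods are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclarationDemo {
    static final String BODY = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";

    // Encodes an optional XML declaration plus the snippet with the given charset
    static byte[] xmlBytes(String declaration, Charset cs) {
        return (declaration + BODY).getBytes(cs);
    }

    // Parses raw bytes and lets the parser detect the encoding; returns null if parsing fails
    static String parseDescription(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Latin-1 bytes, no declaration: the parser assumes UTF-8 and fails on 0xBF
        System.out.println(parseDescription(xmlBytes("", StandardCharsets.ISO_8859_1)));
        // Same bytes, but the declaration names the real encoding
        System.out.println(parseDescription(
                xmlBytes("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>", StandardCharsets.ISO_8859_1)));
        // Producer switched to UTF-8: U+00BF is written as C2 BF, which is valid UTF-8
        System.out.println(parseDescription(xmlBytes("", StandardCharsets.UTF_8)));
    }
}
```

The first call prints `null` (the default error handler also reports the fatal error on stderr); the other two recover the original text.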

If you can't change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't let you change its default encoding:

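try (Reader reader = new InputStreamReader(new FileInputStream(inputFile), "ISO-8859-1")) {
    // decode the file as ISO-8859-1 and close the stream once unmarshalling is done
    results = (Results) unmarshaller.unmarshal(reader);
}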

However, this solution is fragile: it breaks on input files whose <?xml ...?> header declares a different encoding, because a parser reading from a Reader ignores the encoding declaration.
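The fragility can be reproduced with the JDK's DOM parser and a SAX `InputSource` wrapping a Reader. This sketch assumes JAXB's `unmarshal(Reader)` hands the parser a character stream in the same way; `ReaderOverrideDemo` is a made-up name. A Latin-1 file decodes correctly, but a genuinely UTF-8 file with a matching UTF-8 header is silently misread:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ReaderOverrideDemo {
    static final String BODY = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";

    // Parses XML through a Reader with a fixed charset; the parser only sees decoded characters
    static String parseWithReader(byte[] bytes, Charset readerCharset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), readerCharset)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // No declaration, Latin-1 bytes, Latin-1 Reader: decoded correctly
        byte[] latin1 = BODY.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWithReader(latin1, StandardCharsets.ISO_8859_1));

        // Declared and genuinely UTF-8, but still read as Latin-1: no error, just wrong text
        byte[] utf8 = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + BODY).getBytes(StandardCharsets.UTF_8);
        System.out.println(parseWithReader(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

In the second case the UTF-8 bytes C2 BF come out as the two characters U+00C2 U+00BF, so the failure is mojibake rather than an exception, which makes it easy to miss.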

#2


That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a UTF file. BOMs are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.
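One way to test the BOM theory is to look at the first bytes of the file. The helper below is a hypothetical diagnostic (not part of JAXB, and it needs Java 11+ for `readNBytes` and `Path.of`); it only recognises the UTF-8 mark EF BB BF and the UTF-16 marks FE FF and FF FE:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCheck {
    // Returns the encoding named by a leading BOM, or null if the data does not start with one
    static String detectBom(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // Reads at most the first three bytes of the file and checks them for a BOM
    static String detectBom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detectBom(in.readNBytes(3));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        String bom = detectBom(Path.of(args[0]));
        System.out.println(bom == null ? "no BOM" : bom + " BOM");
    }
}
```

A BOM can only sit at the very start of a file, so this check says nothing about bytes further in.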

Try rewriting your code to use a Reader rather than an InputStream:

results = (Results) unmarshaller.unmarshal(new FileReader(inputFile));

A Reader decodes the bytes into characters itself before the XML parser sees them, so it might make a better stab at it. More simply, pass the File directly to the Unmarshaller and let the JAXBContext worry about it:

results = (Results) unmarshaller.unmarshal(inputFile);
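Whether the Reader variant helps depends entirely on the charset the Reader decodes with. `FileReader` uses the platform default charset (UTF-8 by default since Java 18; Java 11 added a `FileReader(File, Charset)` constructor), and an `InputStreamReader` does not throw on malformed bytes, it substitutes U+FFFD. A minimal sketch of that behaviour, with `ReaderDecodeDemo` as an illustrative name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderDecodeDemo {
    // Decodes all bytes through an InputStreamReader using the given charset
    static String readAll(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {'A', ' ', (byte) 0xBF, ' ', 'B'};
        // UTF-8: 0xBF is malformed, so the Reader replaces it with U+FFFD instead of throwing
        System.out.println(Integer.toHexString(readAll(bytes, StandardCharsets.UTF_8).charAt(2)));      // fffd
        // ISO-8859-1: 0xBF decodes to U+00BF
        System.out.println(Integer.toHexString(readAll(bytes, StandardCharsets.ISO_8859_1).charAt(2))); // bf
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

So with a UTF-8 default, a Reader makes the exception go away but loses the ¿; only a Reader built with the file's real charset preserves it.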

#3


It sounds as if your XML is encoded in UTF-16 but that encoding is not being passed to the Unmarshaller. With the Marshaller you can set it using marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-16"), but because the Unmarshaller is not required to support any properties, I am not sure how to enforce it other than by making sure your XML document has encoding="UTF-16" in its initial <?xml?> declaration.
