Efficiently removing the UTF byte order mark [duplicate]

Date: 2022-10-24 19:38:42

This question already has an answer here:

I am looking for an efficient solution to the following problem:

org.xml.sax.SAXParseException: Content is not allowed in prolog

The problem is skipping (or removing) the first 3 BOM bytes (if present) before unmarshalling the file with JAXB.

I can get it to work by checking the first three bytes, writing everything after them to a new file, and then using the new file; however, this seems horribly inefficient.

I have tried advancing the file pointer past the 3 bytes if the BOM is present (and verified the pointer position, of course); however, when I pass the InputStream to JAXB it still throws the same exception. My gut instinct is that the file pointer is being reset.

Does anyone have any ideas for this?

Thanks

1 answer

#1


5  

Use an InputStream decorator that strips the BOM, such as BOMInputStream from Apache Commons IO.
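To illustrate what such a decorator does, here is a minimal dependency-free sketch built on `java.io.PushbackInputStream`. It is not the Commons IO implementation: the class name `BomSkipper` and the method `skipUtf8Bom` are hypothetical, and it assumes only the UTF-8 BOM (bytes `EF BB BF`) needs handling. It peeks at the first three bytes, drops them if they form a BOM, and otherwise pushes them back so the stream is read from the start.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library.
final class BomSkipper {
    private BomSkipper() {}

    /**
     * Wraps the given stream so that a leading UTF-8 BOM (EF BB BF), if present,
     * is consumed. Any other leading bytes are pushed back and remain readable.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to 3 bytes; a single read() call may return fewer than requested.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before 3 bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: restore whatever was read so the caller sees the original content.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream can then be handed to the unmarshaller in place of the raw file stream, which avoids copying the file to disk. Because the wrapper reads from the same underlying stream, nothing is "reset" behind its back, as long as the parser is given the wrapped stream rather than the original file or path.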
