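When reading an XML file with SAX, this exception comes up often. The causes usually found online are:
1. A BOM at the start of the stream. In Java, if your project already depends on Apache Commons IO, you can use it to test whether an InputStream contains a BOM.
For background on BOMs, see: /faq/utf_bom.html
Here is also a small method that filters out a UTF-8 BOM: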
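import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

private static InputStream checkForUtf8BOMAndDiscardIfAny(InputStream inputStream) throws IOException {
    // A 3-byte pushback buffer is enough to return a non-BOM prefix to the stream.
    PushbackInputStream pushbackInputStream = new PushbackInputStream(new BufferedInputStream(inputStream), 3);
    byte[] bom = new byte[3];
    int read = pushbackInputStream.read(bom);
    if (read != -1) {
        boolean hasBom = read == 3 && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF;
        if (!hasBom) {
            // Not a UTF-8 BOM: push back only the bytes that were actually read.
            pushbackInputStream.unread(bom, 0, read);
        }
    }
    return pushbackInputStream;
}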
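The method above only looks for the UTF-8 signature. As a rough sketch of the same idea for the UTF-16 BOMs as well, the helper below names the BOM found at the start of a byte array. The class name BomSniffer and the method detectBom are made up for illustration, and UTF-32 BOMs are deliberately ignored (a UTF-32LE BOM would be reported as UTF-16LE here).

```java
// Hypothetical helper, not from the original post: names the BOM at the start of a byte array.
final class BomSniffer {
    private BomSniffer() {}

    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or null when no known BOM is present. */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Compares the leading bytes (as unsigned values) against the given signature. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }
}
```

Reading the first few bytes of a suspect file and passing them to detectBom is a quick way to confirm whether a BOM is really the cause before changing any parsing code.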
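2. Malformed XML content, for example illegal characters appearing before <?xml
You can use a regular expression to cut the content so that it starts at <?xml.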
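A minimal sketch of that trimming step, assuming the content has already been decoded into a String. The class name PrologTrimmer and the method trimToXmlDeclaration are hypothetical, and text without an XML declaration is returned unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
final class PrologTrimmer {
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml");

    private PrologTrimmer() {}

    /** Drops everything before the first "<?xml"; returns the input as-is if none is found. */
    static String trimToXmlDeclaration(String text) {
        Matcher matcher = XML_DECL.matcher(text);
        return matcher.find() ? text.substring(matcher.start()) : text;
    }
}
```

Because it works on characters rather than bytes, this also removes a BOM that was decoded into the character U+FEFF in front of the declaration.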
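3. The cause I ran into today: compressed transfer was enabled through Accept-Encoding, so the parser received compressed bytes instead of XML text. Try setting Accept-Encoding to identity, and also check whether Chunked Transfer Encoding is in use.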
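The following is a sketch of both sides of that fix using the JDK's HttpURLConnection, assuming that is the HTTP client in use (the original does not say which client it was). The class XmlHttpFetch and its two methods are hypothetical names: one asks the server for an uncompressed body, the other unwraps gzip if the server compressed the response anyway.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not from the original post.
final class XmlHttpFetch {
    private XmlHttpFetch() {}

    /** Prepares (but does not open) a connection that asks the server not to compress the body. */
    static HttpURLConnection openUncompressed(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Accept-Encoding", "identity");
        return connection;
    }

    /** Wraps the body in a GZIPInputStream when the Content-Encoding header says gzip. */
    static InputStream decodeBody(String contentEncoding, InputStream body) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }
}
```

In use, the stream handed to the SAX parser would be something like decodeBody(connection.getContentEncoding(), connection.getInputStream()). With HttpURLConnection, chunked responses are decoded by the client itself, so chunking mainly matters when the response is read from a raw socket.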
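Postscript: some people say that if you pass the stream directly to the SAX parser, the BOM is filtered out for you. With the SAX/JAXP setup I used, however, I found that it is not filtered. Test environment for this post: Java 7 + Xerces SAX. The full exception stack trace: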
; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at (Unknown Source)
at (Unknown Source)
at (Unknown Source)
at (Unknown Source)
at (Unknown Source)
at (Unknown Source)
at $(Unknown Source)
at (Unknown Source)
at .(Unknown Source)
at .(Unknown Source)
at (Unknown Source)
at (Unknown Source)
at $(Unknown Source)
at (Unknown Source)
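To see the same error outside the original application, here is a small sketch that feeds text to the JDK's default SAX parser and returns the parse error, if any. The class name PrologErrorDemo and the method tryParse are hypothetical. It demonstrates cause 2 (stray characters before <?xml); whether a BOM triggers the error depends on the parser and on how the stream is read.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not from the original post.
final class PrologErrorDemo {
    private PrologErrorDemo() {}

    /** Parses the text with the default SAX parser; returns the parse error, or null on success. */
    static SAXParseException tryParse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors, so malformed input ends up here.
            return e;
        }
    }
}
```

Calling tryParse on a string such as "junk" followed by a normal XML document should return an exception whose getLineNumber() is 1, while the same document without the leading junk should parse cleanly and return null.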