I'm trying to parse an XML file that contains Hebrew characters. I know the file is otherwise correct, because if I export it (from a different program) without the Hebrew characters, it parses just fine.
I tried many things, but I always get this error:
MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
My latest attempt was to open it using FileInputStream and specify the encoding:
DocumentBuilder db = dbf.newDocumentBuilder();
document = db.parse(new FileInputStream(new File(xmlFileName)), "Cp1252");
(Cp1252 is an encoding that worked for me in a different app.) But I got the same result.
I also tried using a ByteArray, but that did not work either.
Any suggestions?
2 solutions
#1
6
If you know the correct encoding of the file and it's not "utf-8", then you can either add it to the XML header:
<?xml version="1.0" encoding="[correct encoding here]" ?>
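or parse it through a Reader (wrapped in an InputSource, since DocumentBuilder has no parse overload that takes a Reader directly):
db.parse(new InputSource(new InputStreamReader(new FileInputStream(new File(xmlFileName)), "[correct encoding here]")));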
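A minimal sketch of the Reader route, in case it helps. It assumes the file is encoded as windows-1255 (a common Hebrew code page); that is only a guess, since the question does not say which encoding the other software writes. The class name `ParseWithEncoding` and the helper `parse` are made up for illustration. Also worth noting: in `db.parse(InputStream, String)` the second argument is a system ID (a base URI), not an encoding, which would explain why passing "Cp1252" there changed nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseWithEncoding {

    // Hypothetical helper: decodes the byte stream with the given charset
    // and hands the resulting characters to the DOM parser.
    static Document parse(InputStream in, String charsetName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            // The parser receives characters, so it does no byte decoding of its own.
            return db.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: windows-1255 stands in for the file's real encoding.
        String charsetName = "windows-1255";
        String xml = "<root>\u05E9\u05DC\u05D5\u05DD</root>";
        byte[] bytes = xml.getBytes(Charset.forName(charsetName));

        Document doc = parse(new ByteArrayInputStream(bytes), charsetName);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

For a real file, replace the ByteArrayInputStream with `new FileInputStream(xmlFileName)`. If the decoded text looks wrong, the charset guess is wrong; try the encoding the exporting software actually uses.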
#2
0
The solution is quite simple: read the content as UTF-8 and pass the resulting reader to the parser through a SAX InputSource.
File file = new File("c:\\file-utf.xml");
InputStream inputStream= new FileInputStream(file);
Reader reader = new InputStreamReader(inputStream,"UTF-8");
InputSource is = new InputSource(reader);
// is.setEncoding("UTF-8"); -> This line causes error! Content is not allowed in prolog
saxParser.parse(is, handler);
You can read the full example here – http://www.mkyong.com/java/how-to-read-utf-8-xml-file-in-java-sax-parser/
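For a self-contained variant of the same idea, here is a rough sketch. The class `SaxUtf8Sketch`, the handler `TextCollector`, and the helper `readText` are hypothetical names, and the input is an in-memory UTF-8 byte array rather than a file, so it runs without any file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxUtf8Sketch {

    // Hypothetical handler that concatenates all character data it sees.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Decodes the bytes as UTF-8 and feeds the characters to the SAX parser.
    static String readText(byte[] utf8Xml) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Xml), StandardCharsets.UTF_8)) {
            saxParser.parse(new InputSource(reader), handler);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "<root>\u05E9\u05DC\u05D5\u05DD</root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(bytes).length()); // prints 4
    }
}
```

This only helps if the file really is UTF-8. For a file in another encoding, the charset passed to InputStreamReader has to match that encoding instead.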