Convert a file with a known encoding to UTF-8

Date: 2021-05-27 20:12:34

I need to read a text file into a String, which I then have to pass as an input parameter (of type InputStream) to IFile.create (Eclipse). I have been looking for an example of how to do that but still cannot figure it out. I need your help!

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the final *.test.txt file is UTF-8 encoded, the characters inside are corrupted.

1 Solution

#1


You need to specify the encoding of the InputStreamReader using the Charset parameter.

                                    // ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
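
To put the pieces together, here is a minimal sketch of the whole round trip: decode the file with its real encoding, then write it back out (or expose it as a stream) as UTF-8. The class name Utf8Converter and its method names are made up for illustration, and ISO-8859-1 is only an assumed input encoding; substitute whatever your file actually uses.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from any library.
class Utf8Converter {

    // Reads the whole file, decoding its bytes with the charset the file was written in.
    static String readText(File source, Charset sourceCharset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(source), sourceCharset))) {
            char[] buffer = new char[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    // Writes the text back out, this time encoded as UTF-8.
    static void writeUtf8(String text, File target) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Wraps the text as an InputStream of UTF-8 bytes, for an API that expects a stream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        File source = new File(args[0]);                  // path to the input file
        Charset sourceCharset = Charset.forName(args[1]); // e.g. "ISO-8859-1"
        String text = readText(source, sourceCharset);
        writeUtf8(text, new File(args[0] + ".test.txt"));
    }
}
```

Once compiled, running it with a file path and a charset name (for example input.txt ISO-8859-1) should produce input.txt.test.txt in UTF-8. The stream returned by toUtf8Stream is the kind of value that could then be handed to IFile.create, as the question intends; the Eclipse call itself is not shown here.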

See also:

SO search where I found all these links: https://*.com/search?q=java+detect+encoding


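You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset(). An InputStreamReader created without an explicit charset, as in the question's code, decodes with this default, which is why the characters come out corrupted when the file is in a different encoding.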
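
A small sketch of why the decoding charset matters: the same byte turns into a different character depending on which charset is used to read it. The class name DefaultCharsetDemo and the decode helper are hypothetical, and the sample byte assumes ISO-8859-1 input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class DefaultCharsetDemo {

    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The charset used when no explicit one is given; it varies between machines.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // In ISO-8859-1 the single byte 0xE9 is the letter e with an acute accent (U+00E9).
        byte[] latin1Bytes = {(byte) 0xE9};

        // Decoded with the right charset, the character survives.
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1));

        // Decoded as UTF-8, the lone byte is malformed and becomes the replacement character U+FFFD.
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8));
    }
}
```

If the first line printed is not the encoding your file was saved in, relying on the default will corrupt any non-ASCII characters, so passing the charset explicitly is the safer choice.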
