I need to convert a text file to a String, which I then have to pass as an input parameter (of type InputStream) to IFile.create (Eclipse). I have been looking for an example of how to do that but still cannot figure it out. I need your help!
Just for testing, I tried to convert the original text file to UTF-8 with this code:
FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);
Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();
int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char) ch);
}
in.close();
FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();
But even though the final *.test.txt file has UTF-8 encoding, the characters inside are corrupted.
1 Solution
#1
9
You need to specify the encoding of the InputStreamReader using the Charset parameter.
//                                         ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);
This also works:
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
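Putting it together, here is a minimal sketch of the whole round trip, assuming the input really is ISO-8859-1 (substitute whatever your file actually uses). The class name `ReencodeSketch` and the helpers `readAll` and `toUtf8Stream` are made up for illustration, and an in-memory stream stands in for the `FileInputStream` from the question so the example runs on its own. The last helper shows one way to get from the String to an InputStream, which is the type the question says IFile.create expects.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReencodeSketch {

    // Hypothetical helper: decodes the whole stream using an explicit charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset inputCharset) {
        StringBuilder buffer = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, inputCharset))) {
            int ch;
            while ((ch = reader.read()) > -1) {
                buffer.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    // Hypothetical helper: exposes the text as a stream of UTF-8 bytes,
    // for an API that takes an InputStream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: the "é" is the single byte 0xE9.
        byte[] latin1Bytes = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        // In the question this would be a FileInputStream on the original file.
        String text = readAll(new ByteArrayInputStream(latin1Bytes), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("caf\u00e9")); // prints "true"

        // Reading the UTF-8 stream back with the matching charset gives the same text.
        String roundTrip = readAll(toUtf8Stream(text), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(text)); // prints "true"
    }
}
```

The point is only that the charset given to the reader has to match the bytes being read; once the text is a String, it can be written out or streamed in any encoding you choose.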
See also:
- InputStreamReader(InputStream in, Charset cs)
- Charset.forName(String charsetName)
- Java: How to determine the correct charset encoding of a stream
- How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
- GuessEncoding - only works for UTF-8, UTF-16LE, UTF-16BE, and UTF-32 ☹
- ICU Charset Detector
- cpdetector, free java codepage detection
- JCharDet (Java port of Mozilla charset detector); ironically, that page does not render the apostrophe in "Mozilla's" correctly
SO search where I found all these links: https://*.com/search?q=java+detect+encoding
You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
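To see why relying on the default matters, here is a small sketch. The one-argument `InputStreamReader(InputStream)` constructor decodes with the default charset, so if that differs from the file's real encoding you get exactly the kind of corruption described in the question. The class name `DefaultCharsetSketch` and the helper `decode` are hypothetical, and the ISO-8859-1 mismatch is just an assumed example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetSketch {

    // Hypothetical helper: decodes bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Depends on the machine and JVM settings, so no fixed output here.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // "é" (U+00E9) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset restores the character.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(right.equals("\u00e9")); // prints "true"

        // Decoding with the wrong charset turns each byte into its own character.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals("\u00c3\u00a9")); // prints "true"
    }
}
```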