I have the following code, but I want it to write a UTF-8 file so that it handles foreign characters. Is there a way of doing this? Do I need to pass some parameter?
I would really appreciate your help with this. Thanks.
try {
    BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list"));
    writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv"));
    while ((line = reader.readLine()) != null) {
        // If the line starts with a tab then we just want to add a movie
        // using the current actor's name.
        if (line.length() == 0)
            continue;
        else if (line.charAt(0) == '\t') {
            readMovieLine2(0, line, surname.toString(), forename.toString());
        } // Else we've reached a new actor
        else {
            readActorName(line);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
6 Answers
#1
66
Safe Encoding Constructors
Getting Java to properly notify you of encoding errors is tricky. You must use the most verbose and, alas, the least used of the four alternate constructors for each of InputStreamReader and OutputStreamWriter to receive a proper exception on an encoding glitch.
For file I/O, always pass the fancy encoder argument (or its newDecoder() counterpart for InputStreamReader) as the second argument to both OutputStreamWriter and InputStreamReader:
Charset.forName("UTF-8").newEncoder()
There are other even fancier possibilities, but none of the three simpler possibilities works for exception handling. These do:
OutputStreamWriter char_output = new OutputStreamWriter(
    new FileOutputStream("some_output.utf8"),
    Charset.forName("UTF-8").newEncoder()
);
InputStreamReader char_input = new InputStreamReader(
    new FileInputStream("some_input.utf8"),
    Charset.forName("UTF-8").newDecoder()
);
As for running with
$ java -Dfile.encoding=utf8 SomeTrulyRemarkablyLongcLassNameGoeShere
The problem is that this will not use the full encoder argument form for the character streams, so you will again miss encoding problems.
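To make the difference concrete, here is a minimal sketch that writes a string containing a lone surrogate (which cannot be encoded as UTF-8) through both constructor forms. The class and method names (StrictEncoderDemo, lenientWrite, strictWriteFails) are made up for illustration, and an in-memory ByteArrayOutputStream stands in for the FileOutputStream so the example runs anywhere.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the answer.
class StrictEncoderDemo {

    // Charset form: malformed input is silently replaced, no exception.
    static String lenientWrite(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Encoder form: returns true when the writer reported an encoding error.
    static boolean strictWriteFails(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8.newEncoder())) {
            w.write(text);
        } catch (CharacterCodingException e) {
            return true; // raised by write() or by the implicit close()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }

    public static void main(String[] args) {
        String broken = "a\uD800b"; // high surrogate with no low surrogate after it
        System.out.println("lenient output: " + lenientWrite(broken));
        System.out.println("strict write failed: " + strictWriteFails(broken));
        System.out.println("strict write of valid text failed: " + strictWriteFails("Zo\u00eb"));
    }
}
```

With the Charset form the bad character quietly becomes a question mark in the output; with the encoder form the same write surfaces as a CharacterCodingException, which is the behaviour this answer is after.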
Longer Example
Here’s a longer example, this one managing a process instead of a file, where we promote two different input byte streams and one output byte stream to UTF-8 character streams with full exception handling:
// this runs a perl script with UTF-8 STD{IN,OUT,ERR} streams
Process slave_process = Runtime.getRuntime().exec("perl -CS script args");

// fetch his stdin byte stream...
OutputStream __bytes_into_his_stdin = slave_process.getOutputStream();

// and make a character stream with exceptions on encoding errors
OutputStreamWriter chars_into_his_stdin = new OutputStreamWriter(
    __bytes_into_his_stdin,
    /* DO NOT OMIT! */ Charset.forName("UTF-8").newEncoder()
);

// fetch his stdout byte stream...
InputStream __bytes_from_his_stdout = slave_process.getInputStream();

// and make a character stream with exceptions on encoding errors
InputStreamReader chars_from_his_stdout = new InputStreamReader(
    __bytes_from_his_stdout,
    /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
);

// fetch his stderr byte stream...
InputStream __bytes_from_his_stderr = slave_process.getErrorStream();

// and make a character stream with exceptions on encoding errors
InputStreamReader chars_from_his_stderr = new InputStreamReader(
    __bytes_from_his_stderr,
    /* DO NOT OMIT! */ Charset.forName("UTF-8").newDecoder()
);
Now you have three character streams that all raise exceptions on encoding errors, called chars_into_his_stdin, chars_from_his_stdout, and chars_from_his_stderr respectively.
This is only slightly more complicated than what you need for your problem, whose solution I gave in the first half of this answer. The key point is that this is the only way to detect encoding errors.
Just don’t get me started about PrintStreams eating exceptions.
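The reading side behaves the same way. As a rough sketch (the class and method names are hypothetical, and a ByteArrayInputStream replaces the process or file stream), feeding a byte sequence that is not valid UTF-8 through both InputStreamReader forms shows which one raises an exception:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the answer.
class StrictDecoderDemo {

    // Drains a reader into a String, one character at a time.
    private static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Charset form: invalid bytes are replaced with U+FFFD, no exception.
    static String lenientRead(byte[] input) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), StandardCharsets.UTF_8)) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decoder form: returns true when the reader reported a decoding error.
    static boolean strictReadFails(byte[] input) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), StandardCharsets.UTF_8.newDecoder())) {
            readAll(r);
            return false;
        } catch (CharacterCodingException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bad = {0x61, (byte) 0xFF, 0x62}; // 0xFF can never appear in UTF-8
        System.out.println("lenient read contains U+FFFD: " + lenientRead(bad).contains("\uFFFD"));
        System.out.println("strict read failed: " + strictReadFails(bad));
    }
}
```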
#2
42
You need to use the OutputStreamWriter class as the writer parameter for your BufferedWriter. It does accept an encoding. Review the javadocs for it.
Somewhat like this:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream("jedis.txt"), "UTF-8"
));
Or you can set the current system encoding to UTF-8 with the system property file.encoding:
java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...
You may also set it as a system property at runtime with System.setProperty(...) if you only need it for this specific file, but in a case like this I think I would prefer the OutputStreamWriter. Note that the JVM normally reads file.encoding once at startup, so changing it at runtime may have no effect on the default charset.
By setting the system property you can use FileWriter and expect it to use UTF-8 as the default encoding, in this case for all the files that you read and write.
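If you go the system property route, it can help to check what the JVM actually resolved, since the FileWriter constructors without a charset argument use the platform default charset. A small sketch (the class name DefaultCharsetCheck is made up here) that reports both values:

```java
import java.nio.charset.Charset;

// Hypothetical helper; the class name is illustrative only.
class DefaultCharsetCheck {

    // Reports the charset the JVM uses by default and the raw property value.
    static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 shows whether the flag took effect on your JVM.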
EDIT
- Starting from API 19 (Android; Java 7 on other platforms), you can replace the String "UTF-8" with StandardCharsets.UTF_8.
- As suggested in the comments below by tchrist, if you intend to detect encoding errors in your file you would be forced to use the OutputStreamWriter approach and use the constructor that receives a charset encoder. Somewhat like:
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
encoder.onMalformedInput(CodingErrorAction.REPORT);
encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("jedis.txt"), encoder));
You may choose between the actions IGNORE | REPLACE | REPORT.
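To see what the three actions do, here is a small sketch that encodes the same text under each of them. It deliberately uses US-ASCII instead of UTF-8, because UTF-8 can represent every valid character and so never hits the unmappable case; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; US-ASCII is an assumption chosen to force unmappable characters.
class CodingActionDemo {

    // Encodes text to US-ASCII under the given action and returns the bytes
    // as a String, or "ERROR" if the encoder reported a problem.
    static String encodeAscii(String text, CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // the last character has no ASCII equivalent
        System.out.println("IGNORE:  " + encodeAscii(text, CodingErrorAction.IGNORE));
        System.out.println("REPLACE: " + encodeAscii(text, CodingErrorAction.REPLACE));
        System.out.println("REPORT:  " + encodeAscii(text, CodingErrorAction.REPORT));
    }
}
```

IGNORE drops the offending character, REPLACE substitutes the encoder's replacement byte, and REPORT is the only one that tells you something went wrong.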
Also, this question was already answered here.
#3
42
Ditch FileWriter and FileReader, which are useless exactly because they do not allow you to specify the encoding (at least before Java 11, which added constructors that take a Charset). Instead, use
new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)
and
new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
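Put together, a round trip through a file might look like the sketch below. The class name Utf8RoundTrip and the temporary file are assumptions for the sake of a runnable example; in the question's code the two wrapped streams would simply replace the FileReader and FileWriter.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; writes one line as UTF-8 and reads it back.
class Utf8RoundTrip {

    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("utf8-demo", ".txt");
            try {
                // Writer side: bytes on disk are UTF-8 regardless of the platform default.
                try (BufferedWriter writer = new BufferedWriter(
                        new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
                    writer.write(line);
                    writer.newLine();
                }
                // Reader side: decode with the same charset.
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "Zo\u00eb Salda\u00f1a";
        System.out.println(name.equals(roundTrip(name)));
    }
}
```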
#4
4
With Chinese text, I tried the UTF-16 charset and luckily it worked.
Hope this helps!
PrintWriter out = new PrintWriter( file, "UTF-16" );
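Keep in mind that UTF-16 is a different encoding from the UTF-8 the question asks for, so whatever reads the file has to expect it. As a quick illustration (the class name EncodingSizes is made up), this sketch compares the number of bytes each encoding produces; Java's "UTF-16" charset also prepends a two-byte byte-order mark when encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing encoded sizes of the same text.
class EncodingSizes {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Includes the byte-order mark that the UTF-16 charset writes first.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String latin = "Zo\u00eb";
        String chinese = "\u4e2d\u6587";
        System.out.println(latin + ": UTF-8=" + utf8Length(latin) + " UTF-16=" + utf16Length(latin));
        System.out.println(chinese + ": UTF-8=" + utf8Length(chinese) + " UTF-16=" + utf16Length(chinese));
    }
}
```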
#5
1
Since Java 7 there is an easy way to handle the character encoding of BufferedWriter and BufferedReader. You can create a BufferedWriter directly with the Files class instead of nesting several Writer instances. To create a BufferedWriter that uses a specific character encoding, call:
Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);
You can find more about it in the JavaDoc:
- Files class
- Files#newBufferedWriter
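For completeness, a short sketch of a write and read-back using Files.newBufferedWriter and its counterpart Files.newBufferedReader. The class name, the temporary file and the sample rows are assumptions made up for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; writes lines as UTF-8 and reads them back.
class FilesUtf8Demo {

    static List<String> writeAndReadBack(List<String> lines) {
        try {
            Path path = Files.createTempFile("actresses", ".csv");
            try {
                try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
                    for (String line : lines) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
                List<String> result = new ArrayList<>();
                try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        result.add(line);
                    }
                }
                return result;
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("Zo\u00eb,Avatar");
        rows.add("\u5f20\u66fc\u7389,Hero");
        System.out.println(rows.equals(writeAndReadBack(rows)));
    }
}
```

As far as I know, these Files methods build their streams from a fresh encoder or decoder, so malformed data shows up as an exception here too instead of being replaced silently.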
#6
-2
In my opinion, if you want to write UTF-8, you should create a byte array. You can do something like the following: byte[] by = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + "Your string").getBytes(StandardCharsets.UTF_8);
Then you can write each byte into the file you created. Example:
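OutputStream f = new FileOutputStream(xmlfile);
// Encode explicitly as UTF-8; a bare getBytes() would use the platform default charset.
byte[] by = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>" + "Your string").getBytes(StandardCharsets.UTF_8);
for (int i = 0; i < by.length; i++) {
    byte b = by[i];
    f.write(b);
}
f.close();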