First characters read from a text file: ï»¿ [duplicate]

Date: 2021-01-22 15:45:39

This question already has an answer here:

If I run this code, the first line of output starts with ï»¿, and then the other lines follow as expected:

try {
    BufferedReader br = new BufferedReader(new FileReader(
            "myFile.txt"));

    String line;
    // Parentheses are required: assign first, then compare with null.
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    br.close();

} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

How can I avoid it?

2 Answers

#1


14  

You are getting the characters ï»¿ on the first line because this sequence is the UTF-8 byte order mark (BOM): the bytes EF BB BF, displayed as three separate single-byte characters. If a text file begins with a BOM, it was likely generated by a Windows program such as Notepad.
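
To see where the three characters come from, here is a minimal sketch that decodes the BOM bytes in two ways. The class and method names (`BomMojibakeDemo`, `decodeAsLatin1`, `decodeAsUtf8`) are made up for illustration, and ISO-8859-1 stands in for whatever single-byte default encoding the original program happened to use.

```java
import java.nio.charset.StandardCharsets;

public class BomMojibakeDemo {
    // The three bytes of the UTF-8 byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decode the BOM with a single-byte charset (assumed here: ISO-8859-1),
    // which maps each byte to its own character.
    static String decodeAsLatin1() {
        return new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
    }

    // Decode the same bytes as UTF-8; Java keeps the BOM as the character U+FEFF.
    static String decodeAsUtf8() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsLatin1().length()); // prints 3
        System.out.println(decodeAsUtf8().length());   // prints 1
    }
}
```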

To solve your problem, read the file explicitly as UTF-8 instead of relying on the platform's default character encoding (US-ASCII, etc.), which is what FileReader uses:

BufferedReader in = new BufferedReader(
    new InputStreamReader(
        new FileInputStream("myFile.txt"),
        "UTF-8"));

Then in UTF-8, the byte sequence EF BB BF decodes to one character, which is U+FEFF. This character is optional: a legal UTF-8 file may or may not begin with it. So we skip the first character only if it is U+FEFF:

in.mark(1);
if (in.read() != 0xFEFF)
  in.reset();

And now you can continue with the rest of your code.
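
Putting the two steps together with the read loop from the question, a complete version might look like the sketch below. The class name `BomAwareReader` and the helpers `open` and `readLines` are hypothetical names chosen for this example; `StandardCharsets.UTF_8` is used in place of the `"UTF-8"` string, which behaves the same way here.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BomAwareReader {
    // Opens the file as UTF-8 and skips a leading U+FEFF if one is present.
    static BufferedReader open(String path) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
        in.mark(1);                  // remember the start of the stream
        if (in.read() != 0xFEFF) {
            in.reset();              // not a BOM: rewind so no character is lost
        }
        return in;
    }

    // Reads all lines of the file, without the BOM on the first line.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = open(path)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines("myFile.txt")) {
            System.out.println(line);
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, which the original loop with a trailing `br.close()` does not guarantee.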

#2


1  

The problem could be the encoding used. Try this:

BufferedReader in = new BufferedReader(new InputStreamReader(
      new FileInputStream("yourfile"), "UTF-8"));
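
Note that choosing UTF-8 alone does not remove the BOM: Java's UTF-8 decoder passes it through as the character U+FEFF at the start of the first line. One way to handle that with this reader, sketched here with a hypothetical helper named `stripBom`, is to trim the character from the first line after reading it.

```java
import java.nio.charset.StandardCharsets;

public class FirstLineBom {
    // Removes a single leading U+FEFF from a line, if present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) {
        // Simulates the first line of a BOM-prefixed file decoded as UTF-8.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        String firstLine = new String(raw, StandardCharsets.UTF_8);
        System.out.println(firstLine.length());           // prints 4
        System.out.println(stripBom(firstLine).length()); // prints 3
    }
}
```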
