UTF-8 to String in Java

Time: 2023-01-05 23:17:07

I am having a little problem with the UTF-8 charset. I have a UTF-8 encoded file which I want to load and analyze. I am using BufferedReader to read the file line by line.

BufferedReader buffReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

My problem is that the normal String methods (trim() and equals(), for example) do not work as expected on the lines I read from the BufferedReader in the loop I wrote to read all of its content. For example, the encoded file contains < menu >, which I want my program to treat as is; for now, however, it is seen as ?? < m e n u > mixed with some other strange characters. I want to know if there is a way to remove all the charset encoding and keep just the plain text, so that I can use all the methods of the String class without complications. Thank you.

1 solution

#1


If your JDK is not too old (1.5 or later), you can do it like this:

Locale frLocale = new Locale("fr", "FR");
Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8");
scanner.useLocale(frLocale);

int numLine = 0;
String line;
for (; scanner.hasNextLine(); numLine++) {
    line = scanner.nextLine();  // already a decoded String; trim(), equals() etc. apply
}
scanner.close();
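As a self-contained variant of the snippet above, here is a minimal sketch that reads a UTF-8 file with Scanner and applies trim() and equals() to each line. The class name Utf8ScannerDemo, the method readTrimmedLines and the temporary sample file are made up for illustration; they are not part of the original answer. The locale call is left out because it only affects number parsing, not line reading.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class Utf8ScannerDemo {

    // Reads every line of a file, decoding it as UTF-8, and returns the trimmed lines.
    static List<String> readTrimmedLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = new FileInputStream(file);
             Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine().trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample file so the example runs on its own.
        File sample = File.createTempFile("utf8-sample", ".txt");
        sample.deleteOnExit();
        Files.write(sample.toPath(),
                "  <menu>  \ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8));

        List<String> lines = readTrimmedLines(sample);
        System.out.println(lines.get(0).equals("<menu>")); // prints true
        System.out.println(lines.size());                  // prints 2
    }
}
```

Once the bytes have been decoded with the right charset, the resulting String carries no encoding of its own, so the usual String methods behave normally.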

The scanner can also use delimiters other than whitespace. This example reads several items in from a string:

         String input = "1 fish 2 fish red fish blue fish";
         Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
         System.out.println(s.nextInt());
         System.out.println(s.nextInt());
         System.out.println(s.next());
         System.out.println(s.next());
         s.close(); 

prints the following output:

         1
         2
         red
         blue 

See the Javadoc for java.util.Scanner.
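If the text still comes out as two stray characters followed by letters separated by gaps, as described in the question, one possible cause is that the file is not UTF-8 at all but UTF-16 with a byte-order mark. That is an assumption about the cause; the question does not confirm it. A rough way to check is to look at the first bytes of the file. In the sketch below, the class BomSniffer and the method charsetFromBom are hypothetical names, and only the UTF-8 and UTF-16 marks are recognized.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class, for illustration only.
class BomSniffer {

    // Returns a charset name guessed from a leading byte-order mark,
    // or null if no known mark is present. UTF-32 marks are not handled.
    static String charsetFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Create a UTF-16 sample file; Java's "UTF-16" encoder writes a big-endian mark.
        File sample = File.createTempFile("utf16-sample", ".txt");
        sample.deleteOnExit();
        Files.write(sample.toPath(), "<menu>".getBytes(StandardCharsets.UTF_16));

        byte[] bytes = Files.readAllBytes(sample.toPath());
        System.out.println(charsetFromBom(bytes)); // prints UTF-16BE

        // Decoding with "UTF-16" consumes the mark and yields the plain text.
        System.out.println(new String(bytes, StandardCharsets.UTF_16)); // prints <menu>
    }
}
```

A file without a mark returns null here, which says nothing either way about its encoding; in that case the charset has to be known from wherever the file came from.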
