How to remove Chinese characters in Java

public class RemoveHZ                            //remove chinese characters
{
    public static String deal(String s){
        StringBuffer sb = new StringBuffer(s);
        StringBuffer se = new StringBuffer();    //store final results
        int l = sb.length();
        char c;
        for(int i=0; i<l; i++){                 
            c = sb.charAt(i);                   //get each char from string
            if(c>40 && c<127){                  //what does this mean?
                se.append(c);
            }
        }
        return new String(se);
    }
    public static void main(String[] args) 
    {
        System.out.println(deal("hello你好啊"));
    }
}

What does the statement "if(c>40 && c<127)" mean?

Your help will be appreciated!

3 answers

#1

Try this:

public class RemoveHZ {
    public static String deal(String s) {
        StringBuffer sb = new StringBuffer(s);
        StringBuffer se = new StringBuffer();    //store final results
        int l = sb.length();
        char c;
        for (int i = 0; i < l; i++) {
            c = sb.charAt(i);                   //get each char from string
            if (Character.UnicodeScript.of(c) != Character.UnicodeScript.HAN) {
                se.append(c);
            }
        }
        return new String(se);
    }

    public static void main(String[] args) {
        System.out.println(deal("hello你好啊"));
    }
}

Another solution would be to use if (!Character.isIdeographic(c)), but that would also remove ideographic characters from other languages.
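
One caveat with the code above: it tests one char at a time, and Java stores characters outside the Basic Multilingual Plane (for example the CJK Extension B ideograph U+20000) as two surrogate chars, neither of which is classified as HAN, so they slip through. The following is a minimal sketch, assuming Java 8 or later for String.codePoints(), that filters whole code points instead and also shows the isIdeographic variant for comparison. The class and method names are hypothetical.

```java
class RemoveHanByCodePoint {
    // Removes every code point whose Unicode script is HAN.
    // Iterating over code points keeps surrogate pairs together,
    // so ideographs outside the BMP, such as U+20000, are matched too.
    static String deal(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.HAN)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // The alternative mentioned in the answer: drop every code point with
    // the Ideographic property, which is not limited to Chinese.
    static String dealIdeographic(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isIdeographic(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(deal("hello你好啊"));            // hello
        System.out.println(deal("a\uD840\uDC00b"));         // ab (U+20000 as a surrogate pair)
        System.out.println(dealIdeographic("hello你好啊")); // hello
    }
}
```

With the char-based loop from the answer, the surrogate pair for U+20000 would be kept, because each half on its own has script UNKNOWN rather than HAN.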

#2

This loops through each character and appends it to the StringBuffer only if its ASCII value is strictly greater than 40 and strictly less than 127, i.e. in the range 41 to 126.

So, your print statement will only print the following characters:

) * + , - . / 0-9 : ; < = > ? @ A-Z [ \ ] ^ _ ` a-z { | } ~
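
A quick way to confirm exactly which characters pass the test is to enumerate all ASCII values and apply the same condition. This is an illustrative sketch; the class and method names are hypothetical.

```java
class KeptCharacters {
    // Collects every ASCII char that satisfies c > 40 && c < 127,
    // i.e. the values 41 through 126.
    static String keptCharacters() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (c > 40 && c < 127) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keptCharacters());
    }
}
```

It prints the 86 characters from ) through ~, which include [ \ ] and the backtick.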

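Note that both bounds are exclusive: 40 is ( and 127 is DEL, so both are dropped and only values 41 to 126 are kept. Everything below 40 is dropped as well, including control characters, the space (32), and ! " # $ % & '.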
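
The effect of the exclusive bounds is easy to see on a few small inputs. This sketch repeats the question's filter under a hypothetical class name, using StringBuilder instead of StringBuffer since no synchronization is needed.

```java
class BoundaryDemo {
    // Same filter as the question's deal(): keep only chars 41..126.
    static String deal(String s) {
        StringBuilder se = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 40 && c < 127) {
                se.append(c);
            }
        }
        return se.toString();
    }

    public static void main(String[] args) {
        System.out.println(deal("(a b)"));                     // ab)  '(' (40) and space (32) are dropped
        System.out.println(deal("Hi!"));                       // Hi   '!' is 33
        System.out.println(deal(String.valueOf((char) 127)));  // empty line, DEL is 127
        System.out.println(deal("hello你好啊"));               // hello
    }
}
```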

#3

Every character on your computer is stored as a number, because a computer cannot "read" characters the way a human can. An encoding such as ASCII defines which number stands for which character (a Java char holds a UTF-16 code unit, whose values 0 to 127 match ASCII). If you print (int) c in your code, you can see the values.

The values of the Chinese characters are:

20320 (你)
22909 (好)
21834 (啊)
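
These values can be reproduced by widening each char to int. A minimal sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;

class CharValues {
    // Returns the numeric (UTF-16) value of each char in s.
    static int[] values(String s) {
        int[] result = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            result[i] = s.charAt(i); // char widens to int automatically
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(values("你好啊"))); // [20320, 22909, 21834]
        System.out.println(Arrays.toString(values("hello")));  // [104, 101, 108, 108, 111]
    }
}
```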

If you look at an ASCII table, you can see that the code you provided keeps only the characters from ) to ~ and filters out everything else. Since the Chinese characters have values far above 126, they are filtered out.
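
Writing the bounds as character literals instead of magic numbers makes the intent of the condition clearer. This is a sketch with hypothetical names (ReadableFilter, keep); main() checks that the rewritten test agrees with c > 40 && c < 127 for every char value. Keep in mind that this is still an ASCII whitelist rather than a Chinese-character filter: it also removes spaces, punctuation below ), and accented letters such as é.

```java
class ReadableFilter {
    // Equivalent to c > 40 && c < 127, because ')' is 41 and '~' is 126.
    static boolean keep(char c) {
        return c >= ')' && c <= '~';
    }

    static String deal(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (keep(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Verify that both conditions agree for all 65536 char values.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (keep(c) != (c > 40 && c < 127)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println(deal("hello你好啊")); // hello
    }
}
```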
