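When a string is encoded in Java under the charset name "Unicode", it is split into 16-bit units the UTF-16LE way and a BOM is prepended (the byte order chosen for this name may depend on the JDK version). When the charset name "UTF-16" is used, Java by default encodes as big-endian (UTF-16BE) with a BOM.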
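To see the byte order and BOM for yourself, here is a minimal sketch that prints the encoded bytes of one character. The class name `Utf16Bytes` and the helper `encodeHex` are made up for this illustration; only the three standard charset names are used, because they behave the same on every JDK.

```java
import java.nio.charset.Charset;

public class Utf16Bytes {
    // Hypothetical helper: encodes text with the named charset and
    // renders the resulting bytes as upper-case hex pairs.
    static String encodeHex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d"; // the character 中 (U+4E2D)
        System.out.println("UTF-16   : " + encodeHex(text, "UTF-16"));   // FE FF 4E 2D
        System.out.println("UTF-16BE : " + encodeHex(text, "UTF-16BE")); // 4E 2D
        System.out.println("UTF-16LE : " + encodeHex(text, "UTF-16LE")); // 2D 4E
    }
}
```

"UTF-16" writes the BOM FE FF followed by big-endian bytes, while "UTF-16BE" and "UTF-16LE" write no BOM at all.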

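String a = "12dss显示,‘；（）中文只";
// GBK encodes ASCII characters as 1 byte and Chinese characters
// (including full-width punctuation) as 2 bytes. Naming the charset
// explicitly avoids depending on the platform default encoding.
java.nio.charset.Charset gbk = java.nio.charset.Charset.forName("GBK");
StringBuilder b = new StringBuilder();
for (int i = 0; i < a.length(); i++)
{
    String str = String.valueOf(a.charAt(i));
    // Keep only the characters that occupy two bytes in GBK.
    if (str.getBytes(gbk).length == 2)
    {
        b.append(str);
    }
}
System.out.println(b);

Result: 显示‘；（）中文只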
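The two-byte test above only holds in a double-byte charset such as GBK. As a rough sketch of why the charset matters, the following compares the encoded length of the same characters under different charsets; `ByteLengthDemo` and `byteLength` are names invented for this example, and the GBK line is guarded because that charset is optional in some runtimes.

```java
import java.nio.charset.Charset;

public class ByteLengthDemo {
    // Hypothetical helper: number of bytes the string occupies in the named charset.
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String han = "\u4e2d";   // the character 中
        String quote = "\u2018"; // the left single quotation mark ‘
        System.out.println("UTF-8    中: " + byteLength(han, "UTF-8"));      // 3
        System.out.println("UTF-8    ‘: " + byteLength(quote, "UTF-8"));     // 3
        System.out.println("UTF-8    a: " + byteLength("a", "UTF-8"));       // 1
        System.out.println("UTF-16BE a: " + byteLength("a", "UTF-16BE"));    // 2
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK      中: " + byteLength(han, "GBK"));
        }
    }
}
```

Under UTF-8 no character in the sample string is exactly two bytes long, so a filter based on the default encoding would print nothing on a UTF-8 system.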

Java: how do you get the position of the first Chinese character and of the first Chinese punctuation mark in a string?

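package tool;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyCat
{
    // Returns the index of the first match of regex in text, or -1 if there is none.
    // Matcher.find() is used instead of split(regex)[0], which throws
    // ArrayIndexOutOfBoundsException when the text consists only of matches.
    private static int firstIndex (String text, String regex)
    {
        Matcher matcher = Pattern.compile (regex).matcher (text);
        return matcher.find () ? matcher.start () : -1;
    }

    public static void main ( String[] args )
    {
        String string = "adf你.？的说法sdf";

        // Commonly used Chinese characters.
        String reg = "[\u4e00-\u9fa5]";
        System.out.println (firstIndex (string, reg));   // prints 3

        // Chinese punctuation marks.
        String regex = "[。，！？（）《》……、：——【】；’”‘“]";
        System.out.println (firstIndex (string, regex)); // prints 5
    }
}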

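Commonly used Chinese characters fall in the Unicode range \u4e00-\u9fa5. The following simple example extracts the Chinese characters from a document that mixes Chinese and English: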

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DrawEnglish
{
    // Chinese characters plus a few punctuation marks. The hyphen is escaped
    // so that "—-、" is not read as a character range from — to 、.
    private static final Pattern CHINESE =
            Pattern.compile("[\u4e00-\u9fa5。，？”“《》：！—\\-、]");

    // Despite the class name, this returns only the Chinese text found in content.
    private static String draw(String content)
    {
        StringBuilder chinese = new StringBuilder();
        Matcher matcher = CHINESE.matcher(content);
        while (matcher.find())
        {
            chinese.append(matcher.group());
        }
        return chinese.toString();
    }

    public static void drawEnglish(String path)
    {
        // try-with-resources closes both files even if reading or writing fails.
        // The input is read as gb2312; FileWriter writes in the platform default charset.
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(new FileInputStream(path), "gb2312"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("new1.txt")))
        {
            StringBuilder sb = new StringBuilder();
            String str;
            while ((str = br.readLine()) != null)
            {
                sb.append(str).append('\n');
            }
            bw.write(draw(sb.toString()));
        }
        catch (IOException e) // also covers FileNotFoundException
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        drawEnglish("draw1.txt");
    }
}
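The range \u4e00-\u9fa5 covers the commonly used characters but not the rarer ones in the CJK extension blocks. As an alternative sketch, Java's regex engine (Java 7 and later) also accepts the Unicode script class `\p{IsHan}`, which matches any Han character. The class name `HanExtractor` and method `extractHan` are invented for this illustration and are not part of the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HanExtractor {
    // Matches one code point whose Unicode script is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // Hypothetical helper: concatenates every Han character found in content.
    static String extractHan(String content) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = HAN.matcher(content);
        while (matcher.find()) {
            out.append(matcher.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 中 and 文 are in \u4e00-\u9fa5; \u3400 is in CJK Extension A, outside that range.
        // The full-width comma \uff0c is punctuation, not a Han character, so it is dropped.
        String mixed = "abc\u4e2d\u6587\uff0cdef\u3400";
        System.out.println(extractHan(mixed));
    }
}
```

Punctuation marks are not part of the Han script, so they still have to be listed explicitly in the character class if they should be kept.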
