Truncating a String by Bytes in Java

Date: 2023-01-12 11:00:31
public class SubByteString {

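    /**
     * Returns a prefix of str that occupies at most len bytes, dropping a
     * trailing character that was cut in half. May return null (for example
     * when str is null or no whole character fits).
     * Note: getBytes() with no argument uses the platform default charset.
     */
    private static String subStringByByte(String str, int len) {
        String result = null;
        if (str != null) {
            byte[] a = str.getBytes();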
            if (a.length <= len) {
                result = str;
            } else if (len > 0) {
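                result = new String(a, 0, len);
                int length = result.length();
                // If the last decoded char differs from the original, the cut fell
                // inside a multi-byte character, so that damaged char is dropped.
                if (str.charAt(length - 1) != result.charAt(length - 1)) {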
                    if (length < 2) {
                        result = null;
                    } else {
                        result = result.substring(0, length - 1);
                    }
                }
            }
        }
        return result;
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String str1="一百二十个字符怎么就那么难弄呢我该说些啥呢算了还是先扯扯把哎还不到120个字啊让我怎么测试asdfghjklqwe哈rtuo";
        byte[] a = str1.getBytes();
        String str2 = subStringByByte(str1,100);
        System.out.println("--str1.length="+str1.length()+"----Byte长度="+a.length+"-------str2="+str2+"------");

    }

}

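Analysis: The method above assumes that a Chinese character takes 2 bytes and every other character takes 1 byte (as in GBK). Because getBytes() is called without a charset, the byte counts depend on the platform default encoding. The drawback is that the method cannot be relied on with encodings such as UTF-8; a quick test confirms that in "UTF-8" a Chinese character takes 3 bytes.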
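The byte counts mentioned above can be checked with a small sketch like the following. The class name `CharsetByteCount` and the helper `byteLength` are made up for illustration, and the GBK branch assumes the running JDK ships that charset, so it is guarded with `Charset.isSupported`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class CharsetByteCount {

    // Number of bytes s occupies when encoded with the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String han = "汉";
        // A Chinese character takes 3 bytes in UTF-8.
        System.out.println("UTF-8: " + byteLength(han, StandardCharsets.UTF_8));
        // GBK is not guaranteed on every runtime, so check before using it.
        if (Charset.isSupported("GBK")) {
            // The same character takes 2 bytes in GBK.
            System.out.println("GBK: " + byteLength(han, Charset.forName("GBK")));
        }
        // This is the charset that str.getBytes() uses when none is given.
        System.out.println("default: " + Charset.defaultCharset());
    }
}
```

Printing the default charset is a quick way to see which byte counts the first method will actually work with on a given machine.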

Result: --str1.length=62----Byte长度=105-------str2=一百二十个字符怎么就那么难弄呢我该说些啥呢算了还是先扯扯把哎还不到120个字啊让我怎么测试asdfghjklqwe------

 

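/**
 * Truncates targetString so that its UTF-8 encoding occupies at most
 * byteIndex bytes. Throws if the string has fewer than byteIndex bytes.
 */
public static String getSubString(String targetString, int byteIndex)
        throws Exception {
    if (targetString.getBytes("UTF-8").length < byteIndex) {
        throw new Exception("byteIndex exceeds the string's byte length");
    }
    String temp = targetString;
    // Drop one trailing char at a time until the encoded form fits.
    for (int i = 0; i < targetString.length(); i++) {
        if (temp.getBytes("UTF-8").length <= byteIndex) {
            break;
        }
        temp = temp.substring(0, temp.length() - 1);
    }
    return temp;
}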

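Analysis: This version truncates the string according to whichever encoding you choose; under UTF-8 a Chinese character takes 3 bytes. You can switch it to GBK or another encoding as needed, which is very convenient. Recommended!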
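As a variation on the same idea, here is a minimal sketch that takes the charset as a parameter and walks the string forward by code point instead of re-encoding the whole string on every step. The names `ByteTruncate` and `truncateByBytes` are hypothetical and not part of the original code. It assumes a stateless encoding without a byte-order mark (UTF-8, GBK and similar), since it sums the encoded size of each character separately. Unlike the method above, it returns the whole string rather than throwing when the string is already short enough, and it never splits a surrogate pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class ByteTruncate {

    // Longest prefix of s whose encoding in charset fits in maxBytes bytes.
    // Assumes a stateless charset without a BOM (e.g. UTF-8 or GBK).
    static String truncateByBytes(String s, int maxBytes, Charset charset) {
        if (s == null) {
            return null;
        }
        int used = 0; // bytes consumed so far
        int i = 0;    // index of the next char to examine
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int n = Character.charCount(cp); // 2 for a surrogate pair, else 1
            int size = s.substring(i, i + n).getBytes(charset).length;
            if (used + size > maxBytes) {
                break; // the next character would not fit as a whole
            }
            used += size;
            i += n;
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // "a" is 1 byte and "中" is 3 bytes in UTF-8, so only "a" fits in 3 bytes.
        System.out.println(truncateByBytes("a中b", 3, StandardCharsets.UTF_8));
        // With 4 bytes available, "a中" fits.
        System.out.println(truncateByBytes("a中b", 4, StandardCharsets.UTF_8));
    }
}
```

To truncate for GBK instead, pass `Charset.forName("GBK")` as the third argument, provided the runtime supports that charset.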