Convert a UTF-8 Unicode string to an ASCII Unicode-escaped string.

Date: 2023-01-05 16:52:50

I need to convert a Unicode string into a string in which the non-ASCII characters are written as Unicode escape sequences. For example, the string "漢字 Max" should be presented as "\u6F22\u5B57 Max".

What I have tried:

  1. Different combinations of

    new String(sourceString.getBytes(encoding1), encoding2)

  2. Apache StringEscapeUtils, which also escapes ASCII characters such as the double quote:

    StringEscapeUtils.escapeJava(source)
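
A minimal sketch of why attempt 1 cannot produce escapes by itself, assuming UTF-8 as encoding1 and ISO-8859-1 as encoding2 (an arbitrary pair chosen for illustration): encoding and re-decoding with a different charset only reinterprets the bytes as other characters (mojibake); it never turns them into \uXXXX text. The class ReencodeDemo and method reencode are hypothetical names. Charset.forName, String.getBytes(Charset) and new String(byte[], Charset) all exist in Java 6.

```java
import java.nio.charset.Charset;

public class ReencodeDemo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    // Attempt 1 with an assumed charset pair: encode as UTF-8, decode as ISO-8859-1.
    static String reencode(String s) {
        byte[] bytes = s.getBytes(UTF_8);   // each of the two CJK characters becomes 3 bytes
        return new String(bytes, LATIN_1);  // each byte becomes one Latin-1 char
    }

    public static void main(String[] args) {
        String in = "\u6F22\u5B57"; // the question's two CJK characters, written with source escapes
        String out = reencode(in);
        // Prints 6: one garbled char per UTF-8 byte.
        System.out.println(out.length());
        // Prints true: no backslash, and therefore no escape sequence, was produced.
        System.out.println(out.indexOf('\\') < 0);
    }
}
```

Whatever charset pair is used, the result is still a plain re-interpretation of bytes, so a character-by-character escaping step is needed instead.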

Is there an easy way to encode such a string? Ideally, only Java SE 6 or Apache Commons should be used to achieve the desired result.

2 solutions

#1

This is the kind of simple code Jon Skeet had in mind in his comment:

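public class EscapeNonAscii {
    public static void main(String[] args) {
        final String in = "šđčćasdf";
        final StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            final char ch = in.charAt(i);
            // ASCII (0..127) is copied as-is; any other char becomes a 4-digit uppercase hex escape
            if (ch <= 127) out.append(ch);
            else out.append("\\u").append(String.format("%04X", (int) ch));
        }
        System.out.println(out.toString());
    }
}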

As Jon said, characters outside the Basic Multilingual Plane, which Java stores as surrogate pairs, will be represented as a pair of \u escapes.
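
To illustrate the surrogate-pair behavior, here is the same loop wrapped in a hypothetical escapeNonAscii method, using %04X so the hex digits are uppercase like the question's example. The emoji U+1F600 lies outside the Basic Multilingual Plane, so Java stores it as the two chars D83D and DE00, and the loop emits one escape for each; this sketch assumes that is acceptable, as it is in Java source code, where supplementary characters are written the same way.

```java
public class SurrogateEscapeDemo {
    // Same per-char loop as the answer above, wrapped in a method (name is hypothetical).
    static String escapeNonAscii(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            if (ch <= 127) {
                out.append(ch); // ASCII passes through unchanged
            } else {
                out.append("\\u").append(String.format("%04X", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so this String holds two UTF-16 chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(escapeNonAscii(emoji));
        // The question's example string, written with Java source escapes.
        System.out.println(escapeNonAscii("\u6F22\u5B57 Max"));
    }
}
```

The first line printed is \uD83D\uDE00 and the second is \u6F22\u5B57 Max, which matches the output requested in the question.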

#2

Guava Escaper-based solution:

This escapes every non-ASCII character, as well as ASCII control characters below 32 (such as tab and newline), into \uXXXX Unicode escape sequences.

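import static java.lang.String.format;

import com.google.common.escape.CharEscaper;

public class NonAsciiUnicodeEscaper extends CharEscaper
{
    @Override
    protected char[] escape(final char c)
    {
        // Chars 32..127 need no escaping; returning null tells CharEscaper to keep c unchanged.
        if (c >= 32 && c <= 127) { return null; }
        // Everything else, including control chars below 32, becomes a 4-digit hex escape.
        else { return format("\\u%04x", (int) c).toCharArray(); }
    }
}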
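
Compared with answer #1, the Guava rule also escapes ASCII control characters below 32. A minimal JDK-only sketch of the same per-character rule makes that difference easy to check without the Guava dependency; the class GuavaRuleMirror is hypothetical and is not Guava's own code, only a mirror of the condition used in escape(char) above.

```java
public class GuavaRuleMirror {
    // Mirrors the condition in NonAsciiUnicodeEscaper.escape(char), using only the JDK:
    // chars 32..127 pass through; everything else, including control
    // characters such as tab and newline, becomes a 4-digit lowercase hex escape.
    static String escape(String in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c >= 32 && c <= 127) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab is ASCII, but this rule escapes it anyway.
        System.out.println(escape("a\tb"));
        // A Latin-1 letter (e with acute accent) is escaped as in answer #1, but in lowercase hex.
        System.out.println(escape("caf\u00e9"));
    }
}
```

For example, escape("a\tb") returns a\u0009b, whereas the loop from answer #1 leaves the tab unchanged.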