Decoding a UTF-8 encoded string in Android

Date: 2023-01-04 20:53:27

I have a string that arrives via XML, and its text is in German. The German-specific characters are encoded in UTF-8. Before displaying the string I need to decode it.

I have tried the following:

try {
    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
    event.attributes.put("title", in.readLine());
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

I have also tried this:

try {
    event.attributes.put("title", URLDecoder.decode(nodevalue, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Neither of them works. How do I decode the German string?

Thank you in advance.

UPDATE:

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    // TODO Auto-generated method stub
    super.characters(ch, start, length);
    if (nodename != null) {
        String nodevalue = String.copyValueOf(ch, 0, length);
        if (nodename.equals("startdat")) {
            if (event.attributes.get("eventid").equals("187")) {
            }
        }
        if (nodename.equals("startscreen")) {
            imageaddress = nodevalue;
        }
        else {
            if (nodename.equals("title")) {
                // try {
                // BufferedReader in = new BufferedReader(
                // new InputStreamReader(
                // new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
                // event.attributes.put("title", in.readLine());
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // } catch (IOException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                // try {
                // event.attributes.put("title",
                // URLDecoder.decode(nodevalue, "UTF-8"));
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                event.attributes.put("title", StringEscapeUtils
                        .unescapeHtml(new String(ch, start, length).trim()));
            } else
                event.attributes.put(nodename, nodevalue);
        }
    }
}

1 Answer

#1


20  

You could use the String constructor with the charset parameter:

try
{
    final String s = new String(nodevalue.getBytes(), "UTF-8");
}
catch (UnsupportedEncodingException e)
{
    Log.e("utf8", "conversion", e);
}
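
One caveat about the snippet above: getBytes() with no argument encodes with the platform default charset, so decoding the result with that same charset simply returns the original string. The constructor only helps when the bytes really are UTF-8 bytes that were decoded with the wrong charset earlier. A minimal sketch of that case, assuming the text was misread as ISO-8859-1 (the class and method names are hypothetical, and StandardCharsets requires Java 7 or Android API 19):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class CharsetRoundTrip {

    /** Repairs text whose UTF-8 bytes were mistakenly decoded as ISO-8859-1. */
    static String repairLatin1Mojibake(String garbled) {
        // ISO-8859-1 maps chars 0..255 one-to-one onto bytes, so this recovers the original bytes.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset they were actually written in.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "Gr\u00fc\u00dfe"; // the German word "Gruesse" written with u-umlaut and sharp s
        // Simulate the mistake: UTF-8 bytes read as ISO-8859-1.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(correct));
        System.out.println(repairLatin1Mojibake(garbled).equals(correct));
    }
}
```

If the string was decoded correctly in the first place, this repair is unnecessary and would damage it, so it is worth confirming which charset was misapplied before using it.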

Also, since you get the data from an XML document, which I assume is UTF-8 encoded, the problem is probably in how it is parsed.

You should pass an InputStream/InputSource to the parser instead of using an XMLReader implementation, because the byte stream lets the parser apply the correct encoding. So if you're getting this data from an HTTP response, you could use both an InputStream and an InputSource:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    Reader reader = new InputStreamReader(in, "UTF-8");
    InputSource is = new InputSource(reader);
    is.setEncoding("UTF-8");
    parser.parse(is, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

or just the InputStream:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}
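
To check the parsing path without a network call, the same idea can be tried on an in-memory byte array. This is only a sketch under assumptions: Utf8SaxDemo, TitleHandler and parseTitle are hypothetical names standing in for the XmlHandler above, and the document is assumed to declare UTF-8 in its XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not the XmlHandler from the answer.
final class Utf8SaxDemo {

    /** Collects the text of the first title element. */
    static final class TitleHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle;
        String title;

        // Some parsers fill qName, others only localName, so accept either.
        private static String name(String localName, String qName) {
            return (qName == null || qName.isEmpty()) ? localName : qName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(name(localName, qName))) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times per element, so accumulate.
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(name(localName, qName))) {
                inTitle = false;
                if (title == null) {
                    title = text.toString().trim();
                }
            }
        }
    }

    /** Parses raw bytes; the parser reads the encoding from the XML declaration. */
    static String parseTitle(byte[] xmlBytes) {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TitleHandler handler = new TitleHandler();
            parser.parse(in, handler);
            return handler.title;
        } catch (Exception e) {
            throw new IllegalStateException("Error parsing xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<event><title>Gr\u00fc\u00dfe</title></event>";
        System.out.println(parseTitle(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

If the umlauts survive this round trip, the parser's encoding handling is fine and the fault lies in the data itself.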

Update 1

Here is a sample of a complete request and response handling:

try
{
    final DefaultHttpClient client = new DefaultHttpClient();
    final HttpPost httppost = new HttpPost("http://example.location.com/myxml");
    final HttpResponse response = client.execute(httppost);
    final HttpEntity entity = response.getEntity();

    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

Update 2

As the problem is not the encoding but the source XML being escaped to HTML entities, the best solution (besides fixing the PHP so that it does not escape the response) is to use the very handy static StringEscapeUtils class from the Apache Commons Lang library.

After importing the library, put the following in your XML handler's characters method:

@Override
public void characters(final char[] ch, final int start, final int length) 
    throws SAXException
{
    // This variable will hold the correct unescaped value
    final String elementValue = StringEscapeUtils.
        unescapeHtml(new String(ch, start, length).trim());
    [...]
}
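
To illustrate what the unescaping step does, here is a dependency-free stand-in. It is not the Commons Lang implementation: EntityUnescapeSketch and its small entity table are hypothetical and cover only numeric references plus a handful of German named entities. In real code, StringEscapeUtils is the better choice (note that Commons Lang 3 names the method unescapeHtml4).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately incomplete stand-in for StringEscapeUtils.unescapeHtml.
final class EntityUnescapeSketch {

    // Matches &#xHH; (hex), &#DDD; (decimal) and &name; references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9A-Fa-f]{1,6}|#[0-9]{1,7}|[A-Za-z]+);");

    // Only a few named entities; the real HTML table is much larger.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("auml", "\u00e4");
        NAMED.put("ouml", "\u00f6");
        NAMED.put("uuml", "\u00fc");
        NAMED.put("Auml", "\u00c4");
        NAMED.put("Ouml", "\u00d6");
        NAMED.put("Uuml", "\u00dc");
        NAMED.put("szlig", "\u00df");
        NAMED.put("amp", "&");
    }

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement = null;
            if (body.charAt(0) == '#') {
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                int codePoint = hex
                        ? Integer.parseInt(body.substring(2), 16)
                        : Integer.parseInt(body.substring(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } else {
                replacement = NAMED.get(body);
            }
            if (replacement == null) {
                replacement = m.group(); // unknown or invalid reference: leave untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("Gr&uuml;&szlig;e"));
    }
}
```

The sketch makes a single pass, so a doubly escaped value such as &amp;uuml; comes out as &uuml; and would need a second call.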

Update 3

In your latest code, the problem is the initialization of the nodevalue variable: it copies from index 0 instead of start, and the value is never unescaped. It should be:

String nodevalue = StringEscapeUtils.unescapeHtml(
    new String(ch, start, length).trim());
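
Why the start offset matters: the char array a SAX parser passes to characters() is typically a larger internal buffer, and the element text is only the slice [start, start + length). A small sketch with a made-up buffer (CharactersOffsetDemo and its method names are hypothetical):

```java
// Hypothetical demo of the offset handling in characters().
final class CharactersOffsetDemo {

    /** What the question's code does: always copies from index 0. */
    static String fromIndexZero(char[] ch, int start, int length) {
        return String.copyValueOf(ch, 0, length);
    }

    /** What the fix does: copies the slice the parser actually reported. */
    static String fromStart(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        // Pretend this is the parser's internal buffer; the text "Hallo" starts at index 7.
        char[] buffer = "<title>Hallo</title>".toCharArray();
        System.out.println(fromIndexZero(buffer, 7, 5)); // prints "<titl"
        System.out.println(fromStart(buffer, 7, 5));     // prints "Hallo"
    }
}
```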
