Converting a byte array to a String (Java)

I'm writing a web application in Google App Engine. It lets people edit HTML code that gets stored as an .html file in the blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it to an HTML page so that the user can edit the HTML code. Everything works great!

Here's my only problem now:

The byte array is having some issues when converting back to a string. Smart quotes and a couple of other characters are coming out looking funky (as ?'s, Japanese symbols, etc.). Specifically, I'm seeing several bytes with negative values, and those are the ones causing the problem.

The smart quotes are coming back as -108 and -109 in the byte array. Why is this and how can I decode the negative bytes to show the correct character encoding?

7 answers

#1

141

The byte array contains characters in a specific encoding (which you need to know). The way to convert it to a String is:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type

By the way, the raw bytes may appear as negative decimals simply because the Java data type byte is signed; it covers the range from -128 to 127.
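
To make the signedness concrete, here is a minimal sketch showing that -109 and 0x93 are the same bit pattern. The class name `SignedByteDemo` and the helper `toUnsigned` are hypothetical names introduced only for this illustration.

```java
public class SignedByteDemo {
    /** Returns the unsigned value (0..255) of a signed Java byte. */
    static int toUnsigned(byte b) {
        return b & 0xFF; // the mask drops the sign extension added when widening to int
    }

    public static void main(String[] args) {
        byte b = (byte) 0x93;                                    // same bits as -109
        System.out.println(b);                                   // prints -109
        System.out.println(toUnsigned(b));                       // prints 147
        System.out.println(Integer.toHexString(toUnsigned(b)));  // prints 93
    }
}
```

Nothing is wrong with the negative values themselves; they only look odd because Java prints bytes as signed numbers.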

-109 = 0x93: Control Code "Set Transmit State"

The value 0x93 (-109) is a non-printable control character in Unicode (U+0093), and as a lone byte it is not valid UTF-8 either. So UTF-8 is not the correct encoding for that byte stream.

0x93 in "Windows-1252" is the "smart quote" (left double quotation mark) that you're looking for; the Java name of that encoding is "Cp1252". The next line provides a quick test:

System.out.println(new String(new byte[]{-109}, "Cp1252"));
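
The one-liner can be expanded into a small comparison, sketched here under the assumption that the runtime supports the "Cp1252" name (common JDKs do). It decodes the two bytes from the question both ways and prints code points in hex, so the result does not depend on the console's font or encoding. `SmartQuoteDemo` and `decode` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;

public class SmartQuoteDemo {
    /** Decodes bytes using a charset given by name, as in the answer above. */
    static String decode(byte[] bytes, String encodingName) throws UnsupportedEncodingException {
        return new String(bytes, encodingName);
    }

    /** Prints each char of the string as a hex code unit. */
    static void dump(String label, String s) {
        for (char ch : s.toCharArray()) {
            System.out.println(label + ": " + Integer.toHexString(ch));
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] quotes = {-109, -108}; // 0x93 and 0x94, the bytes from the question

        // Windows-1252 maps these bytes to U+201C and U+201D (curly double quotes).
        dump("Cp1252", decode(quotes, "Cp1252"));

        // As UTF-8 these bytes are malformed, so the decoder substitutes U+FFFD.
        dump("UTF-8", decode(quotes, "UTF-8"));
    }
}
```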

#2

As of Java 7 you can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets.

This is safer than passing the encoding name as a String, as suggested in the other answers: there is no name to mistype and no checked UnsupportedEncodingException to handle. You should do it this way if you're using Java 7 or above.

Example for UTF-8 encoding

String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
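
StandardCharsets only has constants for a handful of charsets, and Windows-1252 is not among them. For the bytes in the question, one option is to look the charset up once with Charset.forName and reuse it. The sketch below assumes the runtime ships windows-1252 (Charset.forName throws an unchecked UnsupportedCharsetException otherwise); `CharsetConstantDemo` and its members are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetConstantDemo {
    // Assumption: this charset is available on the running JVM.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeWindows1252(byte[] bytes) {
        return new String(bytes, WINDOWS_1252);
    }

    public static void main(String[] args) {
        String original = "\u201Chello\u201D"; // hello wrapped in curly quotes

        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = original.getBytes(WINDOWS_1252);

        System.out.println(utf8.length);   // prints 11: each curly quote takes 3 bytes
        System.out.println(cp1252.length); // prints 7: each curly quote takes 1 byte
        System.out.println(cp1252[0]);     // prints -109

        // Decoding with the same charset that encoded the bytes restores the text.
        System.out.println(decodeUtf8(utf8).equals(original));           // prints true
        System.out.println(decodeWindows1252(cp1252).equals(original));  // prints true
    }
}
```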

#3

You can try this. Note that it decodes the bytes using the platform's default charset:

String s = new String(bytearray);

#4

public class Main {

    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {

        byte b = 65;

        //Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));

        //Using simple concatenation with an empty String
        System.out.println(b + "");

        //Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}

Output

65
65
A

#5

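import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public static String readFile(String fn) throws IOException {
    File f = new File(fn);
    byte[] buffer = new byte[(int) f.length()];

    // try-with-resources closes the stream even if reading fails
    try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
        // readFully reads until the buffer is full; a single read() may return early
        in.readFully(buffer);
    }

    return new String(buffer, StandardCharsets.UTF_8); // use the file's actual encoding
}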
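
On Java 7 or later, the same idea can be sketched with java.nio.file.Files, which reads the whole file in one call. This is an alternative illustration, not the answer author's code; the class name `ReadFileDemo` and the temporary file used in main are assumptions made for the demo.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadFileDemo {
    /** Reads the whole file and decodes it with the charset the file was written in. */
    static String readFile(String fileName, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file containing curly quotes, then read it back.
        Path tmp = Files.createTempFile("demo", ".html");
        try {
            Files.write(tmp, "<p>\u201Chi\u201D</p>".getBytes(StandardCharsets.UTF_8));
            String text = readFile(tmp.toString(), StandardCharsets.UTF_8);
            System.out.println(text.length()); // prints 11
        } finally {
            Files.delete(tmp);
        }
    }
}
```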

#6

I suggest Arrays.toString(byte_array);

It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, something like [1, 2, 3]. If you want to save the byte values as they are, without converting them to characters, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, you should use String s = new String(byte_array). In that case, s holds the characters that the bytes [1, 2, 3] encode.
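
The difference between the two conversions can be sketched as follows. The class `ByteArrayFormats` and its two helpers are hypothetical names, and US-ASCII is chosen here only so the demo does not depend on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayFormats {
    /** Renders the numeric byte values, e.g. "[72, 105, 33]". */
    static String asNumbers(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    /** Decodes the bytes into the characters they represent. */
    static String asText(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 105, 33};
        System.out.println(asNumbers(data)); // prints [72, 105, 33]
        System.out.println(asText(data));    // prints Hi!
    }
}
```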

#7

The previous answer from Andreas_D is good. I'm just going to add that wherever you are displaying the output there will be a font and a character encoding and it may not support some characters.

To work out whether it is Java or your display that is the problem, do this:

    for(int i=0;i<str.length();i++) {
        char ch = str.charAt(i);
        System.out.println(i+" : "+ch+" "+Integer.toHexString(ch)+((ch=='\ufffd') ? " Unknown character" : ""));
    }

Java will have mapped any bytes it cannot decode to 0xfffd, the official Unicode replacement character for unknown characters. If you see a '?' in the output but it is not mapped to 0xfffd, the problem is your display font or encoding, not Java.
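
The same check can be wrapped into a self-contained program, sketched here with a deliberately wrong decode of the byte from the question. `ReplacementCharCheck` and `countReplacementChars` are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharCheck {
    /** Counts occurrences of U+FFFD, the Unicode replacement character. */
    static int countReplacementChars(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 0x93 (-109) on its own is not valid UTF-8, so the decoder substitutes U+FFFD.
        byte[] bytes = {'a', (byte) 0x93, 'b'};
        String decoded = new String(bytes, StandardCharsets.UTF_8);

        if (countReplacementChars(decoded) > 0) {
            System.out.println("Data was lost while decoding: the charset is wrong");
        } else {
            System.out.println("Decoding is clean: suspect the display font or encoding");
        }
    }
}
```

If the count is zero but the output still shows '?', the String itself is intact and the console, page encoding, or font is the likelier culprit.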
