I'm writing a web application on Google App Engine. It basically lets people edit HTML code that gets stored as an .html file in the blobstore.
I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print it into an HTML page so the user can edit the HTML code. Everything works great!
Here's my only problem now:
The byte array is having some issues when converting back to a string. Smart quotes and a couple of other characters come out looking funky (?'s, Japanese symbols, etc.). Specifically, several of the bytes have negative values, and those are the ones causing the problem.
The smart quotes are coming back as -108 and -109 in the byte array. Why is this, and how can I decode the negative bytes so that the correct characters are shown?
7 answers
#1
141
The byte array contains characters in a specific encoding (which you need to know). The way to convert it to a String is:
String decoded = new String(bytes, "UTF-8"); // example for one encoding type
By the way, the raw bytes may appear as negative decimals simply because the Java data type byte is signed; it covers the range from -128 to 127.
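To make the signed-byte point concrete, here is a minimal sketch that maps the two values from the question to their unsigned and hexadecimal forms. The class name SignedByteDemo and the helper toUnsigned are made up for this illustration.

```java
final class SignedByteDemo {
    // Hypothetical helper: returns the unsigned value (0..255) of a signed Java byte.
    static int toUnsigned(byte b) {
        return b & 0xFF; // masking keeps only the low 8 bits after promotion to int
    }

    public static void main(String[] args) {
        byte[] quotes = {-109, -108}; // the values reported in the question
        for (byte b : quotes) {
            int u = toUnsigned(b);
            System.out.println(b + " -> " + u + " -> 0x" + Integer.toHexString(u));
        }
        // prints: -109 -> 147 -> 0x93
        //         -108 -> 148 -> 0x94
    }
}
```

The negative numbers are therefore not corrupt data. They are simply bytes above 0x7F shown through a signed type.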
-109 = 0x93: Control Code "Set Transmit State"
The value -109 (0x93) is a non-printable C1 control character in Unicode (U+0093), and a lone 0x93 byte is not valid UTF-8 at all. So UTF-8 is not the correct encoding for that byte stream.
In "Windows-1252", 0x93 is the opening "smart quote" that you're looking for (and 0x94, i.e. -108, is the closing one). The Java name of that encoding is "Cp1252". The next line is a quick test:
System.out.println(new String(new byte[]{-109}, "Cp1252"));
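As a rough sketch of the difference, the snippet below decodes both bytes from the question once as Windows-1252 and once as UTF-8, and prints code points in hex so the console font does not get in the way. The class name SmartQuoteDemo and the helper decode are invented for this example, and it assumes the JVM ships the "windows-1252" charset, which is common but not guaranteed by the specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SmartQuoteDemo {
    // Hypothetical helper: decodes the bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] quotes = {-109, -108}; // 0x93 and 0x94
        // Assumption: this charset is available on the running JVM.
        Charset cp1252 = Charset.forName("windows-1252");

        String good = decode(quotes, cp1252);
        String bad = decode(quotes, StandardCharsets.UTF_8);

        System.out.println(Integer.toHexString(good.charAt(0))); // 201c, left double quote
        System.out.println(Integer.toHexString(good.charAt(1))); // 201d, right double quote
        System.out.println(Integer.toHexString(bad.charAt(0)));  // fffd, replacement character
    }
}
```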
#2
25
As of Java 7 you can also pass your desired encoding to the String constructor as a Charset constant from StandardCharsets.
This may be safer than passing the encoding as a String, as suggested in the other answers: a misspelled charset name cannot slip through, and there is no checked UnsupportedEncodingException to handle. You should do it this way if you're using Java 7 or above.
Example for UTF-8 encoding:
String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
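The difference between the two constructor overloads can be sketched as follows. The class CharsetConstantDemo, its two helpers, and the deliberately wrong name "UTF8-typo" are all made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class CharsetConstantDemo {
    // Decodes with a Charset constant: no checked exception and no name to mistype.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes by charset name: an unknown name is only detected at run time.
    static String decodeByName(byte[] bytes, String charsetName)
            throws UnsupportedEncodingException {
        return new String(bytes, charsetName);
    }

    public static void main(String[] args) {
        String original = "\u201Chello\u201D"; // "hello" wrapped in smart quotes
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(bytes).equals(original)); // true

        try {
            decodeByName(bytes, "UTF8-typo"); // hypothetical misspelled name
        } catch (UnsupportedEncodingException e) {
            System.out.println("unknown charset name");
        }
    }
}
```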
#3
11
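You can try this. Note that this constructor decodes with the platform's default charset, so it only gives the right result if the bytes happen to be in that encoding.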
String s = new String(bytearray);
#4
5
public class Main {
    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {
        byte b = 65;
        // Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));
        // Using simple concatenation with an empty String
        System.out.println(b + "");
        // Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}
Output:
65
65
A
#5
5
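import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static String readFile(String fn) throws IOException
{
    File f = new File(fn);
    byte[] buffer = new byte[(int) f.length()];
    // try-with-resources closes the stream even if reading fails
    try (DataInputStream is = new DataInputStream(new FileInputStream(f))) {
        // readFully fills the whole buffer; a single read(buffer) may stop early
        is.readFully(buffer);
    }
    return new String(buffer, "UTF-8"); // use desired encoding
}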
#6
4
I suggest Arrays.toString(byte_array);
It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see while debugging, which is something like this: [1, 2, 3]
If you want to save exactly those values without converting the bytes to characters, Arrays.toString(byte_array) does this. But if you want to save characters instead of bytes, you should use String s = new String(byte_array). In this case, s holds the characters that the bytes [1, 2, 3] encode.
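A small sketch of the two options side by side, using the bytes 72 and 105 instead of [1, 2, 3] so that the character form is printable. The class name BytesAsTextDemo and both helpers are invented for this example, and US-ASCII is chosen here only to keep the result independent of the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BytesAsTextDemo {
    // Keeps the numeric values: the same form a debugger shows.
    static String asNumbers(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Interprets the bytes as characters in an explicit encoding.
    static String asCharacters(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 105};
        System.out.println(asNumbers(data));    // [72, 105]
        System.out.println(asCharacters(data)); // Hi
    }
}
```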
#7
3
The previous answer from Andreas_D is good. I'm just going to add that wherever you display the output, there will be a font and a character encoding involved, and they may not support some characters.
To work out whether the problem is in Java or in your display, do this:
for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    System.out.println(i + " : " + ch + " " + Integer.toHexString(ch) + ((ch == '\ufffd') ? " Unknown character" : ""));
}
Java will have mapped any bytes it cannot decode to 0xfffd, the official replacement character for unknown characters. If you see a '?' in the output but the character is not 0xfffd, then the problem is your display font or encoding, not Java.
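If you would rather have decoding fail loudly than silently produce 0xfffd, one option is a strict CharsetDecoder. This is only a sketch of that alternative, not something the answers above use; the class name StrictDecodeDemo and the helper decodesCleanly are made up for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeDemo {
    // Hypothetical helper: true if every byte is valid in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] smartQuote = {-109}; // 0x93 from the question
        System.out.println(decodesCleanly(smartQuote, StandardCharsets.UTF_8));      // false
        System.out.println(decodesCleanly(smartQuote, StandardCharsets.ISO_8859_1)); // true
    }
}
```

A clean decode only shows that the bytes are legal in that charset, not that it is the intended one: ISO-8859-1 accepts every byte, yet it maps 0x93 to a control character rather than a smart quote.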