I am getting HTML source from Aozora Bunko. The HTML files are Shift_JIS-encoded. I am trying to extract the book title and author, and then store them in an SQLite (UTF-8) database.
String[] splittedResult = result.split("\"title\">");
splittedResult = splittedResult[1].split("</h1>");
String title = splittedResult[0];
byte[] b = null;
try {
    b = title.getBytes("Shift_JIS");
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
String value = null;
try {
    value = new String(b, "UTF-8");
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
...
myDatabase.addBookInformation(value, author);
The result: Latin letters display correctly, but Japanese characters show up as boxes with a question mark inside (please ignore the null values).
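Why the byte conversion above cannot repair the title: by the time `title` exists, the Shift_JIS bytes have already been decoded with the wrong charset, and Java replaces every malformed sequence (with U+FFFD when decoding UTF-8). Re-encoding that damaged String cannot bring the lost bytes back. The following is a minimal sketch, assuming a standard JDK that ships the Shift_JIS charset; the sample title stands in for a real page, and the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ShiftJisDemo {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Arbitrary sample title standing in for text from a Shift_JIS page.
    static final String SAMPLE = "吾輩は猫である";

    // What decoding Shift_JIS bytes as UTF-8 does: malformed sequences
    // are replaced with U+FFFD, so the original bytes are gone.
    static String misdecodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // The question's repair attempt: re-encode the damaged String as
    // Shift_JIS and decode those bytes as UTF-8. Characters that were
    // already replaced cannot be restored this way.
    static String questionRoundTrip(String damaged) {
        return new String(damaged.getBytes(SHIFT_JIS), StandardCharsets.UTF_8);
    }

    // The fix: decode the raw bytes with the charset they were written in.
    static String decodeShiftJis(byte[] raw) {
        return new String(raw, SHIFT_JIS);
    }

    public static void main(String[] args) {
        // Simulate the raw response body of a Shift_JIS-encoded page.
        byte[] raw = SAMPLE.getBytes(SHIFT_JIS);

        String misread = misdecodeAsUtf8(raw);
        String repaired = questionRoundTrip(misread);
        String correct = decodeShiftJis(raw);

        System.out.println("misread matches:  " + misread.equals(SAMPLE));
        System.out.println("repaired matches: " + repaired.equals(SAMPLE));
        System.out.println("correct matches:  " + correct.equals(SAMPLE));
    }
}
```

Only the string decoded directly with Shift_JIS matches the sample. With Apache HttpClient 4.x, the charset argument of `EntityUtils.toString(entity, charset)` plays the role of `decodeShiftJis` here. It is used only when the response's Content-Type header does not declare a charset.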
How can I solve this problem?
1 Answer
As @Codo pointed out, the fix belonged earlier in the code, where the response is decoded. I changed this:
s = EntityUtils.toString(response.getEntity(), "UTF-8");
to this
s = EntityUtils.toString(response.getEntity(), "Shift_JIS");
Now the text is decoded correctly from the start, so the manual re-encoding step is no longer needed:
String[] splittedResult = result.split("\"title\">");
splittedResult = splittedResult[1].split("</h1>");
String title = splittedResult[0];
/** I HAVE TAKEN OUT THIS PART OF MY CODE
byte[] b = null;
try {
    b = title.getBytes("Shift_JIS");
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
String value = null;
try {
    value = new String(b, "UTF-8");
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
**/
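The `split`-based extraction kept in the answer throws `ArrayIndexOutOfBoundsException` when the marker is missing, because `split` then returns a one-element array. Below is a minimal alternative sketch using `java.util.regex`. It assumes the title is marked up as `<h1 class="title">…</h1>`, as the question's split implies, and the author as `<h2 class="author">…</h2>`, which is an assumption about Aozora Bunko's markup that the question does not confirm. The class and method names are hypothetical. For anything beyond this, an HTML parser such as jsoup is more robust than regular expressions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AozoraHeader {
    // Assumed markup; adjust if the real pages use different tags/attributes.
    static final Pattern TITLE =
            Pattern.compile("<h1 class=\"title\">(.*?)</h1>", Pattern.DOTALL);
    static final Pattern AUTHOR =
            Pattern.compile("<h2 class=\"author\">(.*?)</h2>", Pattern.DOTALL);

    // Returns the trimmed text of the first match, or empty if the tag is
    // absent (instead of throwing like split(...)[1]).
    static Optional<String> firstMatch(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample input; in the real program this is the correctly decoded page.
        String html = "<h1 class=\"title\">吾輩は猫である</h1>\n"
                + "<h2 class=\"author\">夏目漱石</h2>";
        String title = firstMatch(TITLE, html).orElse("");
        String author = firstMatch(AUTHOR, html).orElse("");
        System.out.println(title + " / " + author);
    }
}
```

Because the page is already decoded into a Java `String`, no further charset conversion is needed before passing `title` and `author` on to the database layer.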