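Reposted from: http://comeonbabye.iteye.com/blog/1467272
// Note: the * used below is not the special character that causes the problem, because the real character cannot be displayed here. It stands for any character that takes 4 bytes when encoded as UTF-8.
Background:
The database, the table, and the Content column are all set to the utf8 character set, with the default utf8 collation (utf8_general_ci). I also tried other collations without success, so collation does not appear to be the cause.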
mysql> status;
--------------
mysql  Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (i686) using readline 6.1

Connection id:          1402357
Current database:       **
Current user:           **
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.0.38 Debian etch distribution
Protocol version:       10
Connection:             ** via TCP/IP
Server characterset:    gbk
Db characterset:        utf8
Client characterset:    utf8
Conn. characterset:     utf8
TCP port:               4307
Uptime:                 187 days 22 hours 51 min 18 sec
--------------
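Symptom:
If the inserted data contains certain special characters, the insert fails. For example, inserting the string "测试*插入数据" from the mysql console works, but the same insert from Java code fails with:
// The output is long; only this one line matters
2012-02-06 14:44:43,741 ERROR BlaBlaServiceImpl:110 - insertOrUpdateBlaBla failed! --- Cause: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x90</...' for column ……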
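Possible cause (unconfirmed):
MySQL's utf8 character set allows at most 3 bytes per character (Maxlen = 3), but some Unicode characters (those above U+FFFF) take 4 bytes when encoded as UTF-8, which would explain the failure.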
String c = "*"; // * stands for a character that takes 4 bytes when encoded as UTF-8
byte[] bytes = c.getBytes("utf8");
for (byte b : bytes) {
    System.out.print(Integer.toHexString(0x00FF & b) + " ");
}
// Output: f0 9f 8d 8e

mysql> show character set;
+----------+---------------+-------------------+--------+
| Charset  | Description   | Default collation | Maxlen |
+----------+---------------+-------------------+--------+
| utf8     | UTF-8 Unicode | utf8_general_ci   |      3 |
+----------+---------------+-------------------+--------+
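To check the suspected cause before data reaches MySQL, the application could test whether a string contains any character that needs 4 bytes in UTF-8. The following is only a minimal sketch: the class name Utf8Mb3Check and both method names are made up for illustration and are not part of the original code. The sample character U+1F34E is inferred from the bytes f0 9f 8d 8e printed above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
class Utf8Mb3Check {

    // Returns true if s contains a code point above U+FFFF,
    // i.e. one that takes 4 bytes when encoded as UTF-8.
    static boolean hasFourByteChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                return true;
            }
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return false;
    }

    // Returns the UTF-8 bytes of s as space-separated two-digit hex values.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F34E written as a surrogate pair; assumed to match the post's sample bytes.
        String fourByte = "\uD83C\uDF4E";
        System.out.println(utf8Hex(fourByte));                           // f0 9f 8d 8e
        System.out.println(hasFourByteChar("test" + fourByte + "data")); // true
        System.out.println(hasFourByteChar("test data"));                // false
    }
}
```

A check like this only confirms that a 4-byte character is present; whether to reject, strip, or store such input differently is a separate decision.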
Solution:
Change the Content column to MEDIUMBLOB (it was MEDIUMTEXT), and change the SELECT statement to
SELECT CAST(Content AS CHAR CHARACTER SET utf8) AS Content ....
The INSERT statement does not need to change. Tested OK.
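The workaround relies on a BLOB column holding raw bytes, to which the 3-byte utf8 limit does not apply. As an alternative to the CAST in the SELECT, the bytes could presumably be decoded on the Java side instead. This is an untested sketch under that assumption: the class BlobContentReader and its methods are hypothetical, and it assumes the stored bytes are UTF-8, as the connection character set shown above suggests.

```java
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper for illustration; not part of the original post.
class BlobContentReader {

    // Decodes raw column bytes as UTF-8; a null column value stays null.
    static String decodeUtf8(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    // Assumes the query selects the MEDIUMBLOB column as "Content" without CAST.
    static String readContent(ResultSet rs) throws SQLException {
        return decodeUtf8(rs.getBytes("Content"));
    }

    public static void main(String[] args) {
        // Simulates the round trip without a database:
        // the text is encoded to UTF-8 bytes, kept as bytes, then decoded again.
        String original = "test\uD83C\uDF4Edata";
        byte[] stored = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(original.equals(decodeUtf8(stored))); // true
    }
}
```

Note that with a BLOB column, comparison and sorting on Content operate on bytes rather than on characters.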