java.sql.SQLException: Incorrect string value

Date: 2020-12-24 22:51:38

Reposted from: http://comeonbabye.iteye.com/blog/1467272


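// Note: the * used below is not the character that actually caused the problem. The real character cannot be displayed here; it is a character whose UTF-8 encoding is 4 bytes long.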

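Background:
The database, the table, and the Content column are all set to the utf8 character set. The collation is the utf8 default, utf8_general_ci (I also tried switching to other collations, without success, so collation does not appear to be the problem).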

mysql> status;
--------------
mysql  Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (i686) using readline 6.1

Connection id:		1402357
Current database:	**
Current user:		**
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.0.38 Debian etch distribution
Protocol version:	10
Connection:		** via TCP/IP
Server characterset:	gbk
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
TCP port:		4307
Uptime:			187 days 22 hours 51 min 18 sec
--------------

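Symptom:
Inserting data that contains certain special characters fails. For example, inserting the string "测试*插入数据" ("test*insert data") works from the mysql console, but the same INSERT issued from Java code fails with: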

// The output is long; only one line matters
2012-02-06 14:44:43,741 ERROR BlaBlaServiceImpl:110 - insertOrUpdateBlaBla failed!
--- Cause: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x90</...' for column ……

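Likely cause (unconfirmed):
MySQL's utf8 character set has Maxlen=3, that is, at most 3 bytes per character. Some Unicode characters take 4 bytes when encoded as UTF-8, so they cannot be stored and the insert fails.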

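// Requires: import java.nio.charset.StandardCharsets;
String c = "*"; // * stands for a character whose UTF-8 encoding is 4 bytes long
byte[] bytes = c.getBytes(StandardCharsets.UTF_8); // avoids the checked UnsupportedEncodingException
for (byte b : bytes) {
    // Mask to 0..255 so negative bytes print as two hex digits
    System.out.print(Integer.toHexString(0x00FF & b) + " ");
} // prints: f0 9f 8d 8e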
//
mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
+----------+-----------------------------+---------------------+--------+
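
With Maxlen=3, any character outside the Basic Multilingual Plane cannot fit in a utf8 column. The following is a minimal sketch, not part of the original post, of how such characters could be detected in Java before the INSERT is issued. The class and method names (FourByteCheck, hasFourByteChar, utf8Hex) are hypothetical, and the sample character U+1F34E is chosen only because its UTF-8 bytes match the f0 9f 8d 8e output shown earlier.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class FourByteCheck {

    // True if s contains a code point above U+FFFF.
    // Exactly those code points need 4 bytes in UTF-8.
    static boolean hasFourByteChar(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Space-separated lowercase hex dump of the UTF-8 encoding of s.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+1F34E written as a surrogate pair so the source file stays ASCII
        String sample = "\uD83C\uDF4E";
        System.out.println(utf8Hex(sample));                                  // f0 9f 8d 8e
        System.out.println(hasFourByteChar("test" + sample + "insert data")); // true
        System.out.println(hasFourByteChar("test insert data"));              // false
    }
}
```

A check like this only reports the problem; whether to reject, strip, or store such input differently is a separate decision.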

Solution:
Change the Content column to MEDIUMBLOB (it was MEDIUMTEXT), and change the SELECT statement to

SELECT CAST(Content AS CHAR CHARACTER SET utf8) AS Content ....

The INSERT statement does not need to change. Tested OK.
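
Once the column is a BLOB, MySQL stores the raw bytes without applying the 3-byte utf8 limit. As an alternative to the CAST in the SELECT (an assumption, not something the original post tested), the application could read the column as bytes, for example with JDBC's ResultSet.getBytes, and decode them itself. The sketch below shows only the encode/decode step, with no database involved; BlobTextCodec, encode, and decode are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class BlobTextCodec {

    // Text to the bytes that would be written to the BLOB column.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes read from the BLOB column back to text.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Contains U+1F34E, which needs 4 bytes in UTF-8
        String original = "test\uD83C\uDF4Einsert data";
        byte[] stored = encode(original);
        System.out.println(stored.length);                   // 19
        System.out.println(decode(stored).equals(original)); // true
    }
}
```

Decoding in the application keeps the character set handling in one place, at the cost of losing text-oriented SQL operations on the column.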