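1. Normally, with MySQL (older than 5.5.3) set to the utf8 character set and the connection character set also set to utf8, saving a unicode string from Django works fine. However, when the string contains special characters (emoji, or any other character that takes 4 bytes when encoded as UTF-8), the save fails with the error Incorrect string value: '\xF0\x9F\x92\x90</...' for column 'xxx' at row 1
As everyone knows, Unicode is a standard and UTF-8 is one of its encodings. Some Unicode characters need 4 bytes in UTF-8, but MySQL's utf8 character set stores at most 3 bytes per character, and before MySQL 5.5.3 there was no 4-byte alternative (utf8mb4 was added in 5.5.3).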
mysql> show character set;
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
So a Unicode character that needs 4 bytes gets truncated and cannot be stored.
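The 3-byte limit is easy to check from Python. A minimal sketch (Python 3; the helper names `utf8_len` and `needs_utf8mb4` are made up for illustration) that measures the UTF-8 length of a character and flags strings a 3-byte utf8 column cannot hold:

```python
def utf8_len(ch):
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text):
    """True if text contains a character outside the Basic Multilingual Plane.

    Such characters (code point > U+FFFF) take 4 bytes in UTF-8.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


bouquet = "\U0001F490"  # the emoji behind the \xF0\x9F\x92\x90 bytes in the error
print(bouquet.encode("utf-8"))            # b'\xf0\x9f\x92\x90'
print(utf8_len("中"), utf8_len(bouquet))   # 3 4
print(needs_utf8mb4("hello 中文"), needs_utf8mb4("hi " + bouquet))  # False True
```

Ordinary Chinese text stays within 3 bytes per character, which is why the problem only shows up once emoji and similar characters appear.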
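2. For older MySQL (< 5.5.3) there doesn't seem to be a good fix. Change the column type to MEDIUMBLOB and leave everything else alone (keep both the database character set and the connection character set as utf8), and the problem goes away. See the output below: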
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
With the server in this state, MEDIUMBLOB does the job.
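Why this works: a BLOB column has no character set, so MySQL stores the bytes as given and never checks them against the 3-byte utf8 limit. A rough sketch of the round trip on the application side, assuming Python 3 and that the application encodes and decodes the value itself (`to_blob` and `from_blob` are hypothetical helper names, not Django or MySQL APIs):

```python
def to_blob(text):
    """Encode text to the raw UTF-8 bytes that would go into the MEDIUMBLOB column."""
    return text.encode("utf-8")


def from_blob(data):
    """Decode bytes read back from the MEDIUMBLOB column into text."""
    return bytes(data).decode("utf-8")


original = "flowers \U0001F490"
stored = to_blob(original)
print(len(original), len(stored))   # character count vs. byte count
print(from_blob(stored) == original)
```

The trade-off is that the column no longer has a collation, so comparisons and sorting on it are byte-wise.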
3. With MySQL >= 5.5.3, none of the above is necessary.
3.1 Edit the MySQL configuration file and set the default character set to utf8mb4, including the collation
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
3.2 Restart MySQL and confirm that the settings above took effect
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
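The same check can be automated. A small sketch, assuming the rows come back as (name, value) pairs, for example from `cursor.fetchall()` after running the SHOW VARIABLES query; `non_utf8mb4_settings` is a made-up helper name:

```python
def non_utf8mb4_settings(rows):
    """Return the settings from SHOW VARIABLES rows that are not on utf8mb4.

    rows: iterable of (variable_name, value) pairs.
    character_set_filesystem (binary) and character_set_system (utf8) are
    skipped because they keep other values even in the expected output.
    """
    skip = {"character_set_filesystem", "character_set_system"}
    return [
        (name, value)
        for name, value in rows
        if name not in skip and not value.startswith("utf8mb4")
    ]


rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8"),
    ("collation_connection", "utf8mb4_unicode_ci"),
    ("collation_database", "utf8mb4_unicode_ci"),
    ("collation_server", "utf8mb4_unicode_ci"),
]
print(non_utf8mb4_settings(rows))  # [] means every relevant setting is on utf8mb4
```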
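Nothing else needs to change. With everything on utf8mb4 (tables and columns created earlier as utf8 have to be converted too), any Unicode character from Django can be stored in MySQL.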
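The server settings above force utf8mb4 on every connection, which is why Django itself needs no change. Where editing the server config is not an option, the connection character set can instead be requested from the Django side. A hedged sketch of a settings.py fragment, assuming the stock `django.db.backends.mysql` backend; the database name, user, password, and host are placeholders:

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",        # placeholder
        "USER": "myuser",      # placeholder
        "PASSWORD": "secret",  # placeholder
        "HOST": "127.0.0.1",   # placeholder
        "PORT": "3306",
        # Passed through to the MySQL driver when the connection is opened
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```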
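Approach: check whether the maximum length of utf8 in your MySQL is 4 (the Maxlen column of show character set);
if not, check whether utf8mb4 is supported;
if it isn't, upgrade or use MEDIUMBLOB.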
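The checklist can be sketched as a version check. This is only an illustration: it assumes a plain MySQL version string such as the one returned by `SELECT VERSION()`, and `parse_version` and `choose_fix` are hypothetical helper names:

```python
import re


def parse_version(version):
    """Turn a version string such as '5.5.3-log' into a tuple like (5, 5, 3)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized MySQL version: %r" % version)
    return tuple(int(part) for part in match.groups())


def choose_fix(version):
    """utf8mb4 exists from MySQL 5.5.3 on; older servers need an upgrade or MEDIUMBLOB."""
    if parse_version(version) >= (5, 5, 3):
        return "utf8mb4"
    return "upgrade or MEDIUMBLOB"


print(choose_fix("5.1.73"))      # upgrade or MEDIUMBLOB
print(choose_fix("5.5.3"))       # utf8mb4
print(choose_fix("5.6.17-log"))  # utf8mb4
```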
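This problem has been covered to death online, so there isn't much left to write; I'm jotting it down purely as a personal record.
Credit where it's due: I consulted these two articles while fixing the bug.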
http://vivisidea.iteye.com/blog/1395571
http://www.linuxidc.com/Linux/2013-05/84360.htm
Reposted from: http://blog.csdn.net/secretx/article/details/21253559