MySQLdb raises an "Incorrect string value" error:
_mysql_exceptions.OperationalError: (1366, "Incorrect string value: '\\xF0\\xA0\\x84\\x8E\\xE2\\x8B...' for column 'from_url' at row 1")
But I have already set both the connection charset and the encoding of from_url to utf8, and this has worked without problems for millions of records so far.
I think the issue is related to the special character u'\U0002010e' (a rare Chinese character from the CJK Unified Ideographs Extension B block, which resembles "ㄋ"). This is the value that causes the exception:
u'http://www.ettoday.net/news/20120227/27879.htm?fb_action_ids=305328666231772&fb_action_types=og.likes&fb_source=aggregation&fb_aggregation_id=288381481237582\u7c89\u53ef\u611b\U0002010e\u22ef http://www.ettoday.net/news/20120221/26254.htm?fb_action_ids=305330026231636&fb_action_types=og.likes&fb_source=aggregation&fb_aggregation_id=288381481237582 \u597d\u840c\u53c8\u22ef'
But this character can also be encoded as UTF-8 in Python:
>>> u'\U0002010e'.encode('utf8')
'\xf0\xa0\x84\x8e'
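The bytes quoted in the error message are exactly this encoding. A minimal sketch in Python 3 (the question itself uses Python 2) that decodes the first four bytes from the message back into the character; error_bytes and offending are illustrative names. The trailing \xE2\x8B bytes are the start of the next character in the value, u'\u22ef', cut off by the "..." truncation in the message:

```python
# Bytes quoted in the MySQL error message (the "..." truncation is left out).
error_bytes = b"\xF0\xA0\x84\x8E\xE2\x8B"

# 0xF0 is a 4-byte UTF-8 lead byte, so the first four bytes form one character.
offending = error_bytes[:4].decode("utf-8")

print(hex(ord(offending)))             # 0x2010e
print(offending == "\U0002010e")       # True
print(len(offending.encode("utf-8")))  # 4

# The remaining two bytes are the beginning of U+22EF, which follows it in the value.
print("\u22ef".encode("utf-8").startswith(error_bytes[4:]))  # True
```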
So why can't MySQL accept this character?
#1
The character you are using is outside the BMP (Basic Multilingual Plane), so it needs 4 bytes in UTF-8. MySQL's utf8 charset stores at most 3 bytes per character, so it is not enough; you must have MySQL 5.5.3 or later and use the utf8mb4 charset instead, for both the column and the connection.
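To find which characters in a value will be rejected, it is enough to look for code points above U+FFFF, since those are exactly the characters that take 4 bytes in UTF-8. A minimal Python 3 sketch; find_non_bmp is a hypothetical helper and fragment is a piece of the failing value from the question:

```python
def find_non_bmp(text):
    """Return (index, character) pairs for code points above U+FFFF.

    In UTF-8 these characters take 4 bytes, which MySQL's 3-byte
    'utf8' charset cannot store.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# A fragment of the failing value from the question.
fragment = "\u7c89\u53ef\u611b\U0002010e\u22ef"

for index, ch in find_non_bmp(fragment):
    print(index, "U+%04X" % ord(ch), len(ch.encode("utf-8")), "bytes")
# prints: 3 U+2010E 4 bytes
```

The per-character check assumes Python 3; on a narrow Python 2 build, u'\U0002010e' is stored as a surrogate pair, so iterating over the string would not yield it as a single character.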
#2
Check the charset you have set for MySQL and make sure it is one that can store UTF-8 encoded data.
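That check can be done from Python. A minimal sketch, assuming a DB-API connection from MySQLdb (or its maintained fork, mysqlclient); report_charsets and is_full_utf8 are hypothetical helpers, and the database and table names in the usage comment are placeholders (only the from_url column comes from the question). It prints the connection's character_set_* variables and reports whether the column uses utf8mb4:

```python
# Lists the charsets in effect for the current connection and server.
CONNECTION_CHARSET_SQL = "SHOW VARIABLES LIKE 'character_set_%'"

# Looks up the charset of one column; %s is MySQLdb's placeholder style.
COLUMN_CHARSET_SQL = (
    "SELECT CHARACTER_SET_NAME FROM information_schema.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s AND COLUMN_NAME = %s"
)


def is_full_utf8(charset_name):
    """True only for MySQL's 4-byte UTF-8 charset.

    MySQL's 'utf8' (also called 'utf8mb3') stores at most 3 bytes per
    character, so it rejects characters outside the BMP.
    """
    return (charset_name or "").lower() == "utf8mb4"


def report_charsets(conn, schema, table, column):
    """Print connection charsets; return True if the column is utf8mb4."""
    cur = conn.cursor()
    cur.execute(CONNECTION_CHARSET_SQL)
    for name, value in cur.fetchall():
        print(name, value)
    cur.execute(COLUMN_CHARSET_SQL, (schema, table, column))
    row = cur.fetchone()
    column_charset = row[0] if row else None  # NULL for non-text columns
    print("column charset:", column_charset)
    return is_full_utf8(column_charset)


# Usage (placeholder credentials and names):
# import MySQLdb
# conn = MySQLdb.connect(host="localhost", user="me", passwd="secret",
#                        db="mydb", charset="utf8mb4")
# report_charsets(conn, "mydb", "mytable", "from_url")
```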