Possible Duplicate:
How to filter (or replace) unicode characters that would take more than 3 bytes in UTF-8?
Background:
I am using Django with MySQL 5.1 and I am having trouble with 4-byte UTF-8 characters causing fatal errors throughout my web application.
I've used a script to convert all tables and columns in my database to UTF-8 which has fixed most unicode issues, but there is still an issue with 4-byte unicode characters. As noted elsewhere, MySQL 5.1 does not support UTF-8 characters over 3 bytes in length.
Whenever I enter a 4-byte Unicode character (e.g. U+1F010 🀐) into a ModelForm on my Django website, the form validates and then an exception similar to the following is raised:
Incorrect string value: '\xF0\x9F\x80\x90' for column 'first_name' at row 1
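The bytes in that error message can be decoded to see exactly what MySQL is rejecting. A minimal Python 3 sketch (the names `raw`, `char` and `needs_four_bytes` are illustrative, not from the original post):

```python
# Bytes copied from the MySQL error message above.
raw = b"\xF0\x9F\x80\x90"

char = raw.decode("utf-8")   # decodes to a single character
code_point = ord(char)       # its Unicode code point

print(hex(code_point))             # 0x1f010
print(len(char.encode("utf-8")))   # 4


def needs_four_bytes(ch):
    """True if the character lies outside the Basic Multilingual Plane."""
    # Code points above U+FFFF take 4 bytes in UTF-8, which the 3-byte
    # "utf8" character set in MySQL 5.1 cannot store.
    return ord(ch) > 0xFFFF
```

Any check or filter for this problem therefore only has to look for code points above U+FFFF.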
My question:
What is a reasonable way to avoid fatal errors caused by 4-byte UTF-8 characters in a Django web application with a MySQL 5.1 database?
I have considered:
- Selectively disabling MySQL warnings to avoid specifically that error message (not sure whether that is possible yet)
- Creating middleware that will look through the request.POST QueryDict and substitute/remove all invalid UTF-8 characters
- Somehow hook/alter/monkey patch the mechanism that outputs SQL queries for Django or for MySQLdb to substitute/remove all invalid UTF-8 characters before the query is executed
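The third idea, sanitizing values just before the query runs, could be sketched roughly as a wrapper around a DB-API cursor. `SanitizingCursor` and `sanitize` are hypothetical names, not part of Django or MySQLdb, and how such a wrapper would be wired into Django's database backend is left open here. Note that it only touches query parameters, not text embedded in the SQL string itself.

```python
import re

# Matches any code point outside the Basic Multilingual Plane,
# i.e. anything that needs 4 bytes in UTF-8.
_ASTRAL_RE = re.compile("[\U00010000-\U0010FFFF]")


def sanitize(value):
    """Replace 4-byte characters in strings; pass other values through."""
    if isinstance(value, str):
        return _ASTRAL_RE.sub("\uFFFD", value)
    return value


class SanitizingCursor:
    """Hypothetical wrapper around any DB-API cursor (not a real Django/MySQLdb API)."""

    def __init__(self, cursor):
        self._cursor = cursor

    def execute(self, sql, params=None):
        # Clean string parameters before handing the query to the real cursor.
        if isinstance(params, dict):
            params = {k: sanitize(v) for k, v in params.items()}
        elif params is not None:
            params = tuple(sanitize(v) for v in params)
        return self._cursor.execute(sql, params)

    def __getattr__(self, name):
        # Delegate everything else (fetchone, close, ...) to the real cursor.
        return getattr(self._cursor, name)
```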
Example middleware for replacing invalid characters (inspired by this SO question):
import re


class MySQLUnicodeFixingMiddleware(object):
    INVALID_UTF8_RE = re.compile(u'[^\u0000-\uD7FF\uE000-\uFFFF]', re.UNICODE)

    def process_request(self, request):
        """Replace 4-byte unicode characters by REPLACEMENT CHARACTER"""
        request.POST = request.POST.copy()
        for key, values in request.POST.iterlists():
            request.POST.setlist(key,
                [self.INVALID_UTF8_RE.sub(u'\uFFFD', v) for v in values])
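The class above is written for Python 2 and old-style Django middleware (`process_request`, `iterlists`). For comparison, here is a rough sketch of the same idea on Python 3 with new-style middleware, assuming a Django version where middleware is a callable taking `get_response` and `QueryDict` provides `lists()`. It is an adaptation for illustration, not the original author's code.

```python
import re


class MySQLUnicodeFixingMiddleware:
    """Python 3 / new-style middleware variant of the class above (a sketch)."""

    # Same pattern as above: anything outside U+0000-U+D7FF and U+E000-U+FFFF.
    INVALID_UTF8_RE = re.compile(r"[^\u0000-\uD7FF\uE000-\uFFFF]")

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # request.POST is immutable by default, so work on a mutable copy.
        post = request.POST.copy()
        for key, values in list(post.lists()):
            post.setlist(
                key, [self.INVALID_UTF8_RE.sub("\uFFFD", v) for v in values]
            )
        request.POST = post
        return self.get_response(request)
```

Like the original, this only cleans form data in `request.POST`; values arriving through other routes (query strings, JSON bodies, fixtures) would still reach the database unchanged.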
1 Answer
#1
Do you have the option to upgrade MySQL? If you do, you can upgrade (utf8mb4 is available from MySQL 5.5.3 onward) and set the encoding to utf8mb4.
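If an upgrade is possible, the Django side of the change might look roughly like the settings fragment below. The database name, user, and password are placeholders, and `auth_user` is only an assumed example table; each existing table would need its own conversion statement.

```python
# settings.py fragment; NAME, USER and PASSWORD are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "change-me",
        "OPTIONS": {
            # Ask the MySQL driver to use utf8mb4 on the connection.
            "charset": "utf8mb4",
        },
    }
}

# Existing tables keep their old character set until converted, e.g.
# (auth_user is an assumed example table):
CONVERT_SQL = (
    "ALTER TABLE auth_user "
    "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
)
```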
Assuming that you don't have the option, I see these options for you:
1) Add JavaScript / frontend validation to prevent entry of anything other than 1-, 2-, or 3-byte Unicode characters,
2) Supplement that with a cleanup function in your models to strip any 4-byte Unicode characters from the data (which would be your option 2 or 3)
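A model-level cleanup along the lines of point 2 could be sketched as below. `strip_four_byte_chars` and `StripFourByteMixin` are hypothetical names, and the mixin is plain Python so it can be tried without Django installed.

```python
import re

# Any code point above U+FFFF needs 4 bytes in UTF-8.
_ASTRAL_RE = re.compile("[\U00010000-\U0010FFFF]")


def strip_four_byte_chars(text):
    """Return text with every code point above U+FFFF removed."""
    return _ASTRAL_RE.sub("", text)


class StripFourByteMixin:
    """Hypothetical mixin; list the text attributes to clean in `strip_fields`."""

    strip_fields = ()

    def strip_text_fields(self):
        for name in self.strip_fields:
            value = getattr(self, name, None)
            if isinstance(value, str):
                setattr(self, name, strip_four_byte_chars(value))
```

On a Django model, `strip_text_fields()` could be called from an overridden `save()` or `clean()` before delegating to the parent implementation. Unlike the middleware in the question, this removes the characters rather than replacing them with U+FFFD; either behaviour is a judgment call.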
At the same time, it does look like your users are in fact using 4-byte characters. If there is a business case for using them in your application, you could go to the powers that be and request an upgrade.