UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

Date: 2022-03-21 20:21:36

I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I found on * and other forums to handle the encoding errors my script is throwing. My script reads data from all tables in a source MySQL DB, writes it into a Python StringIO.StringIO object, and then loads that data from the StringIO object into a Postgres database using the copy_from command of the psycopg2 library. (The Postgres database apparently uses UTF-8 encoding; I found this under Properties > Definition of the database in pgAdmin.)
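For reference, the StringIO → copy_from step described above can be sketched as below. This is a minimal illustration, not the poster's script: `build_copy_buffer` is a hypothetical helper and the rows are made-up sample data standing in for what the MySQL cursor returns. It keeps every value as text until the end (no `str()` on unicode), writes one tab-separated line per row in PostgreSQL's COPY text format (`\N` for NULL, backslashes doubled because COPY treats them as escapes), and rewinds the buffer. It sticks to `u""` literals and `io.StringIO` so it should run on both 2.7 and 3.

```python
import io

def build_copy_buffer(rows):
    """Hypothetical helper: write fetched rows into an in-memory buffer in
    PostgreSQL COPY text format (tab-separated, one row per line)."""
    buf = io.StringIO()
    for row in rows:
        fields = []
        for value in row:
            if value is None:
                fields.append(u"\\N")  # COPY's default NULL marker
                continue
            if isinstance(value, bytes):
                value = value.decode("utf-8")  # assumption: byte strings are UTF-8
            text = u"%s" % (value,)  # stay in unicode; never call str() on it
            # Characters that would break the row/column layout
            text = text.replace(u"\r", u" ").replace(u"\n", u" ").replace(u"\t", u"")
            text = text.replace(u"\\", u"\\\\")  # COPY treats backslash as an escape
            fields.append(text)
        buf.write(u"\t".join(fields) + u"\n")
    buf.seek(0)  # rewind so the reader starts at the first row
    return buf

# Made-up sample rows, standing in for cursor.fetchall() output
rows = [(1, u"It\u2019s", None), (2, u"a\tb\nc", u"back\\slash")]
buf = build_copy_buffer(rows)
```

With a live psycopg2 connection the buffer would then be handed to something like `cur.copy_from(buf, "target_table", sep="\t", null="\\N")`, where `target_table` is a placeholder name; this part has not been run against a real database here.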


I found out that some tables in my source MySQL database use the latin1_swedish_ci collation while others use utf8 (I found this from TABLE_COLLATION in information_schema.tables).


Based on my research on the internet, I put all of this code at the top of my Python script:

```python
import MySQLdb

# host, user, passwd and db are defined earlier in the script
db_conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db, charset="utf8", init_command='SET NAMES UTF8', use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')
```

I still get the UnicodeEncodeError below. It is thrown by this line, which I wrote to clean the cells of every table in the source MySQL database when writing them to the StringIO object:

```python
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")  # Remove unwanted characters from column value
```

```
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
```

Please help.


1 answer


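str(cell) is trying to convert cell to ASCII: in Python 2, calling str() on a unicode object encodes it with the default ascii codec. ASCII only supports characters with ordinals less than 128, and u'\u2019' (a right single quotation mark, ordinal 8217) is outside that range. What is cell?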
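To see the failure in isolation, the sketch below calls `.encode("ascii")` directly, which is what Python 2's `str()` does implicitly on a unicode object (and which fails the same way on Python 3). The cell value is a made-up string with U+2019 placed at index 47 to mirror the error message; `ascii_failure` is a hypothetical helper.

```python
def ascii_failure(text):
    """Return (position, reason) for the first character ASCII can't encode, or None."""
    try:
        text.encode("ascii")  # the implicit step inside Python 2's str(unicode_obj)
    except UnicodeEncodeError as e:
        return e.start, e.reason
    return None

# Hypothetical cell: 47 ASCII characters, then U+2019 (right single quotation mark)
cell = u"x" * 47 + u"\u2019s"
```

Calling `ascii_failure(cell)` reports position 47 with the same "ordinal not in range(128)" reason as the traceback, while a pure-ASCII string returns `None`.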

If cell is a unicode string, just do cell.encode("utf8"); that returns a bytestring encoded as UTF-8.
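A small illustration of the `cell.encode("utf8")` suggestion, using a made-up cell value. Once the value is a UTF-8 bytestring, the cleaning chain works on bytes, so its literals are written as `b""` here (on Python 2 plain `""` literals are already bytes; on Python 3 the `b` prefix is required).

```python
# Hypothetical unicode cell value containing U+2019 and a tab
cell = u'He said "it\u2019s"\tok'

# Encode once: U+2019 becomes the three bytes 0xE2 0x80 0x99
encoded = cell.encode("utf8")

# Same cleaning as the question's line, but applied to bytes
cleaned = (encoded.replace(b"\r", b" ").replace(b"\n", b" ")
           .replace(b"\t", b"").replace(b'"', b""))
```

Decoding `cleaned` with `"utf8"` gives back the cleaned text with the apostrophe intact.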

...or really, iirc, if you pass unicode to MySQL, it will automagically be converted to utf8...

You could also try:

```python
cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
```
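One caveat with `unicode(cell)`: on Python 2 it works for unicode objects and numbers, but a non-ASCII bytestring is decoded with the ascii codec and fails the same way. The hypothetical `clean_cell` helper below is a sketch of a more defensive variant, assuming byte values are UTF-8 unless another `source_encoding` is passed: decode bytes explicitly, clean as text, and encode to UTF-8 only at the end. It avoids `unicode()` and `str()` so the same code runs on 2.7 and 3.

```python
def clean_cell(cell, source_encoding="utf-8"):
    """Hypothetical helper: normalise a fetched value to text, strip unwanted
    characters, and return UTF-8 bytes (None is passed through for NULLs)."""
    if cell is None:
        return None  # let the caller write its NULL marker
    if isinstance(cell, bytes):
        # Assumption: the bytes really are in source_encoding
        text = cell.decode(source_encoding)
    else:
        text = u"%s" % (cell,)  # acts like unicode(cell) on 2.7, str(cell) on 3
    text = (text.replace(u"\r", u" ").replace(u"\n", u" ")
            .replace(u"\t", u"").replace(u'"', u""))
    return text.encode("utf-8")
```

The design choice is "decode early, encode late": every replacement runs on text, so no implicit ASCII conversion can happen in the middle of the chain.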

Or just use a third-party library; there is a good one that will fix text for you.
