I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I found on * and other forums to handle the encoding errors my script is throwing. My script reads data from all tables in a source MySQL database, writes it to a Python StringIO.StringIO object, and then loads that data from the StringIO object into a Postgres database using the psycopg2 library's copy_from command. The Postgres database is apparently UTF-8 encoded (I found this under Properties > Definition for the database in pgAdmin).
I found out that some tables in my source MySQL database use the latin1_swedish_ci collation while others are in UTF-8 (I found this from TABLE_COLLATION in information_schema.tables).
Based on my research on the internet, I wrote all of this code at the top of my Python script:
db_conn = MySQLdb.connect(host=host,user=user,passwd=passwd,db=db, charset="utf8", init_command='SET NAMES UTF8' ,use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')
I still get the UnicodeEncodeError below on this line:
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
I wrote the following line of code to clean the cells of every table in the source MySQL database when writing to the StringIO object:
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
Please help.
1 solution
#1
10
str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 128, which is why u'\u2019' cannot be encoded. What is cell?
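A minimal sketch of why that line fails, assuming cell is a unicode string that contains the U+2019 character from the traceback (the value below is made up). On Python 2, str() on a unicode object implicitly encodes with the ascii codec; the explicit encode here raises the same kind of error and also runs on Python 3.

```python
# -*- coding: utf-8 -*-
cell = u"it\u2019s"  # hypothetical cell value containing U+2019


def try_encode(text, codec):
    """Return the encoded bytes, or a message describing the failure."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return "failed: %s" % exc


# The ascii codec cannot represent U+2019, so this reports a failure.
ascii_result = try_encode(cell, "ascii")
# UTF-8 can represent every Unicode character, so this returns bytes.
utf8_result = try_encode(cell, "utf8")

print(ascii_result)
print(repr(utf8_result))
```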
If cell is a unicode string, just do cell.encode("utf8"), and that will return a bytestring encoded as UTF-8.
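As a rough sketch, the cleaning line from the question could be rewritten so it never goes through str(). The helper name clean_cell and the text_type alias are assumptions for illustration, and the sketch assumes cells arrive either as unicode objects (as use_unicode=True should give) or as non-string values such as numbers.

```python
import sys

# Text type on either major version: unicode on Python 2, str on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def clean_cell(cell):
    """Remove CR, LF, tab and double-quote characters, then return UTF-8 bytes.

    Hypothetical helper; assumes cell is already text or a non-string value.
    """
    if not isinstance(cell, text_type):
        cell = text_type(cell)  # e.g. integers or decimals
    cleaned = (cell.replace(u"\r", u" ")
                   .replace(u"\n", u" ")
                   .replace(u"\t", u"")
                   .replace(u'"', u""))
    # Encode explicitly instead of relying on the implicit ascii codec.
    return cleaned.encode("utf8")
```

Doing the replacements on the unicode value and encoding once at the end keeps the ascii codec out of the picture entirely.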
...or really, IIRC, if you pass MySQL unicode, the database will automagically convert it to UTF-8...
You could also try:
cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
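One caveat with unicode(cell): on Python 2 it still uses the ascii codec if cell happens to be a byte string with non-ASCII bytes. The sketch below works around that and shows how whole rows might be written to an in-memory buffer as tab-separated UTF-8. The names to_text and rows_to_buffer are hypothetical, byte strings are assumed to be UTF-8, backslash escaping is not handled, and passing the buffer to copy_from is left out.

```python
import io
import sys

# Text type on either major version: unicode on Python 2, str on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_text(cell):
    """Convert one cell to text without touching the ascii codec."""
    if cell is None:
        return u"\\N"  # default NULL marker in PostgreSQL's COPY text format
    if isinstance(cell, bytes):
        return cell.decode("utf8")  # assumption: byte strings are UTF-8
    return text_type(cell)


def rows_to_buffer(rows):
    """Write rows as tab-separated UTF-8 lines into a BytesIO buffer."""
    buf = io.BytesIO()
    for row in rows:
        cells = []
        for cell in row:
            text = to_text(cell)
            # Same clean-up as in the question, applied to text, not bytes.
            text = text.replace(u"\r", u" ").replace(u"\n", u" ")
            text = text.replace(u"\t", u"").replace(u'"', u"")
            cells.append(text)
        buf.write(u"\t".join(cells).encode("utf8") + b"\n")
    buf.seek(0)  # rewind so a reader starts at the first row
    return buf


buf = rows_to_buffer([(1, u"it\u2019s\tok", None)])
```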
Or just use a third-party library. There is a good one that will fix text for you.