I have a SQLite database with this row of information; the ù should really be a '-'.
sqlite> select * from t_question where rowid=193;
193|SAT1000|having a pointed, sharp qualityùoften used to describe smells|pungent|lethargic|enigmatic|resolute|grievous
When I read that row from Python I get this error. What am I doing wrong?
Traceback (most recent call last):
  File "foo_error.py", line 8, in <module>
    cur.execute(sql_string)
sqlite3.OperationalError: Could not decode to UTF-8 column 'posit' with text 'having a pointed, sharp qualityùoften used to describe smells'
Python File:
import sqlite3
conn = sqlite3.connect('sat1000.db')
cur = conn.cursor()
sql_string = 'SELECT * FROM t_question WHERE rowid=193'
cur.execute(sql_string)
conn.close()
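For anyone who wants to reproduce this without the original sat1000.db, here is a minimal Python 3 sketch. It assumes (following the cp437/cp1252 correction in answer #1) that the stored character is the single byte 0x97, and it uses an in-memory database with a one-column stand-in for t_question. CAST(? AS TEXT) stores the raw bytes as TEXT without validating them, which is one way such a row can end up in a database.

```python
import sqlite3


def reproduce_decode_error() -> str:
    """Store a non-UTF-8 byte in a TEXT column and return the error raised on read."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_question (posit TEXT)")
    # Assumption: the bad character is the single cp1252 byte 0x97 (an em dash).
    # Casting a BLOB to TEXT keeps the bytes exactly as given.
    conn.execute(
        "INSERT INTO t_question (posit) VALUES (CAST(? AS TEXT))",
        (b"having a pointed, sharp quality\x97often used to describe smells",),
    )
    try:
        # Depending on the Python version, the error surfaces in execute() or fetchone().
        conn.execute("SELECT posit FROM t_question").fetchone()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return "no error"


print(reproduce_decode_error())
```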
2 solutions
#1
Set text_factory to str:
conn = sqlite3.connect('sat1000.db')
conn.text_factory = str
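This will cause cur to return strs instead of automatically trying to decode each str with the UTF-8 codec. This applies to Python 2, where str is a byte string. In Python 3, str is already the default text_factory, so the equivalent setting is conn.text_factory = bytes.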
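In Python 3, the byte-returning behaviour described above comes from conn.text_factory = bytes. This sketch uses a hypothetical in-memory stand-in for sat1000.db whose posit value contains the byte 0x97 (an assumption taken from the correction later in this answer) and shows the value coming back undecoded instead of raising.

```python
import sqlite3


def read_raw_bytes() -> bytes:
    """Return the posit value without any decoding (text_factory = bytes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_question (posit TEXT)")
    # Assumed sample data: cp1252 em dash byte 0x97 stored in a TEXT column.
    conn.execute(
        "INSERT INTO t_question (posit) VALUES (CAST(? AS TEXT))",
        (b"sharp quality\x97often",),
    )
    # bytes plays the role Python 2's str played: sqlite3 skips UTF-8 decoding.
    conn.text_factory = bytes
    (value,) = conn.execute("SELECT posit FROM t_question").fetchone()
    conn.close()
    return value


raw = read_raw_bytes()
print(type(raw), raw)  # <class 'bytes'> b'sharp quality\x97often'
```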
I wasn't able to find any chain of decodings and encodings that would transform 'ù' to a hyphen, but there are many possible Unicode hyphens, such as u'-', u'\xad', u'\u2010', u'\u2011', u'\u2043', u'\ufe63' and u'\uff0d', and I haven't ruled out the possibility that such a chain exists. However, unless you can find the right transformation, it might be easiest to simply use str.replace to fix the string.
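A minimal Python 3 sketch of the str.replace approach, applied to the raw bytes a connection with text_factory = bytes returns. It assumes the value is cp1252-encoded (as the correction that follows in this answer indicates), so byte 0x97 decodes to an em dash that str.replace can then swap for an ASCII hyphen. fix_dash is a hypothetical helper name.

```python
def fix_dash(raw: bytes) -> str:
    """Decode a raw value and replace its em dash with an ASCII hyphen."""
    # Assumption: the value is cp1252-encoded, so byte 0x97 becomes U+2014.
    text = raw.decode("cp1252")
    # str.replace swaps the em dash for the hyphen the question wants.
    return text.replace("\u2014", "-")


# raw stands in for what a connection with text_factory = bytes returns.
raw = b"having a pointed, sharp quality\x97often used to describe smells"
print(fix_dash(raw))  # having a pointed, sharp quality-often used to describe smells
```

Decoding before replacing is deliberate: running bytes.replace(b"\x97", b"-") on UTF-8 data could corrupt valid characters, because 0x97 is also a legal UTF-8 continuation byte.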
Correction (Python 2 session):
In [43]: print('ù'.decode('utf-8').encode('cp437').decode('cp1252'))
— # EM DASH u'\u2014'
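So there are chains of decodings/encodings that can transform 'ù' into some form of hyphen. The byte 0x97 is 'ù' in code page 437 and an em dash (U+2014) in cp1252, which suggests the text was saved as cp1252 and later displayed in a cp437 console. A lone 0x97 byte is also invalid UTF-8, which would explain the decode error.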
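The IPython line above is Python 2, where 'ù' is a UTF-8 byte string and must be decoded first. A Python 3 sketch of the same chain, where 'ù' is already text, so only the encode/decode pair is needed:

```python
# 'ù' -> cp437 byte 0x97 -> read back as cp1252, which maps 0x97 to an em dash.
dash = "ù".encode("cp437").decode("cp1252")
print(ascii(dash))  # '\u2014'

# The same single byte under each code page.
print(b"\x97".decode("cp437"))  # ù
print(ascii(b"\x97".decode("cp1252")))  # '\u2014'
```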
#2
conn.text_factory = str doesn't work for me.
I use conn.text_factory = bytes. Reference here: https://*.com/a/23509002/6452438
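text_factory also accepts any callable that takes bytes, so instead of getting bytes back you can decode on the fly. This is a minimal sketch with a hypothetical helper, decode_with_fallback, that tries UTF-8 and falls back to cp1252, assuming the non-UTF-8 rows were written as cp1252. cp1252 leaves a few bytes undefined (for example 0x81), so the fallback can still raise on other bad data.

```python
import sqlite3


def decode_with_fallback(raw: bytes) -> str:
    """Hypothetical text_factory: try UTF-8 first, fall back to cp1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 values in this database were written as cp1252.
        return raw.decode("cp1252")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_question (posit TEXT)")
conn.execute("INSERT INTO t_question VALUES (?)", ("caf\u00e9",))  # valid UTF-8 text
conn.execute("INSERT INTO t_question VALUES (CAST(? AS TEXT))", (b"quality\x97often",))
# sqlite3 calls text_factory with the raw bytes of every TEXT value it returns.
conn.text_factory = decode_with_fallback
rows = [r[0] for r in conn.execute("SELECT posit FROM t_question ORDER BY rowid")]
conn.close()
print(ascii(rows))  # ['caf\xe9', 'quality\u2014often']
```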