I copied this script from the [Python website][1]. This is a different question; this time the problem is with encoding:
```python
import sqlite3
import csv
import codecs
import cStringIO
import sys


class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")


class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
```
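For comparison only: in Python 3 the `csv` module works on text (`str`) directly, so the recoding classes in the Python 2 recipe are not needed. The sketch below is a Python 3 illustration, not part of the recipe; the function name `rows_to_utf8_csv` is hypothetical. For a real file you would instead open it with `open(path, "w", encoding="utf-8", newline="")` and pass that to `csv.writer`.

```python
import csv
import io


def rows_to_utf8_csv(rows):
    """Hypothetical helper: serialize rows to CSV and return UTF-8 bytes (Python 3)."""
    # csv.writer writes text into a text buffer; no UTF-8 queue is needed.
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Non-string values such as ints are stringified with str() by csv.writer.
    writer.writerows(rows)
    # The byte encoding is applied only once, at the very end.
    return buf.getvalue().encode("utf-8")


data = rows_to_utf8_csv([["Göteborg", 42], ["Malmö", 7]])
print(data)
```

Because the encoding is chosen in a single place, mixing ints and non-ASCII text in one row is not a problem here.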
This time the problem is with encoding. When I ran the script, I got this error:
```text
Traceback (most recent call last):
  File "makeCSV.py", line 87, in <module>
    uW.writerow(d)
  File "makeCSV.py", line 54, in writerow
    self.writer.writerow([s.encode("utf-8") for s in row])
AttributeError: 'int' object has no attribute 'encode'
```
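This first error can be reproduced in isolation: `writerow` calls `.encode()` on every cell, and only string types have that method. A minimal Python 3 sketch, using a hypothetical row that mixes text and an int:

```python
# Python 3 reproduction of the first error: every cell gets .encode() called
# on it, but int has no encode() method.
row = ["Göteborg", 42]  # hypothetical row mixing text and an int

for cell in row:
    # Shows which cells would support .encode(): str -> True, int -> False.
    print(type(cell).__name__, hasattr(cell, "encode"))

error_message = None
try:
    [s.encode("utf-8") for s in row]
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'int' object has no attribute 'encode'
```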
Then I converted all integers to strings, but this time I got this error:
```text
Traceback (most recent call last):
  File "makeCSV.py", line 87, in <module>
    uW.writerow(d)
  File "makeCSV.py", line 54, in writerow
    self.writer.writerow([str(s).encode("utf-8") for s in row])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
```
I used the code above specifically to handle Unicode characters, but it still gives me this error. What is the problem, and how do I fix it?
2 solutions
#1
66
> Then I converted all integers to string,
You converted both integers and strings to byte strings. For Unicode strings, `str()` uses the default character encoding, which happens to be ASCII, and this fails as soon as a value contains a non-ASCII character. You want `unicode` instead of `str`:
```python
self.writer.writerow([unicode(s).encode("utf-8") for s in row])
```
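To see why `str(s)` fails: on a Python 2 `unicode` value, `str()` implicitly encodes with the default ASCII codec. The Python 3 sketch below makes that implicit step explicit; the value `"Göteborg"` is a hypothetical example chosen so that `ö` (`\xf6`) sits at position 1, matching the traceback.

```python
# Python 3 sketch of what Python 2's str() does implicitly to a unicode value:
# it encodes it with the default codec, which is ASCII.
value = "G\xf6teborg"  # hypothetical value 'Göteborg'; '\xf6' is at index 1

ascii_error = None
try:
    value.encode("ascii")  # mirrors Python 2's str(value)
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(exc)

# Encoding explicitly as UTF-8, which unicode(s).encode("utf-8") ends up doing, works.
utf8_bytes = value.encode("utf-8")
print(utf8_bytes)  # b'G\xc3\xb6teborg'
```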
It might be better to convert everything to `unicode` before calling that method. The class is designed specifically for writing Unicode strings; it was not designed to support other data types.
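One way to follow that advice is to do the conversion in one place, with a small helper applied to every cell before `writerow` is called. This is a Python 3 sketch: `to_text` and `write_text_rows` are hypothetical names, and decoding byte strings as UTF-8 and writing `None` as an empty cell are assumptions about the data. With the Python 2 recipe, the helper would instead return `unicode` objects.

```python
import csv
import io


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: turn one cell value into text (str)."""
    if value is None:
        return ""  # assumption: missing values become empty cells
    if isinstance(value, bytes):
        # Byte strings must be decoded with their real encoding, not ASCII.
        return value.decode(encoding)
    # str is returned unchanged; ints, floats, etc. go through str().
    return str(value)


def write_text_rows(writer, rows):
    """Convert every cell to text first, then hand each row to the writer."""
    for row in rows:
        writer.writerow([to_text(cell) for cell in row])


buf = io.StringIO()
write_text_rows(csv.writer(buf), [["Göteborg", 42, None], [b"Malm\xc3\xb6", 3.5]])
print(buf.getvalue())
```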
#2
2
From the documentation:
- http://docs.python.org/library/stringio.html?highlight=cstringio#cStringIO.StringIO
Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.
I.e. only 7-bit clean strings can be stored.
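An approximate Python 3 analogue of that restriction uses `io.BytesIO`, which is stricter than `cStringIO`: it rejects every `str`, not only non-ASCII text. The sketch shows that queuing UTF-8 encoded bytes instead of text avoids the problem; the sample text is hypothetical.

```python
import io

# A byte buffer only accepts bytes, so text has to be encoded before it is queued.
queue = io.BytesIO()
text = "Göteborg,42\r\n"  # hypothetical CSV line with a non-ASCII character

rejected = None
try:
    queue.write(text)  # str is rejected by a bytes buffer
except TypeError as exc:
    rejected = exc
    print(exc)

# Writing UTF-8 encoded bytes works; this mirrors the recipe's writerow(),
# which encodes each cell to UTF-8 before it reaches the queue.
queue.write(text.encode("utf-8"))
roundtrip = queue.getvalue().decode("utf-8")
print(roundtrip == text)  # True
```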