Python DictWriter编写UTF-8编码的CSV文件。

I have a list of dictionaries containing unicode strings.
我有一个包含unicode字符串的字典列表。
csv.DictWriter can write a list of dictionaries into a CSV file.
csv。DictWriter可以将字典列表写入CSV文件中。
I want the CSV file to be encoded in UTF8.
我希望将CSV文件编码在UTF8中。
The csv module cannot handle converting unicode strings into UTF8.
csv模块不能将unicode字符串转换为UTF8。
The csv module documentation has an example for converting everything to UTF8:

csv模块文档有一个将所有内容转换为UTF8的示例:
```
def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
```
It also has a UnicodeWriter class.

它还有一个UnicodeWriter类。

But... how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.

但是…我怎么让DictWriter和这些一起工作?难道他们不需要把自己插入到中间，去捕捉拆解的字典并在它把它们写入文件之前对它们进行编码吗?我不明白。

6 个解决方案

#1

If using Python 2.7 or later, use a dict comprehension to remap the dictionary to utf-8 before passing to DictWriter:

如果使用Python 2.7或更高版本，在传递给DictWriter之前，使用命令理解将字典重新映射到utf-8:

# coding: utf-8
import csv
D = {'name':u'马克','pinyin':u'mǎkè'}
f = open('out.csv','wb')
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
w = csv.DictWriter(f,sorted(D.keys()))
w.writeheader()
w.writerow({k:v.encode('utf8') for k,v in D.items()})
f.close()

You can use this idea to update UnicodeWriter to DictUnicodeWriter:

你可以用这个想法来更新UnicodeWriter到DictUnicodeWriter:

# coding: utf-8
import csv
import cStringIO
import codecs

class DictUnicodeWriter(object):

    def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, D):
        self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for D in rows:
            self.writerow(D)

    def writeheader(self):
        self.writer.writeheader()

D1 = {'name':u'马克','pinyin':u'Mǎkè'}
D2 = {'name':u'美国','pinyin':u'Měiguó'}
f = open('out.csv','wb')
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
w = DictUnicodeWriter(f,sorted(D.keys()))
w.writeheader()
w.writerows([D1,D2])
f.close()

#2

There is a simple workaround using the wonderful UnicodeCSV module. After having it, just change the line

使用奇妙的UnicodeCSV模块有一个简单的解决方案。在拥有它之后，只要改变这条线就行了。

import csv

来

import unicodecsv as csv

And it automagically begins playing nice with UTF-8.

而且它自动开始使用UTF-8。

Note: Switching to Python 3 will also rid you of this problem (thanks jamescampbell for the tip). And it's something one should do anyway.

注意:切换到Python 3也可以解决这个问题(感谢jamescampbell提供的提示)。这是一个人应该做的事情。

#3

You can convert the values to UTF-8 on the fly as you pass the dict to DictWriter.writerow(). For example:

当您将该命令传递给DictWriter.writerow()时，您可以将值转换为UTF-8。例如:

import csv

rows = [
    {'name': u'Anton\xedn Dvo\u0159\xe1k','country': u'\u010cesko'},
    {'name': u'Bj\xf6rk Gu\xf0mundsd\xf3ttir', 'country': u'\xcdsland'},
    {'name': u'S\xf8ren Kierkeg\xe5rd', 'country': u'Danmark'}
    ]

# implement this wrapper on 2.6 or lower if you need to output a header
class DictWriterEx(csv.DictWriter):
    def writeheader(self):
        header = dict(zip(self.fieldnames, self.fieldnames))
        self.writerow(header)

out = open('foo.csv', 'wb')
writer = DictWriterEx(out, fieldnames=['name','country'])
# DictWriter.writeheader() was added in 2.7 (use class above for <= 2.6)
writer.writeheader()
for row in rows:
    writer.writerow(dict((k, v.encode('utf-8')) for k, v in row.iteritems()))
out.close()

Output foo.csv:

输出foo.csv:

name,country
Antonín Dvořák,Česko
Björk Guðmundsdóttir,Ísland
Søren Kierkegård,Danmark

#4

You can use some proxy class to encode dict values as needed, like this:

您可以使用一些代理类在需要时对dict值进行编码，如下所示:

# -*- coding: utf-8 -*- 
import csv
d = {'a':123,'b':456, 'c':u'Non-ASCII: проверка'}

class DictUnicodeProxy(object):
    def __init__(self, d):
        self.d = d
    def __iter__(self):
        return self.d.__iter__()
    def get(self, item, default=None):
        i = self.d.get(item, default)
        if isinstance(i, unicode):
            return i.encode('utf-8')
        return i

with open('some.csv', 'wb') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writerow(DictUnicodeProxy(d))

#5

When you call csv.writer with your content, the idea is to pass the content through utf_8_encoder as it would give you the (utf-8) encoded content.

当你叫csv。内容的作者，想法是通过utf_8_编码器传递内容，因为它会给你(utf-8)编码的内容。

#6

My solution is a bit different. While all solutions above are focusing on having unicode compatible dict, my solutions makes DictWriter compatible with unicode. This approach is even suggested in python docs (1).

我的解决方案有点不同。虽然以上所有解决方案都专注于使用unicode兼容的命令，但我的解决方案使DictWriter与unicode兼容。这种方法甚至在python文档(1)中提出。

Classes UTF8Recoder, UnicodeReader, UnicodeWriter are taken from python docs. UnicodeWriter->writerow was changed a little bit too.

类UTF8Recoder, UnicodeReader, UnicodeWriter取自python文档。UnicodeWriter->writerow也发生了一些变化。

Use it as regular DictWriter/DictReader.

使用它作为常规的DictWriter/DictReader。

Here is the code:

这是代码:

import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class UnicodeDictWriter(csv.DictWriter, object):
    def __init__(self, f, fieldnames, restval="", extrasaction="raise", dialect="excel", *args, **kwds):
        super(UnicodeDictWriter, self).__init__(f, fieldnames, restval="", extrasaction="raise", dialect="excel", *args, **kwds)
        self.writer = UnicodeWriter(f, dialect, **kwds)

#1