I am trying to extract some data from a JSON file that contains tweets and write it to a CSV file. The file contains all kinds of characters, and I'm guessing this is why I get this error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'
I guess I have to convert the output to UTF-8 before writing the CSV file, but I have not been able to do that. I have found similar questions here, but I haven't been able to adapt the solutions to my problem. (I should add that I am not really familiar with Python; I'm a social scientist, not a programmer.)
```python
import csv
import json

fieldnames = ['id', 'text']
with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)
    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            'text': tweet['text'],
            'id': tweet['id'],
        }
        writer.writerow(output)
```
You just need to encode the text to UTF-8:
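```python
import csv
import json

fieldnames = ['id', 'text']
with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)
    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            # Python 2's csv module expects byte strings: encode unicode to UTF-8
            'text': tweet['text'].encode("utf-8"),
            'id': tweet['id'],
        }
        writer.writerow(output)
```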
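To see what the `.encode("utf-8")` call changes, here is a minimal Python 3 sketch that reproduces the error in isolation. The sample string is made up; it simply ends in U+2026 (HORIZONTAL ELLIPSIS), the character from the error message. Python 2's csv writer converts `unicode` values to byte strings with the default ASCII codec, which is roughly the failing `encode("ascii")` step shown here; encoding to UTF-8 first hands the writer bytes it can write as-is.

```python
# A made-up tweet text ending in U+2026 (HORIZONTAL ELLIPSIS).
text = u"Breaking news\u2026"

# Encoding with the ASCII codec fails: ASCII only covers code points 0-127.
error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the block ends
print(error)

# UTF-8 can encode any code point; U+2026 becomes the three bytes E2 80 A6.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'Breaking news\xe2\x80\xa6'

# Decoding the bytes with the same codec gives the original text back.
assert utf8_bytes.decode("utf-8") == text
```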
In Python 2, the csv module does not support Unicode input. From the Python 2 documentation:

Note: This version of the csv module doesn't support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
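The answer above targets Python 2, which is what the `u'\u2026'` in the error message suggests the question uses. If you are on Python 3 instead (an assumption about your setup), the csv module works with `str` directly, so no `.encode()` call is needed: you open the output file with an explicit encoding, plus `newline=''` as the csv documentation recommends. A minimal sketch; the function name `tweets_to_csv` is hypothetical and the paths are the question's placeholders:

```python
import csv
import json


def tweets_to_csv(source_path, output_path):
    """Append the id and text of each line-delimited JSON tweet to a CSV file."""
    fieldnames = ['id', 'text']
    # In Python 3 the csv module writes str; the file object does the UTF-8
    # encoding. newline='' lets the csv module control line endings itself.
    with open(source_path, 'r', encoding='utf-8') as f, \
         open(output_path, 'a', encoding='utf-8', newline='') as out:
        writer = csv.DictWriter(
            out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)
        for line in f:
            tweet = json.loads(line)
            writer.writerow({'id': tweet['id'], 'text': tweet['text']})
```

Usage: `tweets_to_csv('MY_SOURCE_FILE', 'MY_OUTPUT')`. Keep the Python 2 `.encode("utf-8")` out of this version: under Python 3, csv converts a `bytes` value with `str()`, so the file would contain text like `b'...'` instead of the tweet.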