UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 0: invalid continuation byte

Date: 2022-02-24 11:01:48

Importing a CSV file with pandas fails when encoding = 'utf-8' is used:

#-*- coding: utf-8 -*-
import pandas as pd

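inputfile = 'data/huizong.csv'     # aggregated reviews file
outputfile = 'data/meidi_jd1.txt'  # path where the extracted reviews are saved
data = pd.read_csv(inputfile, encoding='utf-8')  # raises UnicodeDecodeError here
# Keep only the review column ('评论') for rows whose brand ('品牌') is Midea ('美的')
data = data.loc[data['品牌'] == '美的', ['评论']]
data.to_csv(outputfile, index=False, header=False, encoding='utf-8')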



UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 0: invalid continuation byte

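After changing to encoding = 'gb18030' the import works, which indicates that the file is stored in a GB-family encoding rather than UTF-8. This problem comes up often when importing Chinese text in Python: many tutorials run fine on Python 2.7 but may fail when the same code is run on Python 3.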

#-*- coding: utf-8 -*-
import pandas as pd

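inputfile = 'data/huizong.csv'     # aggregated reviews file
outputfile = 'data/meidi_jd1.txt'  # path where the extracted reviews are saved
data = pd.read_csv(inputfile, encoding='gb18030')  # decode the file as GB18030
# Keep only the review column ('评论') for rows whose brand ('品牌') is Midea ('美的')
data = data.loc[data['品牌'] == '美的', ['评论']]
data.to_csv(outputfile, index=False, header=False, encoding='utf-8')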
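
If you do not know in advance how a file is encoded, one option is to try a short list of candidate encodings before calling read_csv. The following is only a minimal sketch: the helper name guess_encoding and the candidate list are assumptions for illustration, not part of pandas. A successful decode also does not prove the guess is right, so check the resulting text by eye.

```python
from pathlib import Path


def guess_encoding(path, candidates=("utf-8-sig", "gb18030")):
    """Return the first candidate encoding that decodes the whole file, or None.

    Hypothetical helper: the candidate list is an assumption and should be
    adjusted to the encodings you actually expect to meet.
    """
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: raises on the first invalid byte
        except UnicodeDecodeError:
            continue          # this candidate does not fit, try the next one
        return name
    return None
```

The result can then be passed on, for example pd.read_csv(inputfile, encoding=guess_encoding(inputfile)). UTF-8 is tried first because it is strict and rejects most non-UTF-8 input, while GB18030 accepts a much wider range of byte sequences.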