Translated from: Writing a pandas DataFrame to CSV file
I have a DataFrame in pandas which I would like to write to a CSV file. I am doing this using:
df.to_csv('')
And I get the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)
Is there an easy way to get around this (i.e. I have Unicode characters in my DataFrame)? And is there a way to write a tab-delimited file instead of a CSV, e.g. with a 'to-tab' method (which I don't think exists)?
Answer #2
To delimit by a tab, you can use the sep argument of to_csv:
df.to_csv(file_name, sep='\t')
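A minimal sketch of what sep='\t' produces, writing to an in-memory buffer instead of a file so the output can be inspected. The column names and values are made up for illustration; reading the file back needs the same separator.

```python
import io

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["alpha", "beta"], "value": [1, 2]})

# Write tab-separated text into an in-memory buffer
buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False)
tsv_text = buf.getvalue()
print(tsv_text)

# read_csv must be told about the tab separator as well
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(round_trip.equals(df))
```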
To use a specific encoding (e.g. 'utf-8'), use the encoding argument:
df.to_csv(file_name, sep='\t', encoding='utf-8')
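A sketch of the encoding argument on data like the question's (the Greek letter alpha, u'\u03b1'), assuming Python 3 and a throwaway temporary file. In Python 3, to_csv already defaults to UTF-8 when writing to a path, so passing encoding mostly matters on Python 2 (where the ASCII error above comes from) or when you need a different encoding.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data containing the character from the question's error message
df = pd.DataFrame({"symbol": ["\u03b1", "b"], "count": [1, 2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.tsv")
    df.to_csv(path, sep="\t", encoding="utf-8", index=False)

    # Inspect the raw bytes: alpha is stored as the UTF-8 sequence b'\xce\xb1'
    with open(path, "rb") as f:
        raw = f.read()

    # Read it back with the same separator and encoding
    back = pd.read_csv(path, sep="\t", encoding="utf-8")

print(raw)
print(back["symbol"].tolist())
```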
Answer #3
Sometimes you face these problems even if you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and the same encoding when writing it. This might solve your problem.
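A minimal sketch of reading and writing with the same explicit encoding. The input file, its column name, and the choice of latin-1 are hypothetical; the point is that naming one encoding on both sides writes back the same bytes that were read.

```python
import os
import tempfile

import pandas as pd

# Hypothetical latin-1 input: 'é' is the single byte b'\xe9', which is not valid UTF-8 on its own
source_bytes = "city\nCaf\u00e9\n".encode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.csv")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "wb") as f:
        f.write(source_bytes)

    # Same encoding for reading and writing
    df = pd.read_csv(src, encoding="latin-1")
    df.to_csv(dst, encoding="latin-1", index=False)

    with open(dst, "rb") as f:
        written = f.read()

print(df["city"].tolist())
print(written)
```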
Answer #4
Something else you can try, if you are having issues encoding to 'utf-8' and want to go cell by cell, is the following.
Python 2
(Where df is your DataFrame object.)
for column in df.columns:
    for idx in df[column].index:
        x = df.at[idx, column]
        try:
            # unicode cells: encode to UTF-8 bytes, then decode with the default
            # (ASCII) codec, dropping non-ASCII; other cells go through str() first
            x = unicode(x.encode('utf-8', 'ignore'), errors='ignore') if type(x) == unicode else unicode(str(x), errors='ignore')
            df.at[idx, column] = x
        except Exception:
            print 'encoding error: {0} {1}'.format(idx, column)
            df.at[idx, column] = ''
Then try:
df.to_csv(file_name)
You can check which type (str or unicode) each column holds by printing the type of its first value:
for column in df.columns:
    print '{0} {1}'.format(str(type(df[column].iloc[0])), str(column))
Warning: errors='ignore' will simply drop the character, e.g.:
IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'
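For comparison, a Python 3 sketch of the same behaviour: str is already Unicode there, so dropping the character takes an explicit encode/decode with errors='ignore'.

```python
# '\u00ae' is the registered-trademark sign, the same character as '\xae' above
s = "Regenexx\u00ae"

# Encoding to ASCII with errors='ignore' silently drops the non-ASCII character
stripped = s.encode("ascii", errors="ignore").decode("ascii")
print(stripped)  # Regenexx
```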
Python 3
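df = df.astype(object)  # object columns can hold the str values assigned below
for column in df.columns:
    for idx in df[column].index:
        x = df.at[idx, column]  # df.at replaces get_value/set_value, removed in pandas 1.0
        try:
            # Keep str cells; convert everything else to str. The UTF-8
            # encode/decode round trip with 'ignore' drops unencodable characters.
            x = x if type(x) == str else str(x).encode('utf-8', 'ignore').decode('utf-8', 'ignore')
            df.at[idx, column] = x
        except Exception:
            print('encoding error: {0} {1}'.format(idx, column))
            df.at[idx, column] = ''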
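A column-wise alternative to the nested loops, sketched for Python 3. It mirrors the Python 2 version's effect (every cell becomes a string and non-ASCII characters are dropped) using pandas' vectorized .str.encode/.str.decode; this swapped-in technique is not part of the original answer, and the example frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame with a non-ASCII character and a numeric column
df = pd.DataFrame({"brand": ["Regenexx\u00ae", "plain"], "n": [1, 2]})

# Per column: cast to str, encode to ASCII dropping anything unencodable,
# then decode the bytes back into str
cleaned = df.apply(
    lambda col: col.astype(str).str.encode("ascii", errors="ignore").str.decode("ascii")
)
print(cleaned)
```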
Answer #5
When you store a DataFrame object in a CSV file using the to_csv method, you probably won't need to store the index of each row of the DataFrame object.
You can avoid that by passing the boolean value False to the index parameter.
For example:
df.to_csv(file_name, encoding='utf-8', index=False)
So if your DataFrame object looks like this:
Color Number
0 red 22
1 blue 10
The CSV file will store:
Color,Number
red,22
blue,10
instead of this (what you get with the default index=True):
,Color,Number
0,red,22
1,blue,10
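The example above can be reproduced with a short sketch: when to_csv is called without a path it returns the CSV text as a string, which makes the effect of index easy to compare.

```python
import pandas as pd

df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# No path argument: to_csv returns the CSV text instead of writing a file
without_index = df.to_csv(index=False).splitlines()
with_index = df.to_csv().splitlines()  # index=True is the default

print(without_index)  # ['Color,Number', 'red,22', 'blue,10']
print(with_index)     # [',Color,Number', '0,red,22', '1,blue,10']
```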
Answer #6
It may not be the answer for this case, but since I got the same error message with .to_csv, I tried .toCSV('') and got a different error message ("'SparseDataFrame' object has no attribute 'toCSV'"). That showed df was a SparseDataFrame, so the problem was solved by converting it to a dense DataFrame:
df.to_dense().to_csv("", index=False, sep=',', encoding='utf-8')
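SparseDataFrame (and its to_dense method) was removed in pandas 1.0; sparse data now lives in ordinary DataFrame columns with a SparseDtype. A minimal sketch of the modern equivalent, assuming every column is sparse so that the DataFrame.sparse accessor applies (the data is hypothetical):

```python
import pandas as pd

# Hypothetical all-sparse frame (SparseArray of ints uses 0 as the fill value)
df = pd.DataFrame({
    "a": pd.arrays.SparseArray([0, 0, 1]),
    "b": pd.arrays.SparseArray([0, 3, 0]),
})

# Convert every sparse column back to a regular dense column
dense = df.sparse.to_dense()
print(dense.dtypes)

csv_text = dense.to_csv(index=False)
print(csv_text)
```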