Writing a pandas DataFrame to CSV file

Time: 2025-03-27 18:29:31

Translated from: Writing a pandas DataFrame to CSV file

I have a DataFrame in pandas which I would like to write to a CSV file. I am doing this using:

df.to_csv('')

And getting the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)

Is there any way to get around this easily (i.e. I have Unicode characters in my DataFrame)? And is there a way to write to a tab-delimited file instead of a CSV, using e.g. a 'to-tab' method (which I don't think exists)?



Reply #2

To delimit by a tab you can use the sep argument of to_csv:

df.to_csv(file_name, sep='\t')

To use a specific encoding (e.g. 'utf-8'), use the encoding argument:

df.to_csv(file_name, sep='\t', encoding='utf-8')
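
As a minimal sketch of the two arguments working together, the following writes a small DataFrame containing the character from the error message (Greek alpha) and reads it back. The sample data, the temporary path, and the names df, file_name and round_trip are assumptions made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data containing a non-ASCII character (Greek alpha).
df = pd.DataFrame({"symbol": ["\u03b1", "b"], "value": [1, 2]})

# Hypothetical output path inside a temporary directory.
file_name = os.path.join(tempfile.mkdtemp(), "example.tsv")

# Write tab-separated, UTF-8 encoded text.
df.to_csv(file_name, sep="\t", encoding="utf-8")

# Read back with matching settings; the first column holds the saved index.
round_trip = pd.read_csv(file_name, sep="\t", encoding="utf-8", index_col=0)
print(round_trip)
```

Reading the file back with the same sep and encoding is a quick way to confirm that the non-ASCII character survived the write.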

Reply #3

Sometimes you run into these problems even when you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and using the same encoding when writing it. This might solve your problem.
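
A rough Python 3 sketch of that advice, assuming the input is a small CSV file created here only for the demonstration. The constant ENCODING, the paths, and the sample row are hypothetical.

```python
import os
import tempfile

import pandas as pd

# One encoding, named once, used for both reading and writing.
ENCODING = "utf-8"

tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "source.csv")  # hypothetical input file
target_path = os.path.join(tmp_dir, "target.csv")  # hypothetical output file

# Create a tiny input file containing a non-ASCII character.
with open(source_path, "w", encoding=ENCODING, newline="") as handle:
    handle.write("name,score\nRegenexx\u00ae,3\n")

# Read and write with the same explicit encoding.
df = pd.read_csv(source_path, encoding=ENCODING)
df.to_csv(target_path, encoding=ENCODING, index=False)

with open(target_path, encoding=ENCODING) as handle:
    written = handle.read()
print(written)
```

Keeping the encoding in a single constant makes it harder for the read side and the write side to drift apart.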


Reply #4

If you are having issues encoding to 'utf-8' and want to go cell by cell, you could try the following.

Python 2

(Where "df" is your DataFrame object.)

for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx, column)
        try:
            # Coerce every cell to unicode; characters that cannot be decoded are dropped.
            x = unicode(x.encode('utf-8', 'ignore'), errors='ignore') if type(x) == unicode else unicode(str(x), errors='ignore')
            df.set_value(idx, column, x)
        except Exception:
            # Report the offending cell and blank it out.
            print 'encoding error: {0} {1}'.format(idx, column)
            df.set_value(idx, column, '')
            continue

Then try:

df.to_csv(file_name)

You can check the type of the values in each column (using the first row as a sample) with:

for column in df.columns:
    print '{0} {1}'.format(str(type(df[column].iloc[0])), str(column))

Warning: errors='ignore' will silently omit the offending character, e.g.

IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'
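
For comparison, a minimal Python 3 sketch of the same effect, assuming the raw value arrives as bytes. The variable names raw and cleaned are illustrative only.

```python
# The byte 0xAE is not valid ASCII, so errors='ignore' drops it silently.
raw = b"Regenexx\xae"
cleaned = raw.decode("ascii", errors="ignore")
print(cleaned)  # prints: Regenexx
```

If losing characters silently is not acceptable, errors='replace' substitutes a placeholder character instead, which at least leaves a visible trace.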

Python 3

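for column in df.columns:
    # Cast to object so that string values can be assigned cell by cell.
    df[column] = df[column].astype(object)
    for idx in df[column].index:
        # get_value/set_value were removed in pandas 1.0; .at is the scalar accessor.
        x = df.at[idx, column]
        try:
            x = x if type(x) == str else str(x).encode('utf-8', 'ignore').decode('utf-8', 'ignore')
            df.at[idx, column] = x
        except Exception:
            print('encoding error: {0} {1}'.format(idx, column))
            df.at[idx, column] = ''
            continue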

Reply #5

When you store a DataFrame in a CSV file using the to_csv method, you probably won't need to store the index of each row.

You can avoid that by passing False to the index parameter.

Something like:

df.to_csv(file_name, encoding='utf-8', index=False)

So if your DataFrame object is something like:

  Color  Number
0   red     22
1  blue     10

The CSV file will store:

Color,Number
red,22
blue,10

instead of the following, which is what you get with the default index=True:

,Color,Number
0,red,22
1,blue,10
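
The comparison above can be reproduced with a short sketch. It relies on to_csv returning the CSV text as a string when no path is given; the variable names without_index and with_index are made up for illustration.

```python
import pandas as pd

# The same two-row frame used in the example above.
df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
without_index = df.to_csv(index=False)
with_index = df.to_csv()  # index=True is the default

print(without_index)
print(with_index)
```

Note that the header of the index column is empty by default, which is why the second output starts with a comma.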

Reply #6

This may not be the answer for this case, but since I got the same error message with .to_csv, I tried .toCSV('') and the error message was different ("'SparseDataFrame' object has no attribute 'toCSV'"). So the problem was solved by converting the DataFrame to a dense DataFrame:

df.to_dense().to_csv("", index = False, sep=',', encoding='utf-8')
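
SparseDataFrame and DataFrame.to_dense() were removed in pandas 1.0. On newer versions, a roughly equivalent sketch uses the .sparse accessor, assuming every column of the frame is sparse. The sample data and the names sparse_df, dense_df and csv_text are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose only column is stored as a sparse array.
sparse_df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1])})

# Convert the sparse columns to ordinary dense columns before writing.
dense_df = sparse_df.sparse.to_dense()

# Returns the CSV text because no path is given.
csv_text = dense_df.to_csv(index=False, sep=",")
print(csv_text)
```

Whether the conversion is still needed to avoid the original error on current pandas is not something this sketch establishes; it only shows the modern spelling of the densify step.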