Python Pandas read_excel dtype str: replace nan with blank ('') when reading or when writing via to_csv

Date: 2022-02-28 11:47:17

Python version: Python 2.7.13 :: Anaconda custom (64-bit)
Pandas version: pandas 0.20.2

Hello,

I have a quite simple requirement. I would like to read an Excel file and write a specific sheet to a csv file. Blank values in the source Excel file should be written as blank in the csv file. However, my blank records are always written to the output file as 'nan' (without the quotes).

I read the Excel file via method

read_excel(xlsx, sheetname='sheet1', dtype = str)

I am specifying dtype because I have some columns that contain numbers but should be treated as strings (otherwise they might lose leading 0s etc.), i.e. I would like to read the exact value from every cell.

Now I write the output .csv file via to_csv(output_file,index=False,mode='wb',sep=',',encoding='utf-8')

However, my result csv file contains nan for all blank cells from the excel file.

What am I missing? I already tried the .fillna('', inplace=True) function, but it seems to do nothing to my data. I also tried adding the parameter na_rep='' to the to_csv method, but without success.

Thanks for any help!

Addendum: Please find hereafter a reproducible example.

Please first create a new Excel file with 3 columns and the following content:

COLUMNA  COLUMNB  COLUMNC
01                test
02       test
03                test

(I saved this Excel file to c:\test.xls. Please note that the 1st and 3rd rows of column B as well as the 2nd row of column C are blank/empty.)

Now here is my code:

import pandas as pd
xlsx = pd.ExcelFile('c:\\test.xlsx')
df = pd.read_excel(xlsx, sheetname='Sheet1', dtype = str)
df.fillna('', inplace=True)
df.to_csv('c:\\test.csv', index=False,mode='wb',sep=',',encoding='utf-8', na_rep ='')

My result is:
COLUMNA,COLUMNB,COLUMNC
01,nan,test
02,test,nan
03,nan,test

My desired result would be:
COLUMNA,COLUMNB,COLUMNC
01,,test
02,test,
03,,test

1 Solution

#1


3  

Since you are dealing with nan strings, you can use the df.replace function:

In [625]: df = pd.DataFrame({'Col1' : ['nan', 'foo', 'bar', 'baz', 'nan', 'test']})

In [626]: df.replace('nan', '')
Out[626]: 
   Col1
0      
1   foo
2   bar
3   baz
4      
5  test
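
As a rough check of why fillna and na_rep had no visible effect in the question, the sketch below builds a small hypothetical frame whose blank cells already hold the literal string 'nan', which is what the output in the question suggests read_excel with dtype=str produced on pandas 0.20.2. The frame contents and the in-memory buffer are assumptions for illustration, and the snippet is written for Python 3 rather than the Python 2.7 setup in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the frame from the question: blank cells are
# assumed to have arrived as the literal string 'nan', not as missing values.
df = pd.DataFrame({
    'COLUMNA': ['01', '02', '03'],
    'COLUMNB': ['nan', 'test', 'nan'],
})

# No cell is flagged as missing, so fillna has nothing to fill.
missing_cells = int(df.isnull().sum().sum())
filled = df.fillna('')

# na_rep only substitutes real missing values, so 'nan' is written unchanged.
buf = io.StringIO()
df.to_csv(buf, index=False, na_rep='')
csv_text = buf.getvalue()

print(missing_cells)
print(filled.equals(df))
print(csv_text)
```

Because the cells are ordinary strings, only a string replacement such as the df.replace call above changes them.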

Note that replace returns a new DataFrame, so assign the result back (df = df.replace('nan', '')) and then write it to your file:

df.to_csv(output_file, index=False, mode='wb', sep=',', encoding='utf-8')

All 'nan' string values will be replaced by the empty string ''.
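
Put together, the approach might look like the following minimal sketch. It uses a hand-built frame in place of the Excel file and writes to an in-memory buffer instead of c:\test.csv; both are assumptions made so the snippet runs on its own under Python 3, and the name cleaned is illustrative.

```python
import io
import pandas as pd

# Hypothetical frame mirroring the question's sheet, with blank cells
# already turned into the literal string 'nan'.
df = pd.DataFrame({
    'COLUMNA': ['01', '02', '03'],
    'COLUMNB': ['nan', 'test', 'nan'],
    'COLUMNC': ['test', 'nan', 'test'],
})

# replace returns a new frame, so the result must be kept.
cleaned = df.replace('nan', '')

# Write to an in-memory buffer; a file path would work the same way.
buf = io.StringIO()
cleaned.to_csv(buf, index=False, sep=',')
csv_text = buf.getvalue()

print(csv_text)
```

One caveat with this approach: any cell whose genuine value is the text 'nan' would be blanked as well, since the replacement cannot tell it apart from a converted blank.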
