Get pandas.read_csv to read empty values as empty string instead of NaN

Time: 2022-06-30 15:29:30

I'm using the pandas library to read in some CSV data. In my data, certain columns contain strings. The string "nan" is a possible value, as is an empty string. I managed to get pandas to read "nan" as a string, but I can't figure out how to get it not to read an empty value as NaN. Here's sample data and output:

One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven

>>> pandas.read_csv('test.csv', na_values={'One': [], "Three": []})
    One  Two  Three
0    a    1    one
1    b    2    two
2  NaN    3  three
3    d    4    nan
4    e    5   five
5  nan    6    NaN
6    g    7  seven

It correctly reads "nan" as the string "nan", but still reads the empty cells as NaN. I tried passing str in the converters argument to read_csv (with converters={'One': str}), but it still reads the empty cells as NaN.

I realize I can fill the values after reading, with fillna, but is there really no way to tell pandas that an empty cell in a particular CSV column should be read as an empty string instead of NaN?

3 solutions

#1


27  

I added a ticket to add an option of some sort here:

https://github.com/pydata/pandas/issues/1450

In the meantime, result.fillna('') should do what you want
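For reference, here is a minimal sketch of that workaround using the sample data from the question. The `io.StringIO` wrapper and the variable names are only there to make the snippet self-contained. Note that with default parsing the literal text "nan" is also turned into NaN, so this approach cannot tell the two cases apart afterwards.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the snippet runs on its own.
csv_text = """One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven
"""

# Default parsing: empty cells and the literal text "nan" both become NaN.
result = pd.read_csv(io.StringIO(csv_text))

# Replace every missing value with an empty string after the fact.
filled = result.fillna('')

print(filled)
```

This is fine when "nan" never appears as real data, which is not the case in the question.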

EDIT: in the development version (to be 0.8.0 final) if you specify an empty list of na_values, empty strings will stay empty strings in the result
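A sketch of the per-column form, using the call from the question. One assumption to flag: in current pandas releases, as far as I understand the documentation, `na_values` is added to the default NaN markers unless `keep_default_na=False` is also passed, so the snippet below includes it. That may differ from the 0.8.0 behavior described above.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the snippet runs on its own.
csv_text = """One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven
"""

# Empty NA lists for the string columns, with the default NA markers
# switched off (assumption: needed on recent pandas versions).
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={'One': [], 'Three': []},
    keep_default_na=False,
)

print(df['One'].tolist())
print(df['Three'].tolist())
```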

#2


22  

I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.

Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:

pd.read_csv('test.csv', keep_default_na=False)
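To check this against the data in the question without creating test.csv, the same call can be run on an in-memory buffer. The `io.StringIO` wrapper and variable names here are just for illustration.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the snippet runs on its own.
csv_text = """One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven
"""

# With the default NA markers disabled, nothing is converted to NaN:
# empty cells stay '' and the text "nan" stays the string 'nan'.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(df)
print(df.isna().sum().sum())  # count of missing values in the whole frame
```

One thing to keep in mind: this applies to every column, so an empty cell in an otherwise numeric column would presumably leave that column as strings rather than numbers.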

This issue is more clearly explained in

That was fixed on Aug 19, 2012 for Pandas version 0.9 in

#3


-1  

Use the fillna method, but use it twice: 'nan' = 'nan', 'NaN' = "". This would keep the commas lined up. If the NaN weren't there, the columns would not line up. Remember: nan does not equal NaN.
