I have a dataframe column that contains both blank strings and NaN (null) values. I want to replace both the blank and the NaN entries with the string "No Data". Any guidance would be appreciated. I am using Python pandas.
My dataframe column -
Col1
----
NaN
New York
NaN
This is what I have tried -
df['Col1'] = df['Col1'].replace(r'\s+', "No Data", regex=True)
df['Col1'] = df['Col1'].replace(np.NaN, "No Data", regex=True)
My resultant column looks like -
Col1
----
No Data
No Data
NewNo DataYork
No Data
Thanks.
4 Solutions
#1
Filter the df to set the empty/blank entries to NaN, and then fill these:
In [27]:
df = pd.DataFrame({'Col1':['',np.nan,'New York',np.nan]})
df
Out[27]:
Col1
0
1 NaN
2 New York
3 NaN
In [28]:
df.loc[df['Col1'].str.len() == 0, 'Col1'] = np.nan
df['Col1'] = df['Col1'].fillna('No Data')
df
Out[28]:
Col1
0 No Data
1 No Data
2 New York
3 No Data
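One caveat with the length check above: str.len() == 0 only matches truly empty strings, so a cell holding only spaces would be left alone. Below is a minimal sketch for that case, assuming the blanks may be whitespace-only. The sample values and the name blank_mask are made up for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one empty string, one whitespace-only string, NaNs.
df = pd.DataFrame({'Col1': ['', '   ', np.nan, 'New York', np.nan]})

# Strip whitespace before testing, so '   ' counts as blank too.
# NaN entries compare as False here and are handled by fillna below.
blank_mask = df['Col1'].str.strip().eq('')

df.loc[blank_mask, 'Col1'] = np.nan
df['Col1'] = df['Col1'].fillna('No Data')
print(df)
```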
#2
You have to specify the start and end of the regex:
In [11]: df.replace(r'^\s*$', np.nan, regex=True)
Out[11]:
Col1
0 NaN
1 NaN
2 New York
3 NaN
In [12]: df.replace(r'^\s*$', np.nan, regex=True).fillna("No Data")
Out[12]:
Col1
0 No Data
1 No Data
2 New York
3 No Data
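To see why the anchors matter, it may help to run the unanchored pattern from the question next to the anchored one on the same data. This is only a sketch: the sample Series and the variable names are assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Assumed sample: a single-space "blank", NaNs, and one real value.
s = pd.Series([' ', np.nan, 'New York', np.nan])

# Unanchored: \s+ also matches the space inside 'New York',
# so only that space is substituted within the string.
unanchored = s.replace(r'\s+', 'No Data', regex=True)

# Anchored: ^\s*$ only matches cells that are empty or whitespace
# from start to end; those become NaN and are then filled.
anchored = s.replace(r'^\s*$', np.nan, regex=True).fillna('No Data')

print(unanchored.tolist())
print(anchored.tolist())
```

The unanchored result reproduces the mangled 'NewNo DataYork' value from the question, while the anchored version leaves 'New York' intact.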
#3
You could pass the values you want to replace in a dictionary to the replace function:
In [944]: x.head()
Out[944]:
ind1 ind2 value identifier
0 EA 01/01/07 0.231 55
1 EA 01/01/07 0.511 56
2 EA 01/01/07 0.357 57
3 EA 01/02/07 0.091 55
4 EA 01/02/07 0.161 57
In [945]: x.head().replace({55:'N/A', 56:'FiftySix'})
Out[945]:
ind1 ind2 value identifier
0 EA 01/01/07 0.231 N/A
1 EA 01/01/07 0.511 FiftySix
2 EA 01/01/07 0.357 57
3 EA 01/02/07 0.091 N/A
4 EA 01/02/07 0.161 57
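Applied to the column from the question, the dictionary form might look like the sketch below. It assumes the blank entries are exactly the empty string '' (whitespace-only cells would not match and would need a regex instead), and the sample frame is rebuilt here purely for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data mirroring the question's column.
df = pd.DataFrame({'Col1': ['', np.nan, 'New York', np.nan]})

# Map each value to be replaced to its replacement.
# Only exact matches are replaced, so 'New York' is untouched.
df['Col1'] = df['Col1'].replace({'': 'No Data', np.nan: 'No Data'})
print(df)
```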
#4
Okay, here's a where-based approach:
>>> df["Col1"] = df.Col1.where(df.Col1.str.strip().str.len() > 0, "No Data")
>>> df
Col1
0 No Data
1 No Data
2 New York
3 No Data
This replaces anything that doesn't have a positive length after stripping with "No Data". NaNs stay NaN through the string methods, so they don't have a positive length either and get replaced as well.
(I'm really bad at remembering regex syntax so I tend to use named methods instead.)
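The intermediate steps of the where-based approach can be inspected one at a time to see how NaN ends up on the "No Data" side. This is a small sketch with an assumed sample Series; the names lengths, keep and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed sample data mirroring the question's column.
s = pd.Series(['', np.nan, 'New York', np.nan])

# The .str methods pass NaN through, so a NaN entry has no usable length.
lengths = s.str.strip().str.len()

# Any comparison with NaN is False, so NaN rows fail the "> 0" test
# just like the empty string does.
keep = lengths > 0

# where() keeps values where the condition is True and
# substitutes "No Data" everywhere else.
result = s.where(keep, 'No Data')
print(result.tolist())
```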