What is the quickest/simplest way to drop NaN and inf/-inf values from a pandas DataFrame without resetting mode.use_inf_as_null? I'd like to be able to use the subset and how arguments of dropna, except with inf values considered missing, like:
df.dropna(subset=["col1", "col2"], how="all", with_inf=True)
Is this possible? Is there a way to tell dropna to include inf in its definition of missing values?
6 Answers
#1
193
The simplest way would be to first replace the infs with NaN:
df.replace([np.inf, -np.inf], np.nan)
and then use dropna:
df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")
For example:
In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

In [12]: df.replace([np.inf, -np.inf], np.nan)
Out[12]:
     0
0    1
1    2
2  NaN
3  NaN
The same method would work for a Series.
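As a minimal sketch of the full chain on a multi-column frame (the column names and values below are made up for illustration), replacing first and then calling dropna with subset and how="all" might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; col3 is deliberately outside the subset.
df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, np.nan, np.nan],
    "col3": [10.0, 20.0, 30.0, 40.0],
})

# Treat +/-inf as missing, then drop rows where BOTH col1 and col2 are missing.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
    subset=["col1", "col2"], how="all"
)
print(cleaned)
```

Note that replace returns a new frame here, so the original df keeps its inf values; only cleaned has them converted to NaN.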
#2
10
Here is another method, using .loc to replace inf with NaN on a Series:
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
So, in response to the original question:
df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3):
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
#3
8
With an option context, this is possible without permanently setting use_inf_as_null. For example:
with pd.option_context('mode.use_inf_as_null', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')
Of course, pandas can also be set to treat inf as NaN permanently with pd.set_option('use_inf_as_null', True).
#4
5
The replace-based solution above will also modify the infs that are not in the target columns. To remedy that, restrict the replacement to the target columns:
lst = [np.inf, -np.inf]
to_replace = dict((v, lst) for v in ['col1', 'col2'])
df.replace(to_replace, np.nan)
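A short sketch of how the column-restricted replacement could be combined with dropna follows. The frame, the cols list, and the "other" column are illustrative assumptions, and the dict is built with a comprehension rather than dict(...), which should be equivalent:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "other" holds infs that should be left alone.
df = pd.DataFrame({
    "col1": [np.inf, 1.0, np.nan],
    "col2": [-np.inf, 2.0, 3.0],
    "other": [np.inf, np.inf, 5.0],
})

cols = ["col1", "col2"]
infs = [np.inf, -np.inf]

# Map each target column to the values to replace in that column only.
replaced = df.replace({c: infs for c in cols}, np.nan)

# Drop rows whose target columns are all missing; "other" is untouched.
result = replaced.dropna(subset=cols, how="all")
print(result)
```

Row 0 is dropped because both target columns were infinite, while the inf in the "other" column of row 1 survives in the result.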
#5
3
Yet another solution would be to use the isin method. Use it to determine whether each value is infinite or missing, and then chain the all method to determine whether all the values in each row are infinite or missing.
Finally, use the negation of that result to select, via boolean indexing, the rows that are not entirely infinite or missing.
all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
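The same boolean-mask idea can be wrapped in a small helper that mimics the subset and how arguments from the question. This is only a sketch: dropna_with_inf is a hypothetical name, not a pandas function, and it uses isna for NaN alongside isin for the infinities rather than relying on isin to match NaN:

```python
import numpy as np
import pandas as pd


def dropna_with_inf(df, subset=None, how="any"):
    """Drop rows treating NaN, inf and -inf as missing (hypothetical helper)."""
    cols = list(df.columns) if subset is None else list(subset)
    # True wherever a value in the chosen columns is NaN or +/-inf.
    missing = df[cols].isna() | df[cols].isin([np.inf, -np.inf])
    if how == "any":
        drop = missing.any(axis="columns")
    elif how == "all":
        drop = missing.all(axis="columns")
    else:
        raise ValueError("how must be 'any' or 'all'")
    # Index the original frame, so values outside `cols` are never modified.
    return df[~drop]


# Made-up sample data for illustration.
df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [2.0, -np.inf, 5.0, np.inf],
})

kept_any = dropna_with_inf(df, subset=["col1", "col2"], how="any")
kept_all = dropna_with_inf(df, subset=["col1", "col2"], how="all")
print(kept_any)
print(kept_all)
```

Because the helper only filters rows, infinite values in the surviving rows are returned unchanged.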
#6
0
You can use pd.DataFrame.mask with np.isinf. You should first ensure that all the columns of your dataframe are of type float. Then use dropna with your existing logic.
print(df)

       col1      col2
0 -0.441406       inf
1 -0.321105      -inf
2 -0.412857  2.223047
3 -0.356610  2.513048

df = df.mask(np.isinf(df))

print(df)

       col1      col2
0 -0.441406       NaN
1 -0.321105       NaN
2 -0.412857  2.223047
3 -0.356610  2.513048
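If the frame also contains non-numeric columns, np.isinf cannot be applied to it as a whole. One possible workaround, sketched here with made-up data and assuming select_dtypes(include="number") picks out the columns of interest, is to mask only the numeric columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a string column with a float column.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "value": [1.0, np.inf, -np.inf],
})

# Restrict the inf check to numeric columns so np.isinf never sees strings.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].mask(np.isinf(df[num_cols]))

# dropna can now be applied with the usual subset/how arguments.
cleaned = df.dropna(subset=["value"], how="all")
print(cleaned)
```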