Dropping infinite values from DataFrames in pandas?

What is the quickest/simplest way to drop NaN and inf/-inf values from a pandas DataFrame without resetting mode.use_inf_as_null? I'd like to be able to use the subset and how arguments of dropna, except with inf values considered missing, like:

df.dropna(subset=["col1", "col2"], how="all", with_inf=True)

Is this possible? Is there a way to tell dropna to include inf in its definition of missing values?

6 Answers

#1

193

The simplest way would be to first replace the infs with NaN:

df.replace([np.inf, -np.inf], np.nan)

and then use the dropna:

df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")

For example:

In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

In [12]: df.replace([np.inf, -np.inf], np.nan)
Out[12]:
    0
0   1
1   2
2 NaN
3 NaN

The same method would work for a Series.
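
To tie this back to the subset and how arguments from the question, here is a minimal sketch. The column names col1, col2 and other, and the sample values, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, np.nan, 5.0],
    "other": [10.0, 20.0, 30.0, 40.0],
})

# Treat +/-inf as missing, then drop rows where both target columns are missing.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
    subset=["col1", "col2"], how="all"
)

# The same replace call works on a Series.
s_cleaned = df["col2"].replace([np.inf, -np.inf], np.nan).dropna()
```

Note that replace returns a new object, so df itself is unchanged, while the rows kept in cleaned hold NaN wherever inf used to be.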

#2

Here is another method using .loc to replace inf with nan on a Series:

s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan

So, in response to the original question:

df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3): 
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
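
For completeness, a small sketch of the .loc variant on a single Series. The sample values and the names is_inf and finite_only are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series mixing finite values, NaN and +/-inf.
s = pd.Series([1.0, np.inf, np.nan, -np.inf, 5.0])

# Non-finite but not already missing, i.e. exactly the +/-inf entries.
is_inf = (~np.isfinite(s)) & s.notnull()

# Overwrite the infinite entries in place, then drop everything missing.
s.loc[is_inf] = np.nan
finite_only = s.dropna()
```

The s.notnull() term only keeps the mask from also flagging entries that are already NaN; the end result after dropna is the same either way.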

#3

With option context, this is possible without permanently setting use_inf_as_null. For example:

with pd.option_context('mode.use_inf_as_null', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

Of course it can be set to treat inf as NaN permanently with pd.set_option('use_inf_as_null', True) too.
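
As far as I can tell, this option was later renamed to mode.use_inf_as_na and then deprecated in newer pandas releases, so the context manager may not be available everywhere. A rough sketch of a helper that gets the same effect without touching any option is below; dropna_with_inf is a hypothetical name, not a pandas function, and it only supports the subset and how arguments.

```python
import numpy as np
import pandas as pd


def dropna_with_inf(df, subset=None, how="any"):
    """Hypothetical helper: drop rows like dropna, counting +/-inf as missing.

    Rows that are kept retain their original values, including any inf.
    """
    cols = list(df.columns) if subset is None else list(subset)
    # Temporary copy of the target columns with inf turned into NaN.
    missing = df[cols].replace([np.inf, -np.inf], np.nan).isna()
    if how == "all":
        drop = missing.all(axis=1)
    elif how == "any":
        drop = missing.any(axis=1)
    else:
        raise ValueError(f"invalid how option: {how}")
    return df[~drop]


# Made-up sample data for illustration.
df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, np.nan, 5.0],
})
result_all = dropna_with_inf(df, subset=["col1", "col2"], how="all")
result_any = dropna_with_inf(df, subset=["col1", "col2"], how="any")
```

Unlike a plain replace followed by dropna, this leaves the inf values in the surviving rows as they were.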

#4

The replace-based solution above also modifies infs outside the target columns. To remedy that, restrict the replacement to the target columns:

inf_values = [np.inf, -np.inf]
# Map each target column to the values that should be replaced in it.
to_replace = {col: inf_values for col in ['col1', 'col2']}
df.replace(to_replace, np.nan)
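
The same restriction can also be sketched with plain column selection instead of a dict. This is an alternative formulation, not the code from the answer; the sample data and the names target_cols, cleaned and result are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample data; "other" holds infs that should be left alone.
df = pd.DataFrame({
    "col1": [np.inf, 1.0, np.nan],
    "col2": [-np.inf, 2.0, np.nan],
    "other": [np.inf, np.inf, 3.0],
})

target_cols = ["col1", "col2"]

# Work on a copy so the original frame keeps its infs.
cleaned = df.copy()
cleaned[target_cols] = cleaned[target_cols].replace([np.inf, -np.inf], np.nan)

# dropna now sees the former infs in the target columns as missing.
result = cleaned.dropna(subset=target_cols, how="all")
```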

#5

Yet another solution would be to use the isin method. Use it to determine whether each value is infinite or missing and then chain the all method to determine if all the values in the rows are infinite or missing.

Finally, use the negation of that result to select the rows that don't have all infinite or missing values via boolean indexing.

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
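
To mirror the subset and how arguments from the question, the isin check can be limited to the columns of interest. A small sketch with made-up data; swapping all for any gives the how="any" behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "col1": [np.inf, 1.0, np.nan, 4.0],
    "col2": [np.nan, np.inf, -np.inf, 5.0],
    "other": [1.0, 2.0, 3.0, 4.0],
})

subset = ["col1", "col2"]

# True wherever a value in the subset columns is infinite or missing.
bad = df[subset].isin([np.inf, -np.inf, np.nan])

kept_all = df[~bad.all(axis="columns")]  # like how="all"
kept_any = df[~bad.any(axis="columns")]  # like how="any"
```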

#6

You can use pd.DataFrame.mask with np.isinf. First make sure that all columns of your DataFrame are of float dtype. Then use dropna with your existing logic.

print(df)

       col1      col2
0 -0.441406       inf
1 -0.321105      -inf
2 -0.412857  2.223047
3 -0.356610  2.513048

df = df.mask(np.isinf(df))

print(df)

       col1      col2
0 -0.441406       NaN
1 -0.321105       NaN
2 -0.412857  2.223047
3 -0.356610  2.513048
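
If the DataFrame also contains non-numeric columns, np.isinf raises a TypeError on them. One possible workaround, sketched here with a made-up label column, is to apply the mask only to the numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one non-numeric column.
df = pd.DataFrame({
    "col1": [-0.44, -0.32, -0.41],
    "col2": [np.inf, -np.inf, 2.22],
    "label": ["a", "b", "c"],
})

# Restrict the inf check to numeric columns.
num_cols = df.select_dtypes(include="number").columns

masked = df.copy()
masked[num_cols] = masked[num_cols].mask(np.isinf(masked[num_cols]))

# dropna with the usual arguments now treats the former infs as missing.
result = masked.dropna(subset=["col2"])
```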
