Pandas: Drop all columns with NaNs, 0, and NA from a DataFrame

Time: 2022-09-26 23:01:52

I have a DataFrame that looks like this:


import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                'B': [0, np.nan, np.nan, 0, 0, 0],
                'C': [0, 0, 0, 0, 0, 0.0],
                'D': [5, 5, 5, 5, 5.6, 6.8],
                'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA'],})

How can I drop every column that contains only 'NA', NaN, or 0 values, so that I get the following output?


df2 = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                'D': [5, 5, 5, 5, 5.6, 6.8],})

So far I know that .dropna() will get rid of the NaNs. I also tried df2 = df[~(df == 0).all(axis=1)], but it did not work.


2 solutions

#1


>>> df
     A   B  C    D   E
0  1.0   0  0  5.0  NA
1  2.1 NaN  0  5.0  NA
2  NaN NaN  0  5.0  NA
3  4.7   0  0  5.0  NA
4  5.6   0  0  5.6  NA
5  6.8   0  0  6.8  NA
>>> f = df.replace([0,'NA'], np.nan).apply(lambda x: any(~x.isnull()))
>>> f
A     True
B    False
C    False
D     True
E    False
dtype: bool
>>> df.loc[:,f]
     A    D
0  1.0  5.0
1  2.1  5.0
2  NaN  5.0
3  4.7  5.0
4  5.6  5.6
5  6.8  6.8
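
The same idea can be wrapped in a small helper. This is only a sketch: the function name drop_empty_columns and its placeholders parameter are hypothetical, and it assumes the only "empty" markers are 0, the string 'NA', and real missing values.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(frame, placeholders=(0, 'NA')):
    """Return frame without the columns that hold only NaN or placeholders."""
    # Turn the placeholders into NaN so a single notna() test covers every case.
    cleaned = frame.replace(list(placeholders), np.nan)
    # Keep a column if at least one real value survives the replacement.
    keep = cleaned.notna().any()
    return frame.loc[:, keep]


df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [0, np.nan, np.nan, 0, 0, 0],
                   'C': [0, 0, 0, 0, 0, 0.0],
                   'D': [5, 5, 5, 5, 5.6, 6.8],
                   'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']})

result = drop_empty_columns(df)
print(result.columns.tolist())
```

Because the mask is computed on a cleaned copy and then applied to the original frame, the NaN in column A is kept as it is and df itself is not modified.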

#2


You could try using df.isin() and all() to find an array of columns which don't contain only null values and then use this array to select the relevant columns of df:


>>> df[df.columns[~df.isin([np.nan, 'NA', 0]).all().values]]
     A    D
0  1.0  5.0
1  2.1  5.0
2  NaN  5.0
3  4.7  5.0
4  5.6  5.6
5  6.8  6.8

Or more concisely: df.loc[:, ~df.isin([np.nan, 'NA', 0]).all()]
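
For comparison, the attempt in the question, df[~(df == 0).all(axis=1)], does not work because axis=1 tests each row and the mask is then used to filter rows, not columns. The sketch below contrasts the two axes and widens the test to cover 'NA' and missing values as well. The variable names are illustrative, and it assumes 'NA' is stored as a plain string as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [0, np.nan, np.nan, 0, 0, 0],
                   'C': [0, 0, 0, 0, 0, 0.0],
                   'D': [5, 5, 5, 5, 5.6, 6.8],
                   'E': ['NA', 'NA', 'NA', 'NA', 'NA', 'NA']})

# Row-wise test from the question: True for rows in which every cell equals 0.
# No row qualifies, so filtering with ~row_mask leaves df unchanged.
row_mask = (df == 0).all(axis=1)

# Column-wise test: True for columns in which every cell equals 0.
# Only C qualifies; B is missed because NaN == 0 is False.
zero_cols = (df == 0).all(axis=0)

# Treat 0, the string 'NA', and missing values all as "empty".
empty = df.isin([0, 'NA']) | df.isna()
keep = ~empty.all(axis=0)

result = df.loc[:, keep]
print(result.columns.tolist())
```

Handling missing values with isna() instead of listing np.nan inside isin() keeps the NaN test explicit and separate from the placeholder values.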

