Delete DataFrame columns in Pandas based on value

Time: 2021-01-01 21:39:59

I have a dataframe something like this:

    Col0    Col1    Col2    Col3
1   a       b       g       a
2   a       d       z       a
3   a       g       x       a
4   a       h       p       a
5   a       b       c       a

I need to remove the columns where the value is 'a'. No other cells contain the value 'a' (e.g. here Col1 and Col2 have no cells with the value 'a'). I have around 1000 columns and I don't know in advance which columns contain 'a'. The required dataframe should look something like this:

    Col1    Col2
1   b       g   
2   d       z    
3   g       x    
4   h       p    
5   b       c    

What's the best way to do this?

2 solutions

#1


4  

Use any if you need to check for at least one True, or all if you need to check that every value is True, together with boolean indexing and loc, because we are filtering columns:

print (df)
  Col0 Col1 Col2 Col3
0    a    a    g    a
1    a    d    z    a
2    a    g    x    a
3    a    h    p    a
4    a    b    c    a


df2 = df.loc[:, ~(df == 'a').any()]
print (df2)
  Col2
0    g
1    z
2    x
3    p
4    c

df1 = df.loc[:, ~(df == 'a').all()]
print (df1)
  Col1 Col2
0    a    g
1    d    z
2    g    x
3    h    p
4    b    c
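
A minimal, self-contained sketch of the two filters above, assuming the sample frame printed at the top of this answer (the variable names `is_a`, `no_a_anywhere` and `not_all_a` are only illustrative):

```python
import pandas as pd

# Sample frame from the answer: Col0 and Col3 are entirely 'a',
# Col1 contains a single 'a', Col2 contains none.
df = pd.DataFrame({
    "Col0": list("aaaaa"),
    "Col1": list("adghb"),
    "Col2": list("gzxpc"),
    "Col3": list("aaaaa"),
})

is_a = df == "a"  # element-wise boolean frame

# Drop a column if 'a' appears anywhere in it.
no_a_anywhere = df.loc[:, ~is_a.any()]

# Drop a column only if every value in it is 'a'.
not_all_a = df.loc[:, ~is_a.all()]

print(list(no_a_anywhere.columns))
print(list(not_all_a.columns))
```

Both any() and all() reduce down the rows by default, so each returns one boolean per column, which is what loc needs for column selection.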

Detail:

print (df == 'a')

   Col0   Col1   Col2  Col3
0  True   True  False  True
1  True  False  False  True
2  True  False  False  True
3  True  False  False  True
4  True  False  False  True

df2 = df.loc[:, (df != 'a').any()]
print (df2)
  Col1 Col2
0    a    g
1    d    z
2    g    x
3    h    p
4    b    c

df1 = df.loc[:, (df != 'a').all()]
print (df1)
  Col2
0    g
1    z
2    x
3    p
4    c

print (df != 'a')

    Col0   Col1  Col2   Col3
0  False  False  True  False
1  False   True  True  False
2  False   True  True  False
3  False   True  True  False
4  False   True  True  False
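
The == and != variants above select the same columns because negating any() of one mask is the same as all() of the opposite mask. A short check of that equivalence, again assuming the sample frame from this answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Col0": list("aaaaa"),
    "Col1": list("adghb"),
    "Col2": list("gzxpc"),
    "Col3": list("aaaaa"),
})

eq = df == "a"
ne = df != "a"

# "no value equals 'a'" is the same test as "every value differs from 'a'".
same_strict = (~eq.any()).equals(ne.all())

# "not every value equals 'a'" is the same test as "some value differs from 'a'".
same_loose = (~eq.all()).equals(ne.any())

print(same_strict, same_loose)
```

Which spelling to use is a matter of taste; the != form avoids the extra ~.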

EDIT:

For checking mixed types (numbers together with strings) there are 2 possible solutions: convert everything to strings, or compare the underlying NumPy array:

df.astype(str) == 'a'

Or:

df.values == 'a'
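
A small sketch of both variants on a frame with mixed types. The frame `mixed` and its column names are made up for illustration, and to_numpy() is used here as the current spelling of .values:

```python
import pandas as pd

# Hypothetical frame: one numeric column, one string column,
# and one column holding both numbers and strings.
mixed = pd.DataFrame({
    "num": [1, 2, 3],
    "txt": ["a", "b", "c"],
    "both": [1, "a", 2.5],
})

# Variant 1: cast every cell to str, then compare.
mask_str = mixed.astype(str) == "a"

# Variant 2: compare the underlying object array element by element.
mask_np = mixed.to_numpy() == "a"

kept_str = mixed.loc[:, ~mask_str.any()]
kept_np = mixed.loc[:, ~mask_np.any(axis=0)]  # axis=0: reduce over rows

print(list(kept_str.columns))
print(list(kept_np.columns))
```

Note that the NumPy mask has no column labels, so the reduction axis has to be given explicitly.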

#2


3  

Option 1
Using pd.DataFrame.dropna with pd.DataFrame.mask
The concept is that I replace 'a' with np.nan and then conveniently use dropna.

This drops the column even if it contains just one 'a'.

df.mask(df.astype(str).eq('a')).dropna(axis=1)

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c

This drops the column only if all of its elements are 'a'.

df.mask(df.astype(str).eq('a')).dropna(axis=1, how='all')

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c
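
One thing to keep in mind with the mask/dropna approach: dropna cannot tell a NaN that replaced 'a' from a NaN that was already in the data. A sketch of that difference, assuming a hypothetical frame where Col1 has a pre-existing missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col0": list("aaaaa"),
    "Col1": ["b", "d", np.nan, "h", "b"],  # hypothetical pre-existing gap
    "Col2": list("gzxpc"),
})

# Replace every 'a' with NaN, then drop columns containing any NaN.
masked = df.mask(df.astype(str).eq("a"))
via_dropna = masked.dropna(axis=1)

# Boolean indexing only looks at 'a', so the gap in Col1 is ignored.
via_bool = df.loc[:, ~df.eq("a").any()]

print(list(via_dropna.columns))
print(list(via_bool.columns))
```

If the data can contain missing values, the boolean-indexing form is the safer of the two.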

Option 2
A creative way, using np.where to find the unique column positions that contain 'a'
This is cool because np.where returns a tuple of arrays that give the positions of all True values in an array. The second array of the tuple holds the column positions. I take the unique set of those and select the remaining column names.

df[df.columns.difference(
       df.columns[np.unique(np.where(df.astype(str).eq('a'))[1]
)])]

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c

Or similarly with pd.DataFrame.drop

df.drop(df.columns[np.unique(np.where(df.astype(str).eq('a'))[1])], axis=1)

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c
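
The two np.where variants can differ in column order, since Index.difference sorts its result by default while drop keeps the original order. A sketch with hypothetical column names (`z_col`, `b_col`) chosen so that the difference is visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col0": list("aaaaa"),
    "z_col": list("bdghb"),  # hypothetical names, deliberately unsorted
    "b_col": list("gzxpc"),
    "Col3": list("aaaaa"),
})

# np.where on a 2-D boolean array returns (row positions, column positions).
rows, cols = np.where(df.astype(str).eq("a").to_numpy())
hit_cols = df.columns[np.unique(cols)]

via_difference = df[df.columns.difference(hit_cols)]  # sorted labels
via_drop = df.drop(columns=hit_cols)                  # original order

print(list(via_difference.columns))
print(list(via_drop.columns))
```

With 1000 columns in a meaningful order, the drop form is probably the one to prefer.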

Option 3
Probably a bad way of doing it.

df.loc[:, ~df.astype(str).sum().str.contains('a')]

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c
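
One reason this option is risky is that str.contains matches substrings, so any column whose concatenated text contains the letter 'a' is dropped, not only columns with a cell equal to 'a'. A sketch of that pitfall on a hypothetical frame; the per-column concatenation is done here with "".join as a stand-in for the .sum() call above:

```python
import pandas as pd

df = pd.DataFrame({
    "Col0": list("aaa"),
    "fruit": ["banana", "cherry", "kiwi"],  # hypothetical column
})

# Concatenate each column into one string (stand-in for .sum() on strings).
joined = df.astype(str).apply(lambda col: "".join(col))

# Substring test: 'banana' contains 'a', so fruit is dropped as well.
by_substring = df.loc[:, ~joined.str.contains("a")]

# Exact-match test: only Col0 has cells equal to 'a'.
by_equality = df.loc[:, ~df.eq("a").any()]

print(list(by_substring.columns))
print(list(by_equality.columns))
```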
