I have a simple question which relates to similar questions here, and here.
I am trying to drop all columns from a pandas DataFrame that contain only zeroes (vertically, axis=1). Let me give you an example:
import pandas as pd
df = pd.DataFrame({'a': [0, 0, 0, 0], 'b': [0, -1, 0, 1]})
   a  b
0  0  0
1  0 -1
2  0  0
3  0  1
I'd like to drop column a since it has only zeroes.
However, I'd like to do it in a nice, vectorized fashion if possible. My data set is huge, so I don't want to loop. Hence I tried:
df = df.loc[df.any(axis=1), (df != 0).any(axis=0)]
   b
1 -1
3  1
This lets me drop both columns and rows. But if I try to drop only the columns, loc seems to fail. Any ideas?
3 answers
#1
4
If it's a matter of 0s and not of the sum, use df.any:
In [291]: df.T[df.any()].T
Out[291]:
   b
0  0
1 -1
2  0
3  1
Alternatively:
In [296]: df.T[(df != 0).any()].T # or df.loc[:, (df != 0).any()]
Out[296]:
   b
0  0
1 -1
2  0
3  1
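The column selection above can be sketched step by step. This is a minimal illustration using the example frame from the question; the variable names `mask` and `result` are chosen here for clarity and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 0], 'b': [0, -1, 0, 1]})

# One boolean per column: True if the column holds at least one non-zero value.
mask = (df != 0).any(axis=0)

# Keep every row (":"), but only the columns flagged True in the mask.
result = df.loc[:, mask]
print(result)
```

Selecting with df.loc[:, mask] avoids the double transpose of df.T[...].T, which may matter on a large frame.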
#2
5
You are really close: use any, because 0 values are cast to False:
df = df.loc[:, df.any()]
print(df)
   b
0  0
1 -1
2  0
3  1
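To see why the attempt in the question also removed rows, it may help to look at the two masks separately. The sketch below assumes the example frame from the question; `row_mask`, `col_mask`, `both`, and `cols_only` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 0], 'b': [0, -1, 0, 1]})

# True for rows that contain at least one non-zero value.
row_mask = df.any(axis=1)
# True for columns that contain at least one non-zero value.
col_mask = df.any(axis=0)

# Passing both masks filters rows and columns at once.
both = df.loc[row_mask, col_mask]
# Passing ":" as the row indexer keeps all rows and filters columns only.
cols_only = df.loc[:, col_mask]

print(both)
print(cols_only)
```

The axis is passed by keyword here because recent pandas versions no longer accept it positionally for any.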
#3
4
In [73]: df.loc[:, df.ne(0).any()]
Out[73]:
   b
0  0
1 -1
2  0
3  1
or:
In [71]: df.loc[:, ~df.eq(0).all()]
Out[71]:
   b
0  0
1 -1
2  0
3  1
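If we want to keep the columns that do NOT sum to 0 (a different test: a column such as [0, -1, 0, 1] sums to 0 and would be dropped too; the output below is for a frame whose b column is [0, 1, 0, 1]):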
In [78]: df.loc[:, df.sum().astype(bool)]
Out[78]:
b
0 0
1 1
2 0
3 1
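The difference between the sum-based test and the any-based test can be checked directly on the frame from the question, where column b contains both -1 and 1. This is a small sketch; `by_sum` and `by_any` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 0], 'b': [0, -1, 0, 1]})

# Sum-based test: both columns sum to 0, so both are dropped.
by_sum = df.loc[:, df.sum().astype(bool)]

# any-based test: column b has non-zero entries, so it is kept.
by_any = df.loc[:, df.ne(0).any()]

# The two any/all formulations produce the same column mask.
same_mask = df.ne(0).any().equals(~df.eq(0).all())

print(by_sum.shape)
print(by_any)
print(same_mask)
```

If the goal is to drop columns that contain only zeroes, the any-based mask is the safer choice whenever positive and negative values can cancel out.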