So I have a Python 2.7 Pandas data frame with lots of columns like:
['SiteName', 'SSP', 'PlatformClientCost', 'rawmachinecost', 'rawmachineprice', 'ClientBid' +... + 20 more]
And I would like to exclude all the columns that contain either the word 'Platform' or 'Client'. Below is my attempt:
col = [c for c in dataframe.columns if c.lower() not in ('platform','client') ]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio']
I cannot find any related solutions online, so any help would be greatly appreciated!
Thanks, Will
2 answers
#1
Use the vectorised str.contains:
In [222]:
df = pd.DataFrame(columns=['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio'])
df.columns
Out[222]:
Index(['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost',
'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM',
'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio'],
dtype='object')
In [223]:
df.columns[~df.columns.str.contains(r'platform|client', case=False)]
Out[223]:
Index(['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'rawbidprice',
'RawBidCPM', 'CostCPM', 'BidRatio'],
dtype='object')
Here we can pass a regex pattern and case=False, so you don't need lower() at all; this returns a boolean mask:
In [225]:
df.columns.str.contains(r'platform|client', case=False)
Out[225]:
array([False, False, False, False, False, True, True, False, True,
False, True, False, True, False], dtype=bool)
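To make the "you don't need lower" point concrete, here is a small sketch that builds the same empty frame from the question's column names and checks that case=False gives the same mask as lowercasing the names first and matching case-sensitively. The variable names mask_case and mask_lower are just illustrative.

```python
import pandas as pd

# Column names from the question; an empty frame is enough to build a column mask.
cols = ['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost',
        'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM',
        'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio']
df = pd.DataFrame(columns=cols)

# Case-insensitive regex match, as in the answer.
mask_case = df.columns.str.contains(r'platform|client', case=False)

# Equivalent approach: lowercase the names, then do a case-sensitive match.
mask_lower = df.columns.str.lower().str.contains(r'platform|client')

# Both masks should agree element by element.
print(bool((mask_case == mask_lower).all()))
```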
We then apply the negation operator ~ to invert the boolean mask, and use the inverted mask to index the column array.
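The answer stops at the column Index; to actually drop the columns from a frame that holds data, the inverted mask can be passed to .loc. The sketch below uses a small hypothetical two-row frame with a subset of the question's column names. The drop(columns=...) line is an equivalent alternative and assumes pandas 0.21 or later, where the columns keyword was added.

```python
import pandas as pd

# Hypothetical data: two rows and a subset of the question's column names.
df = pd.DataFrame(
    [['a.com', 1.0, 0.5, 0.2, 3.0, 0.1],
     ['b.com', 2.0, 1.5, 0.3, 4.0, 0.2]],
    columns=['SiteName', 'PlatformClientCost', 'rawplatformcost',
             'rawbidprice', 'ClientBidCPM', 'BidRatio'])

# True for every column whose name contains "platform" or "client" (any case).
mask = df.columns.str.contains(r'platform|client', case=False)

# ~mask flips the mask, so .loc keeps only the non-matching columns.
trimmed = df.loc[:, ~mask]

# Equivalent: drop the matching columns by name (pandas >= 0.21).
same = df.drop(columns=df.columns[mask])

print(trimmed.columns.tolist())
```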
#2
It's a nice attempt but you got your logic mixed up somewhere:
col = [c for c in dataframe.columns if c.lower() not in ('platform','client') ]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio']
Look closely at your condition. Using not in with a tuple excludes only columns whose name exactly matches (ignoring case) "platform" or "client"; it does not check whether those words appear inside the name.
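The difference comes down to what the in operator means for each operand type: against a tuple it tests equality with each element, while against a string it tests for a substring. A quick illustration with one of the question's column names:

```python
name = 'PlatformClientCost'.lower()  # 'platformclientcost'

# `x in tuple` compares x for equality with each element, so only an exact name matches.
exact = name in ('platform', 'client')

# `substring in string` checks containment, which is what the question needs.
has_platform = 'platform' in name
has_client = 'client' in name

print(exact, has_platform, has_client)
```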
What you'd want would be:
col = [c for c in dataframe.columns if 'platform' not in c.lower() and 'client' not in c.lower()]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'rawbidprice', 'RawBidCPM', 'CostCPM', 'BidRatio']
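If the list of words to exclude grows, chaining "not in ... and" gets unwieldy. A minimal sketch of the same list comprehension generalised with any(), where excluded is a hypothetical name for the tuple of banned words, followed by selecting the surviving columns:

```python
import pandas as pd

# Column names from the question; the frame is empty because only the names matter here.
df = pd.DataFrame(columns=['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps',
                           'PlatformClientCost', 'rawplatformcost', 'rawbidprice',
                           'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM',
                           'CostCPM', 'ClientCostCPM', 'BidRatio'])

excluded = ('platform', 'client')  # hypothetical: add more lowercase words as needed

# Keep a column only if none of the excluded words appears in its lowercased name.
col = [c for c in df.columns if not any(w in c.lower() for w in excluded)]
print(col)

# Select just the remaining columns.
trimmed = df[col]
```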
EdChum's answer, which uses pandas' vectorised string methods, is probably more efficient if that matters to you.