Python Pandas: filtering columns by multiple strings

Time: 2021-12-16 04:30:28

I have a Python 2.7 pandas DataFrame with many columns, such as:

['SiteName', 'SSP', 'PlatformClientCost', 'rawmachinecost', 'rawmachineprice', 'ClientBid' +... + 20 more]

I would like to exclude every column whose name contains either the word 'Platform' or 'Client'. Below is my attempt:

col = [c for c in dataframe.columns if c.lower() not in ('platform','client') ]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio']

I could not find any related solutions online, so I would be very grateful for any help!

Thanks, Will

2 Solutions

#1

Use the vectorised str.contains method:

In [222]:
import pandas as pd
df = pd.DataFrame(columns=['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio'])
df.columns

Out[222]:
Index(['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost',
       'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM',
       'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio'],
      dtype='object')

In [223]:
df.columns[~df.columns.str.contains(r'platform|client', case=False)]

Out[223]:
Index(['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'rawbidprice',
       'RawBidCPM', 'CostCPM', 'BidRatio'],
      dtype='object')

Here we pass a regex pattern together with case=False, so there is no need to call lower(). The call returns a boolean mask:

In [225]:
df.columns.str.contains(r'platform|client', case=False)

Out[225]:
array([False, False, False, False, False,  True,  True, False,  True,
       False,  True, False,  True, False], dtype=bool)

We then apply the negation operator ~ to invert the boolean mask, and use the inverted mask to index the column Index.
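To go from the filtered column names to a filtered DataFrame, the same mask can be passed to df.loc. This is a minimal sketch, assuming a small made-up frame (the values and the choice of columns are illustrative, not the asker's data) and written for Python 3:

```python
import pandas as pd

# Hypothetical sample frame; names and values are for illustration only.
df = pd.DataFrame({
    'SiteName': ['a', 'b'],
    'PlatformClientCost': [1.0, 2.0],
    'rawplatformcost': [0.5, 0.7],
    'ClientBidCPM': [3.0, 4.0],
    'BidRatio': [0.1, 0.2],
})

# True for every column whose name contains 'platform' or 'client'
mask = df.columns.str.contains(r'platform|client', case=False)

# Keep all rows, and only the columns where the mask is False
filtered = df.loc[:, ~mask]
print(list(filtered.columns))
```

Only the columns whose names match neither keyword remain, and the row data is left untouched.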

#2

It's a nice attempt, but the logic of your condition is off:

col = [c for c in dataframe.columns if c.lower() not in ('platform','client') ]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'PlatformClientCost', 'rawplatformcost', 'rawbidprice', 'PlatformClientBid', 'RawBidCPM', 'ClientBidCPM', 'CostCPM', 'ClientCostCPM', 'BidRatio']

Look closely at your condition. You are excluding only columns whose name exactly matches (regardless of case) "platform" or "client". None of your columns has such a name, so nothing is filtered out.
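The difference between the two tests can be seen on a single column name. This is a small sketch using one name from the question; the variable names exact_match and substring_match are only illustrative:

```python
name = 'PlatformClientCost'

# Tuple membership: is the whole lowercased name equal to one of the items?
exact_match = name.lower() in ('platform', 'client')

# Substring test: does the lowercased name contain the keyword anywhere?
substring_match = 'platform' in name.lower()

print(exact_match)      # False
print(substring_match)  # True
```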

What you want is:

col = [c for c in dataframe.columns if 'platform' not in c.lower() and 'client' not in c.lower()]
print col
['SiteName', 'SSP', 'IONumber', 'userkey', 'Imps', 'rawbidprice', 'RawBidCPM', 'CostCPM', 'BidRatio']

EdChum's answer (#1 above), which uses pandas methods, is probably more efficient, if that matters to you.
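If the list of keywords grows, the corrected comprehension can be written with any() instead of chaining conditions with and. This is a sketch only, assuming a plain list of column names taken from the question and a hypothetical keywords list; the parenthesised print also runs on Python 3:

```python
# A subset of the column names from the question, used for illustration.
columns = ['SiteName', 'SSP', 'PlatformClientCost', 'rawplatformcost',
           'ClientBidCPM', 'CostCPM', 'BidRatio']

# Keywords to exclude (hypothetical name); extend this list as needed.
keywords = ['platform', 'client']

# Keep a column only if none of the keywords occurs in its lowercased name
col = [c for c in columns
       if not any(k in c.lower() for k in keywords)]
print(col)
```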
