Inspired by this answer and the lack of an easy answer to this question, I found myself writing a little syntactic sugar to make it easier to filter by MultiIndex level.
import pandas as pd

def _filter_series(x, level_name, filter_by):
    """
    Filter a pd.Series or pd.DataFrame x by `filter_by` on the MultiIndex level
    `level_name`.

    Uses `pd.Index.get_level_values()` in the background. `filter_by` is either
    a string or an iterable.
    """
    if isinstance(x, (pd.Series, pd.DataFrame)):
        # Wrap a single string so that isin() receives a list of labels
        if isinstance(filter_by, str):
            filter_by = [filter_by]
        index = x.index.get_level_values(level_name).isin(filter_by)
        return x[index]
    else:
        print("Not a pandas object")
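For illustration, here is roughly what the helper boils down to on a small Series. The index, its level names (`outer`, `inner`) and the values are made up for this sketch; only `get_level_values()` and `isin()` come from the function above.

```python
import pandas as pd

# Hypothetical two-level index; the level names are invented for illustration.
idx = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["x", "y"]], names=["outer", "inner"]
)
s = pd.Series(range(6), index=idx)

# Boolean mask over the rows: True where the "outer" label is in the given list.
mask = s.index.get_level_values("outer").isin(["a", "c"])
selected = s[mask]

print(selected)
```

The mask has one entry per row, so the same line works unchanged for a DataFrame.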
But if I know the pandas development team (and I'm starting to, slowly!) there's already a nice way to do this, and I just don't know what it is yet!
Am I right?
3 Solutions
#1
4
I actually upvoted joris's answer... but unfortunately the refactoring he mentions did not happen in 0.14 and is not happening in 0.17 either. So for the moment, let me suggest a quick and dirty solution (obviously derived from Jeff's):
def filter_by(df, constraints):
    """Filter MultiIndex by sublevels."""
    # One entry per index level: the constraint if given, else "select all"
    indexer = [constraints[name] if name in constraints else slice(None)
               for name in df.index.names]
    # A Series takes the tuple directly; a DataFrame also needs a column selector
    return df.loc[tuple(indexer)] if df.ndim == 1 else df.loc[tuple(indexer), :]

pd.Series.filter_by = filter_by
pd.DataFrame.filter_by = filter_by
... to be used as

    df.filter_by({'level_name' : value})

where value can indeed be a single value, but also a list, a slice...
(untested with Panels and higher-dimensional objects, but I do expect it to work)
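A small usage sketch, written as a plain function rather than a method patched onto `pd.Series` and `pd.DataFrame`. The sample frame and its level names are assumptions for illustration, and the constraints are given as lists so that the filtered levels stay in the result.

```python
import pandas as pd

def filter_by(obj, constraints):
    """Select rows of a MultiIndex-ed Series/DataFrame by per-level constraints."""
    # Levels without a constraint get slice(None), i.e. "keep everything"
    indexer = tuple(constraints.get(name, slice(None)) for name in obj.index.names)
    return obj.loc[indexer] if obj.ndim == 1 else obj.loc[indexer, :]

# Hypothetical three-level frame, sorted so that label-based slicing is safe
idx = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1"]],
    names=["first", "second", "third"],
)
df = pd.DataFrame({"value": range(8)}, index=idx).sort_index()

by_list = filter_by(df, {"third": ["C1"]})
by_two = filter_by(df, {"first": ["A1"], "second": ["B0", "B1"]})
series_rows = filter_by(df["value"], {"second": ["B1"]})

print(by_list)
```

Keeping it a free function avoids modifying the pandas classes at import time; the indexing logic is otherwise the same.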
#2
3
This is very easy using the new multi-index slicers in master/0.14 (releasing soon); see here.
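As a sketch of what the slicers look like in use, assuming a pandas version that provides `pd.IndexSlice` (0.14 or later): the frame mirrors the one built later in this answer, except that `sort_index()` stands in for `sortlevel()`, which newer pandas releases no longer have.

```python
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    """Build labels such as A0, A1, ... (same helper as in the answer)."""
    return [f"{prefix}{i}" for i in range(n)]

index = pd.MultiIndex.from_product(
    [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index).sort_index()

# One slot per index level; ":" means "everything on this level"
idx = pd.IndexSlice
subset = df.loc[idx[:, :, ["C1", "C3"], :], :]

print(subset.head())
```

The trailing `, :` selects all columns and makes it unambiguous that the first argument addresses the row index.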
There is an open issue to make this syntactically easier (it's not hard to do); see here. Something like df.loc[{ 'third' : ['C1','C3'] }] seems reasonable to me.
Here's how you can do it (requires master/0.14):
In [2]: def mklbl(prefix,n):
   ...:     return ["%s%s" % (prefix,i) for i in range(n)]
   ...:

In [11]: index = MultiIndex.from_product([mklbl('A',4),
                                          mklbl('B',2),
                                          mklbl('C',4),
                                          mklbl('D',2)],names=['first','second','third','fourth'])

In [12]: columns = ['value']

In [13]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),index=index,columns=columns).sortlevel()
In [14]: df
Out[14]:
                           value
first second third fourth
A0    B0     C0    D0          0
                   D1          1
             C1    D0          2
                   D1          3
             C2    D0          4
                   D1          5
             C3    D0          6
                   D1          7
      B1     C0    D0          8
                   D1          9
             C1    D0         10
                   D1         11
             C2    D0         12
                   D1         13
             C3    D0         14
                   D1         15
A1    B0     C0    D0         16
                   D1         17
             C1    D0         18
                   D1         19
             C2    D0         20
                   D1         21
             C3    D0         22
                   D1         23
      B1     C0    D0         24
                   D1         25
             C1    D0         26
                   D1         27
             C2    D0         28
                   D1         29
             C3    D0         30
                   D1         31
A2    B0     C0    D0         32
                   D1         33
             C1    D0         34
                   D1         35
             C2    D0         36
                   D1         37
             C3    D0         38
                   D1         39
      B1     C0    D0         40
                   D1         41
             C1    D0         42
                   D1         43
             C2    D0         44
                   D1         45
             C3    D0         46
                   D1         47
A3    B0     C0    D0         48
                   D1         49
             C1    D0         50
                   D1         51
             C2    D0         52
                   D1         53
             C3    D0         54
                   D1         55
      B1     C0    D0         56
                   D1         57
             C1    D0         58
                   D1         59
...

[64 rows x 1 columns]
Create an indexer across all of the levels, selecting all entries:
In [15]: indexer = [slice(None)]*len(df.index.names)
Make the level we care about contain only the entries we care about:
In [16]: indexer[df.index.names.index('third')] = ['C1','C3']
Select it (it's important that this is a tuple!):
In [18]: df.loc[tuple(indexer),:]
Out[18]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
             C3    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C3    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C3    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C3    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C3    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C3    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C3    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C3    D0         62
                   D1         63

[32 rows x 1 columns]
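The same selection can probably also be spelled with `DataFrame.query`, which can refer to named index levels in reasonably recent pandas versions. This is a sketch on a smaller, made-up frame, not something the answer itself proposes; it checks the query result against the tuple indexer built in the steps above.

```python
import numpy as np
import pandas as pd

# Smaller hypothetical frame with the same level names as in the answer
index = pd.MultiIndex.from_product(
    [["A0", "A1"], ["B0", "B1"], ["C0", "C1", "C2", "C3"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index).sort_index()

# The three steps from the answer: select-all indexer, narrow one level, use a tuple
indexer = [slice(None)] * df.index.nlevels
indexer[df.index.names.index("third")] = ["C1", "C3"]
via_loc = df.loc[tuple(indexer), :]

# Assumed alternative: refer to the index level by name inside a query string
via_query = df.query("third in ['C1', 'C3']")

print(via_query.head())
```

Whether the query form is preferable is a matter of taste; the tuple indexer is easier to build programmatically, while the query string is easier to read.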
#3
1
You have the filter method, which can do things like this. E.g., with the example that was asked in the linked SO question:
In [188]: df.filter(like='0630', axis=0)
Out[188]:
                      sales        cogs    net_pft
STK_ID RPT_Date
876    20060630   857483000   729541000   67157200
       20070630  1146245000  1050808000  113468500
       20080630  1932470000  1777010000  133756300
2254   20070630   501221000   289167000  118012200
The filter method is being refactored at the moment (for the upcoming 0.14), and a level keyword will be added (because at present you can run into problems if the same labels appear in different levels of the index).
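Until such a keyword exists, one way to sidestep the ambiguity is to build the mask from a single level explicitly. The following is only a sketch: the frame and its `sales` numbers are invented, and `str.contains` with `regex=False` is used as a stand-in for `like=`.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the example above (values are made up)
index = pd.MultiIndex.from_tuples(
    [
        (876, 20060630),
        (876, 20061231),
        (876, 20070630),
        (2254, 20070630),
        (2254, 20071231),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": [10, 20, 30, 40, 50]}, index=index)

# Look only at the RPT_Date level, as strings, so STK_ID can never match
dates = df.index.get_level_values("RPT_Date").astype(str)
mask = np.asarray(dates.str.contains("0630", regex=False), dtype=bool)
mid_year = df[mask]

print(mid_year)
```

Because the mask is computed from one level only, a label such as `0630` appearing in another level cannot produce a false match.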