I think there is a conceptual bug in the way a MultiIndex is created on a DataFrame slice. Consider the following code:
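import cufflinks as cf
from pandas import MultiIndex

# Generate a 6-column demo frame, then replace its columns with a two-level MultiIndex
df = cf.datagen.lines(6, mode='abc')
df.columns = MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                     ('Iter2','c'), ('Iter2','d'),
                                     ('Iter3','e'), ('Iter3','f')])
df.head()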
This creates a simple DataFrame with MultiIndex columns.
Slicing this data frame:
new_df = df[['Iter1','Iter2']].copy()
new_df.head()
So it seems like the data is presented ok, but behind the scenes the complete index is still there:
In [52]: new_df.columns
Out[52]:
MultiIndex(levels=[[u'Iter1', u'Iter2', u'Iter3'], [u'a', u'b', u'c', u'd', u'e', u'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
This seems like a bug to me, because trying to access the last column of the sliced DataFrame through the index levels now returns nothing:
In [54]: last_col = new_df.columns.levels[0][-1]
         new_df[last_col].head()
Out[54]:
2015-01-01
2015-01-02
2015-01-03
2015-01-04
2015-01-05
I would like to pass a couple of multi-level columns to my function by slicing my original DataFrame, but there seems to be no way to access those columns programmatically.
1 Answer
#1
3
You need remove_unused_levels, which is new functionality in pandas 0.20.0. You can also check the docs:
new_df.columns.remove_unused_levels()
Sample:
import numpy as np
import pandas as pd

np.random.seed(23)
cols = pd.MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                  ('Iter2','c'), ('Iter2','d'),
                                  ('Iter3','e'), ('Iter3','f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5,6), columns=cols, index=idx)
print (df)
               Iter1               Iter2               Iter3
                   a         b         c         d         e         f
2015-01-01  0.517298  0.946963  0.765460  0.282396  0.221045  0.686222
2015-01-02  0.167139  0.392442  0.618052  0.411930  0.002465  0.884032
2015-01-03  0.884948  0.300410  0.589582  0.978427  0.845094  0.065075
2015-01-04  0.294744  0.287934  0.822466  0.626183  0.110478  0.000529
2015-01-05  0.942166  0.141501  0.421597  0.346489  0.869785  0.428602

new_df = df[['Iter1','Iter2']].copy()
print (new_df)
               Iter1               Iter2
                   a         b         c         d
2015-01-01  0.517298  0.946963  0.765460  0.282396
2015-01-02  0.167139  0.392442  0.618052  0.411930
2015-01-03  0.884948  0.300410  0.589582  0.978427
2015-01-04  0.294744  0.287934  0.822466  0.626183
2015-01-05  0.942166  0.141501  0.421597  0.346489

# The slice still carries the unused 'Iter3', 'e' and 'f' entries in its levels
print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

# remove_unused_levels returns a new index; new_df itself is unchanged here
print (new_df.columns.remove_unused_levels())
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

# Assign the result back to make the change stick
new_df.columns = new_df.columns.remove_unused_levels()
print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
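If the goal is only to find the last top-level column that is actually present in the slice, another option is to read the labels with get_level_values instead of going through .levels. The following is a minimal sketch, not part of the original answer; the names used_top and last_block are illustrative, and the data is a simple integer grid rather than the random values above.

```python
import numpy as np
import pandas as pd

# Same column layout as in the sample above; the values are arbitrary.
cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols, index=idx)

new_df = df[['Iter1', 'Iter2']].copy()

# get_level_values returns one label per existing column, so labels that
# only survive in .levels (here 'Iter3') never show up.
used_top = new_df.columns.get_level_values(0).unique()

last_col = used_top[-1]        # last top-level label that is really present
last_block = new_df[last_col]  # sub-frame holding columns 'c' and 'd'
print(last_block.head())
```

This works whether or not remove_unused_levels has been called, which may be convenient inside a function that receives an already sliced DataFrame.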