Pandas MultiIndex is copied in full to a DataFrame slice

Time: 2022-08-31 16:17:57

I think there is a conceptual bug in the way a MultiIndex is created on a DataFrame slice. Consider the following code:

    import cufflinks as cf
    import pandas as pd

    # Generate a sample DataFrame with 6 columns of line data
    df = cf.datagen.lines(6, mode='abc')
    # Replace the flat column names with a two-level MultiIndex
    df.columns = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                            ('Iter2', 'c'), ('Iter2', 'd'),
                                            ('Iter3', 'e'), ('Iter3', 'f')])
    df.head()

This creates a simple DataFrame with MultiIndex columns:

[Image: output of df.head()]

Slicing this DataFrame:

    new_df = df[['Iter1', 'Iter2']].copy()
    new_df.head()

[Image: output of new_df.head()]

So it seems the data is presented correctly, but behind the scenes the complete index is still there:

    In [52]: new_df.columns
    Out[52]:
    MultiIndex(levels=[[u'Iter1', u'Iter2', u'Iter3'], [u'a', u'b', u'c', u'd', u'e', u'f']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

This looks like a bug to me, because selecting the last top-level column of the sliced DataFrame now returns nothing:

    In [54]: last_col = new_df.columns.levels[0][-1]
             new_df[last_col].head()
    Out[54]:
    2015-01-01
    2015-01-02
    2015-01-03
    2015-01-04
    2015-01-05

I would like to pass a couple of multi-level column groups to my function by slicing my original DataFrame, but there seems to be no way for me to access those columns programmatically.

1 solution

#1


3  

You need remove_unused_levels, which is new functionality in pandas 0.20.0; you can also check the docs:

new_df.columns.remove_unused_levels()
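
One point worth making explicit: remove_unused_levels returns a new MultiIndex rather than modifying the existing one, so the result has to be assigned back to new_df.columns. The following is a minimal sketch of that behaviour; the zero-filled data and the names levels_before and levels_after are illustrative only.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
# Placeholder data; only the column index matters here.
df = pd.DataFrame(np.zeros((2, 6)), columns=cols)
new_df = df[['Iter1', 'Iter2']].copy()

# Calling the method without keeping the result leaves new_df unchanged.
new_df.columns.remove_unused_levels()
levels_before = list(new_df.columns.levels[0])

# Assigning the returned index back is what drops the unused labels.
new_df.columns = new_df.columns.remove_unused_levels()
levels_after = list(new_df.columns.levels[0])

print(levels_before)
print(levels_after)
```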

Sample:

    import numpy as np
    import pandas as pd

    np.random.seed(23)
    cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                      ('Iter2', 'c'), ('Iter2', 'd'),
                                      ('Iter3', 'e'), ('Iter3', 'f')])
    idx = pd.date_range('2015-01-01', periods=5)
    df = pd.DataFrame(np.random.rand(5, 6), columns=cols, index=idx)
    print (df)
                   Iter1               Iter2               Iter3
                       a         b         c         d         e         f
    2015-01-01  0.517298  0.946963  0.765460  0.282396  0.221045  0.686222
    2015-01-02  0.167139  0.392442  0.618052  0.411930  0.002465  0.884032
    2015-01-03  0.884948  0.300410  0.589582  0.978427  0.845094  0.065075
    2015-01-04  0.294744  0.287934  0.822466  0.626183  0.110478  0.000529
    2015-01-05  0.942166  0.141501  0.421597  0.346489  0.869785  0.428602

    # Slice two of the three top-level column groups
    new_df = df[['Iter1', 'Iter2']].copy()
    print (new_df)
                   Iter1               Iter2
                       a         b         c         d
    2015-01-01  0.517298  0.946963  0.765460  0.282396
    2015-01-02  0.167139  0.392442  0.618052  0.411930
    2015-01-03  0.884948  0.300410  0.589582  0.978427
    2015-01-04  0.294744  0.287934  0.822466  0.626183
    2015-01-05  0.942166  0.141501  0.421597  0.346489

    # The levels still contain the unused 'Iter3', 'e' and 'f'
    print (new_df.columns)
    MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

    # remove_unused_levels returns a new index without them
    print (new_df.columns.remove_unused_levels())
    MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])

    # Assign the result back to make the change stick
    new_df.columns = new_df.columns.remove_unused_levels()
    print (new_df.columns)
    MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
               labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
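
To tie this back to the question of reaching the last column group programmatically: an alternative that does not depend on cleaning the levels is to read the labels that actually occur, using get_level_values. This is only a sketch under the same sample setup; the names present and last_block are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(23)
cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5, 6), columns=cols, index=idx)
new_df = df[['Iter1', 'Iter2']].copy()

# get_level_values reports only labels that occur in the columns,
# so the unused 'Iter3' entry in .levels is ignored.
present = new_df.columns.get_level_values(0).unique()
last_col = present[-1]

# Selecting by that label returns the columns under the last group.
last_block = new_df[last_col]

print(last_col)
print(list(last_block.columns))
```

Either approach should give the function a reliable way to enumerate the column groups it received; remove_unused_levels is the better fit when code elsewhere inspects .levels directly.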
