Reshaping with Hierarchical Indexing

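Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. The two primary methods are:

stack: pivots the columns of the data into rows

unstack: pivots the rows of the data into columns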

Take the following DataFrame as an example:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: from pandas import DataFrame, Series

In [4]: data = DataFrame(np.arange(6).reshape((2, 3)),
   ...:                  index=pd.Index(['Ohio', 'Colorado'], name='state'),
   ...:                  columns=pd.Index(['one', 'two', 'three'], name='number'))

In [5]: data
Out[5]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [6]: data.stack()  # pivot the column index into the row index
Out[6]:
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

In [7]: data.unstack()  # pivot the row index into the column index
Out[7]:
number  state
one     Ohio        0
        Colorado    3
two     Ohio        1
        Colorado    4
three   Ohio        2
        Colorado    5
dtype: int32

In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['number', 'state'])

For a DataFrame like this one, with a single-level index on both axes, both unstack and stack return a Series.

A Series, by contrast, has only the unstack method.
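The return-type rule above depends on the frame having single-level axes. The sketch below, which rebuilds the same `data` frame with plain `pd.DataFrame`, checks the types directly and shows that once the rows carry a two-level index, `unstack` yields a DataFrame instead. The column name `"value"` passed to `to_frame` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the transcript: single-level index and single-level columns
data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(["Ohio", "Colorado"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

stacked = data.stack()      # columns move into the rows
unstacked = data.unstack()  # row labels move into a MultiIndex

print(type(stacked).__name__, type(unstacked).__name__)  # Series Series
print(hasattr(pd.Series, "stack"), hasattr(pd.Series, "unstack"))  # False True

# A frame whose rows have a two-level index: unstack now returns a DataFrame
two_level = stacked.to_frame("value")  # "value" is a hypothetical column name
print(type(two_level.unstack()).__name__)  # DataFrame
```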

By default, unstack operates on the innermost level. Pass a level number or a level name to unstack a different level.

In [20]: result = data.stack()

In [21]: result.unstack(0)
Out[21]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5

In [22]: result.unstack()
Out[22]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [23]: result.unstack('state')
Out[23]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
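To make the level-number/level-name equivalence explicit, here is a small sketch. It assumes `result` is `data.stack()`, which is consistent with the outputs shown, and compares calls that should select the same level. `level=-1` is the default for `unstack`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(["Ohio", "Colorado"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)
result = data.stack()  # index levels: state (outer), number (inner)

# No argument means level=-1, the innermost level ('number')
inner_default = result.unstack()
inner_by_number = result.unstack(-1)
inner_by_name = result.unstack("number")

# Level 0 and the name 'state' both refer to the outer level
outer_by_number = result.unstack(0)
outer_by_name = result.unstack("state")

print(inner_default.equals(inner_by_number), inner_default.equals(inner_by_name))  # True True
print(outer_by_number.equals(outer_by_name))  # True
print(outer_by_name.columns.name)  # state
```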

If not every value of a level can be found in each subgroup, unstack introduces missing values:

In [24]: s1 = Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

In [25]: s2 = Series([4, 5, 6], index=['c', 'd', 'e'])

In [26]: data2 = pd.concat([s1, s2], keys=['one', 'two'])

In [27]: data2
Out[27]:
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

In [28]: data2.unstack()
Out[28]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

In [29]: data2.unstack(0)
Out[29]:
   one  two
a  0.0  NaN
b  1.0  NaN
c  2.0  4.0
d  3.0  5.0
e  NaN  6.0
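The NaN cells above are also why the values turn into floats. If a placeholder is acceptable, `unstack` accepts a `fill_value` argument. The sketch below rebuilds `data2` and compares the default result with `fill_value=0`. The choice of 0 as the placeholder is only an assumption for illustration.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=["a", "b", "c", "d"])
s2 = pd.Series([4, 5, 6], index=["c", "d", "e"])
data2 = pd.concat([s1, s2], keys=["one", "two"])

# Default: missing combinations become NaN, which forces a float dtype
with_nan = data2.unstack()

# fill_value puts 0 in the missing cells instead, so the integer dtype survives
filled = data2.unstack(fill_value=0)

print(with_nan.isna().sum().sum())  # 3
print(filled)
```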

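stack, on the other hand, filters out missing values by default (the `dropna=True` behavior of the original implementation; the newer implementation introduced in pandas 2.1 via `future_stack=True` keeps them).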
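Because the NaN-dropping behavior depends on which `stack` implementation is active, the round trip below is written to behave the same either way. The sketch unstacks `data2`, stacks it back, and calls `dropna()` explicitly, so the result matches the original values whether or not `stack` dropped the NaN cells itself. Silencing the `FutureWarning` is an assumption about pandas 2.1+ behavior and has no effect on older versions.

```python
import warnings

import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=["a", "b", "c", "d"])
s2 = pd.Series([4, 5, 6], index=["c", "d", "e"])
data2 = pd.concat([s1, s2], keys=["one", "two"])

wide = data2.unstack()  # 2 x 5 frame containing three NaN cells

with warnings.catch_warnings():
    # pandas 2.1+ may warn that the legacy stack implementation is deprecated
    warnings.simplefilter("ignore", FutureWarning)
    long_form = wide.stack()

# The legacy stack already dropped the NaN cells (7 values); the newer one
# keeps them (10 values). An explicit dropna() gives the same result in both.
restored = long_form.dropna().sort_index()

print(restored.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```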

When you stack or unstack a DataFrame, the level being pivoted becomes the lowest level of the resulting index, that is, the innermost level.
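A quick way to observe this rule is to inspect `index.names` after each pivot. The sketch below rebuilds `data` and checks two cases. First, `data.unstack()` puts the pivoted `state` level innermost. Second, after moving `state` into the columns and stacking it back, `state` returns as the innermost row level rather than in its original outer position.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(["Ohio", "Colorado"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

# Unstacking the DataFrame: the pivoted 'state' level ends up innermost
print(list(data.unstack().index.names))  # ['number', 'state']

result = data.stack()                 # index levels: ['state', 'number']
by_state = result.unstack("state")    # 'state' moves into the columns
back = by_state.stack()               # 'state' comes back as the innermost level

print(list(result.index.names))  # ['state', 'number']
print(list(back.index.names))    # ['number', 'state']
```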
