pandas(八)重塑和轴向旋转

时间:2021-11-27 07:30:08

重塑层次化索引

层次化索引为DataFrame的重排提供了良好的一致性操作,主要方法有

stack :将数据的列旋转为行

unstack:将数据的行转换为列

用一个dataframe对象举例

In [4]: data = DataFrame(np.arange(6).reshape((2,3)),index = pd.Index(['Ohio','Colorado'],name='state'),columns = pd.Index(['one','two','three'],name = 'number'))

In [5]: data
Out[5]:
number one two three
state
Ohio 0 1 2
Colorado 3 4 5 In [6]: data.stack()#将列索引转换为行索引
Out[6]:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32 In [7]: data.unstack()#将行索引转换为列索引
Out[7]:
number state
one Ohio 0
Colorado 3
two Ohio 1
Colorado 4
three Ohio 2
Colorado 5
dtype: int32 In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=['number', 'state']) In [10]:

对于DataFrame,无论是使用unstack,还是stack,得到都是一个Series对象

Series对象,只有unstack方法。

默认情况下,unstack操作的是最内层,传入分层级别的编号或名称即可对相应级别的索引做操作。

In [21]: result.unstack(0)
Out[21]:
state Ohio Colorado
number
one 0 3
two 1 4
three 2 5 In [22]: result.unstack()
Out[22]:
number one two three
state
Ohio 0 1 2
Colorado 3 4 5 In [23]: result.unstack('state')
Out[23]:
state Ohio Colorado
number
one 0 3
two 1 4
three 2 5

如果不是所有的级别的值都能在个分组中找到的话,则unstack会引入缺失值

In [24]: s1 =Series([0,1,2,3],index = ['a','b','c','d'])

In [25]: s2 = Series([4,5,6],index = ['c','d','e'])

In [26]: data2 = pd.concat([s1,s2],keys = ['one','two'])

In [27]: data2
Out[27]:
one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64 In [28]: data2.unstack()
Out[28]:
a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0 In [29]: data2.unstack(0)
Out[29]:
one two
a 0.0 NaN
b 1.0 NaN
c 2.0 4.0
d 3.0 5.0
e NaN 6.0

而stack默认会滤除缺失值。

在对DataFrame进行旋转操作时,旋转的轴会成为旋转后索引的最低级别。也就是最内层索引。