pandas Notes: Advanced pandas Features

Time: 2023-03-09 08:05:24

http://blog.csdn.net/pipisorry/article/details/53486777

Advanced pandas features: panel data, string methods, categoricals, and visualization.

Panel Data

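{pandas data structures: the one-dimensional Series, the two-dimensional DataFrame, and this three-dimensional Panel}
pandas has a Panel data structure that can be thought of as a three-dimensional version of DataFrame. A Panel object can be created from a dict of DataFrame objects or from a three-dimensional ndarray:
import pandas as pd
import pandas.io.data as web
# Download daily price data for each ticker and collect the DataFrames in a Panel
pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012')) for stk in ['AAPL', 'GOOG', 'MSFT','DELL']))
Note: stk is the stock ticker (4 companies), and each downloaded DataFrame has 6 indicator columns (Open to Adj Close). The three dimensions are company (items), time (major_axis), and indicator (minor_axis).
Each item in a Panel (analogous to a column of a DataFrame) is itself a DataFrame.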
>>> pdata
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 868 (major_axis) x 6 (minor_axis)
Items axis: AAPL to MSFT
Major_axis axis: 2009-01-02 00:00:00 to 2012-06-01 00:00:00
Minor_axis axis: Open to Adj Close
>>> pdata = pdata.swapaxes('items', 'minor')
>>> pdata['Adj Close']
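
Panel was removed from pandas in version 0.25, and pandas.io.data was moved out into the separate pandas-datareader package, so the snippet above only runs on old releases. The following is a minimal sketch of the same layout on a current pandas, assuming synthetic random numbers in place of the Yahoo download (the short date range and the values are illustrative only). A dict of DataFrames is concatenated into one DataFrame with two column levels, and swapping those levels plays the role of swapaxes('items', 'minor').

```python
import numpy as np
import pandas as pd

# Assumption: random numbers stand in for the downloaded price data.
dates = pd.date_range("2012-05-28", periods=5, freq="D")
fields = ["Open", "High", "Low", "Close", "Volume", "Adj Close"]
rng = np.random.default_rng(0)

frames = {
    stk: pd.DataFrame(rng.random((len(dates), len(fields))),
                      index=dates, columns=fields)
    for stk in ["AAPL", "GOOG", "MSFT", "DELL"]
}

# Dict keys become the outer column level: columns are (ticker, field).
wide = pd.concat(frames, axis=1)

# Swap the two column levels so that the field comes first,
# similar in spirit to pdata.swapaxes('items', 'minor').
by_field = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# One DataFrame per field: rows are dates, columns are tickers.
adj_close = by_field["Adj Close"]
print(adj_close)
```

Here adj_close corresponds to pdata['Adj Close'] after the axis swap: one row per date and one column per ticker.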

Three-Dimensional ix Label Indexing

Label indexing with ix generalizes to three dimensions, so all the data for a particular date or date range can be selected as follows:
>>> pdata.ix[:,'6/1/2012',:]
>>> pdata.ix['Adj Close', '5/22/2012':, :]
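
The ix indexer was removed in pandas 1.0 in favor of loc and iloc. As a rough sketch of how the two selections above might look with loc, assume a small DataFrame with a DatetimeIndex and (field, ticker) column levels standing in for the Panel. The data, the two fields and the two tickers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: a toy frame with (field, ticker) columns replaces the Panel.
dates = pd.date_range("2012-05-28", "2012-06-01", freq="D")
cols = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "GOOG"]], names=["field", "ticker"]
)
values = np.arange(len(dates) * len(cols), dtype=float).reshape(len(dates), -1)
data = pd.DataFrame(values, index=dates, columns=cols)

# Counterpart of pdata.ix[:, '6/1/2012', :]: all fields and tickers on one date,
# reshaped so that rows are fields and columns are tickers.
one_day = data.loc[pd.Timestamp("2012-06-01")].unstack("ticker")

# Counterpart of pdata.ix['Adj Close', '5/22/2012':, :]: one field from a date onward.
adj_tail = data.loc["2012-05-30":, "Adj Close"]

print(one_day)
print(adj_tail)
```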
Another way to represent panel data (especially for fitting statistical models) is the "stacked" DataFrame form:
>>> stacked = pdata.ix[:, '5/30/2012':, :].to_frame()
>>> stacked
DataFrame has a corresponding to_panel method, which is the inverse of to_frame:
>>> stacked.to_panel()
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 3 (major_axis) x 4 (minor_axis)
Items axis: Open to Adj Close
Major_axis axis: 2012-05-30 00:00:00 to 2012-06-01 00:00:00
Minor_axis axis: AAPL to MSFT
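
On pandas versions without Panel, the to_frame / to_panel round trip can be approximated with stack and unstack on a DataFrame whose columns carry two levels. This is only a sketch under that assumption, using made-up values. Recent pandas 2.x releases may print a FutureWarning about the stack implementation, which does not affect the result here.

```python
import numpy as np
import pandas as pd

# Assumption: toy data with (field, ticker) columns replaces the Panel.
dates = pd.date_range("2012-05-30", periods=3, freq="D", name="Date")
cols = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAPL", "GOOG"]], names=["field", "ticker"]
)
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                    index=dates, columns=cols)

# "Stacked" form: one row per (Date, ticker), one column per field.
stacked = wide.stack("ticker")

# unstack moves the ticker level back into the columns, undoing the stack.
restored = stacked.unstack("ticker")

print(stacked)
```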
皮皮Blog

String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [72]: s.str.lower()
Out[72]:
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
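
To make the remark about regular expressions concrete, here is a small sketch on the same Series using str.contains, once with a regex pattern and once with regex=False for a literal match. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# The pattern is treated as a regular expression by default:
# "^a" with case=False matches strings that start with "a" or "A".
# na=False makes the missing value count as "no match".
starts_with_a = s.str.contains(r"^a", case=False, na=False)

# regex=False switches to a plain, case-sensitive substring test.
literal_a = s.str.contains("a", regex=False, na=False)

print(s[starts_with_a])
print(s[literal_a])
```

The first selection keeps 'A' and 'Aaba'; the second keeps the strings containing a lowercase "a" ('Aaba', 'Baca' and 'cat').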


,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})