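Revisiting Python (3): Working with the pandas Series Object

pandas is one of the most widely used data analysis and exploration libraries for Python. It is built on top of NumPy, supports SQL-like create, delete, query, and update operations on structured data, and ships with a rich set of data processing functions. pandas has two core data structures, Series and DataFrame; this post summarizes the common usage of Series.

Convention used throughout: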

import pandas as pd

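1. What is a Series object?

A Series is essentially a one-dimensional array in which every element consists of a value and the index label associated with it.

2. Creating a Series

A Series is created by calling the pd.Series constructor. There are two main ways to do so:

(1) From a list

Pass a list to pd.Series. If no index is specified, the default index runs from 0 to N-1.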

ser1=pd.Series([11,22,33,44])
ser1
Out[60]:
0    11
1    22
2    33
3    44
dtype: int64

The index can also be specified with the index parameter:

ser2=pd.Series([11,22,33,44],index=['a','b','c','d'])
ser2
Out[61]:
a    11
b    22
c    33
d    44
dtype: int64

(2) From a dict

Pass a dict to pd.Series; the dict keys become the index labels and the dict values become the values.

ser3=pd.Series({'a':11,'d':22,'c':33})
ser3
Out[62]:
a    11
d    22
c    33
dtype: int64
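
A dict can also be combined with the index parameter, in which case pandas looks up each listed label in the dict. The following is a minimal sketch; the names data and ser_sel are illustrative and do not come from the session above.

```python
import pandas as pd

data = {'a': 11, 'd': 22, 'c': 33}

# Each label in index is looked up in the dict. 'x' is not a key,
# so its value is NaN and the dtype becomes float64.
ser_sel = pd.Series(data, index=['a', 'c', 'x'])
print(ser_sel)
```

Keys that are not listed in index (here 'd') are simply left out of the result.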

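3. The three components of a Series

A Series has three components: an index, values, and a name.

(1) Index

a. Viewing the index
The index attribute of a Series returns its index as an Index object.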

ser1.index
Out[63]: RangeIndex(start=0, stop=4, step=1)
ser2.index
Out[64]: Index([u'a', u'b', u'c', u'd'], dtype='object')

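Index labels may be duplicated. The is_unique attribute of the Index object reports whether every label is unique (True means there are no duplicates).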

ser1.index.is_unique
Out[65]: True

b. Modifying the index
An Index object is immutable, so its individual elements cannot be assigned to.

ser1.index[0]=5
Traceback (most recent call last):
  File "<ipython-input-68-2029117c9570>", line 1, in <module>
    ser1.index[0]=5
  File "/usr/local/share/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.py", line 1404, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations

To change the index of a Series, assign a complete new index to its index attribute.

ser1.index=[5,6,7,8]
ser1.index
Out[70]: Int64Index([5, 6, 7, 8], dtype='int64')
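
The two behaviours can be checked side by side, and rename offers a way to change only some labels without touching the original. This is a small sketch under the assumption of a default integer index; the names ser, index_is_mutable and renamed are illustrative.

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44])

# Assigning to a single position of an Index raises TypeError.
try:
    ser.index[0] = 5
    index_is_mutable = True
except TypeError:
    index_is_mutable = False

# Replacing the whole index at once is allowed (the length must match).
ser.index = [5, 6, 7, 8]

# rename returns a new Series in which the mapped labels are replaced.
renamed = ser.rename({5: 50})
print(index_is_mutable, list(ser.index), list(renamed.index))
```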

c. Reindexing
Use the reindex method to rearrange a Series according to a new list of index labels.

ser2.reindex(['b','a','c','d'])
Out[73]:
b    22
a    11
c    33
d    44
dtype: int64

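Reindexing returns a new Series; the original object is unchanged.
Reindexing serves three purposes:
① Reordering existing labels, which rearranges the existing elements;
② Dropping an existing label, which removes the corresponding element;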

ser2.reindex(['b','a','d'])
Out[74]:
b    22
a    11
d    44
dtype: int64

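③ Adding a new label, which adds a new element whose value is NaN. Because NaN is a floating-point value, the dtype changes from int64 to float64.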

ser2.reindex(['b','a','e','c','d'])
Out[75]:
b    22.0
a    11.0
e     NaN
c    33.0
d    44.0
dtype: float64
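
If NaN is not wanted for the new labels, reindex accepts a fill_value parameter. A minimal sketch, with ser and filled as illustrative names:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

# fill_value is used for labels that do not exist in the original index,
# so no NaN is introduced and the integer dtype is kept.
filled = ser.reindex(['b', 'a', 'e', 'c', 'd'], fill_value=0)
print(filled)
```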

d. Sorting by index
Use the sort_index method to sort by the existing index in ascending or descending order.

ser3.sort_index()
Out[80]:
a    11
c    33
d    22
dtype: int64

The default is ascending order of the index labels. Sorting returns a new Series; the original object is unchanged.

(2) Values

a. Viewing the values
The values attribute of a Series returns the values as a NumPy array.

ser1.values
Out[81]: array([11, 22, 33, 44])

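b. Modifying the values
In the pandas version used here, the array returned by values shares memory with the Series, so assigning to an element of that array changes the original Series in place. This behaviour is not guaranteed: with Copy-on-Write enabled in newer pandas versions the returned array is read-only, so assigning through the Series itself (for example ser1.iloc[1]=23) is the more dependable approach.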

ser1.values[1]=23
ser1
Out[83]:
5    11
6    23
7    33
8    44
dtype: int64
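
Values can also be changed through the Series itself rather than through the values array. The sketch below uses iloc for positions and loc for labels; the name ser is illustrative.

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=[5, 6, 7, 8])

ser.iloc[1] = 23   # second element, selected by position
ser.loc[8] = 45    # element whose label is 8
print(ser)
```

Using iloc and loc makes it explicit whether a number is meant as a position or as a label, which matters here because the index itself consists of integers.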

c. Sorting by value
Use the sort_values method to sort by value in ascending or descending order.

ser3.sort_values()
Out[84]:
a    11
d    22
c    33
dtype: int64

The default is ascending order of the values. Sorting returns a new Series; the original object is unchanged.
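
Both sort_index and sort_values take an ascending parameter for descending order. A short sketch, with ser, by_label_desc and by_value_desc as illustrative names:

```python
import pandas as pd

ser = pd.Series({'a': 11, 'd': 22, 'c': 33})

by_label_desc = ser.sort_index(ascending=False)    # d, c, a
by_value_desc = ser.sort_values(ascending=False)   # 33, 22, 11

print(by_label_desc)
print(by_value_desc)
```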

(3) Name

A Series has a name, available through its name attribute.
The Index object of a Series also has a name, available through the name attribute of the Index.
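
Both names are unset (None) unless assigned. The following sketch sets them and reads them back; the labels 'score' and 'key' are made up for illustration.

```python
import pandas as pd

# name can be given at construction time ...
ser = pd.Series([11, 22, 33], index=['a', 'b', 'c'], name='score')

# ... and the index name can be assigned afterwards.
ser.index.name = 'key'

print(ser.name)
print(ser.index.name)
```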

4. Element operations

(1) Selecting elements

Selecting a single element:
a. By index label

ser2['b']
Out[90]: 22

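b. By integer position (ser2.iloc[1] is the explicit equivalent; recent pandas versions deprecate plain integer keys on a label-indexed Series)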

ser2[1]
Out[91]: 22

Selecting multiple elements:
a. By a list of index labels

ser2[['a','c']]
Out[93]:
a    11
c    33
dtype: int64

b. By a slice of index labels

ser2['a':'d']
Out[94]:
a    11
b    22
c    33
d    44
dtype: int64

c. By a slice of integer positions

ser2[0:3]
Out[92]:
a    11
b    22
c    33
dtype: int64

Note: b and c differ in that a label slice includes the right endpoint, whereas a positional slice excludes it.
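
The loc and iloc accessors state explicitly whether a key is a label or a position, and they show the same endpoint difference. A minimal sketch with illustrative variable names:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

by_label = ser.loc['b']           # label-based, single element
by_position = ser.iloc[1]         # position-based, single element
label_slice = ser.loc['a':'c']    # label slice: 'c' is included
position_slice = ser.iloc[0:2]    # positional slice: position 2 is excluded

print(by_label, by_position)
print(list(label_slice.index), list(position_slice.index))
```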

(2) Filtering elements

A comparison on the values can be used directly as a filter condition.

ser2[ser2>30]
Out[95]:
c    33
d    44
dtype: int64
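
The comparison itself produces a boolean Series, and several conditions can be combined with & (and) and | (or), each wrapped in parentheses. A short sketch with illustrative names:

```python
import pandas as pd

ser = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

# The comparison yields one True/False per element.
mask = ser > 30

# Parentheses are required because & binds tighter than the comparisons.
in_range = ser[(ser > 15) & (ser < 40)]

print(mask)
print(in_range)
```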

(3) Adding elements

a. By assignment

ser2['e']=55
ser2
Out[97]:
a    11
b    22
c    33
d    44
e    55
dtype: int64

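b. By reindexing (note that reindex returns a new object and does not modify the original, so the result is assigned back to ser2; labels missing from the new index, here b, d and e, are dropped)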

ser2=ser2.reindex(['a','c','f'])
ser2
Out[100]:
a    11.0
c    33.0
f     NaN
dtype: float64

(4) Deleting elements

Use the drop method. drop returns a new object and does not modify the original.

ser2=ser2.drop('f')
ser2
Out[106]:
a    11.0
c    33.0
dtype: float64

(5) Arithmetic

A Series supports arithmetic operations directly; the operation is applied to every element.

ser2+2
Out[107]:
a    13.0
c    35.0
dtype: float64
ser2*2
Out[108]:
a    22.0
c    66.0
dtype: float64

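(6) Checking whether an element exists

The in operator checks membership, but what it actually tests is whether an index label exists, not whether a value exists.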

'a' in ser3
Out[110]: True
11 in ser3
Out[111]: False
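
To test for a value rather than a label, one option is to apply in to the values array, or to use the isin method for an element-wise check. A minimal sketch with illustrative names:

```python
import pandas as pd

ser = pd.Series({'a': 11, 'd': 22, 'c': 33})

label_present = 'a' in ser           # checks the index
value_as_label = 11 in ser           # 11 is not an index label
value_present = 11 in ser.values     # checks the values
value_mask = ser.isin([11, 33])      # one True/False per element

print(label_present, value_as_label, value_present)
print(value_mask)
```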

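(7) Checking for missing values

The isnull and notnull methods return a boolean Series that marks, element by element, whether each value is missing.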

ser3.isnull()
Out[114]:
a    False
d    False
c    False
dtype: bool
ser3.notnull()
Out[115]:
a    True
d    True
c    True
dtype: bool
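
To reduce the element-wise result to a single answer, the boolean Series can be aggregated with any or sum. The sketch below uses a made-up Series containing one NaN:

```python
import pandas as pd

ser = pd.Series([11.0, float('nan'), 33.0], index=['a', 'b', 'c'])

has_missing = ser.isnull().any()     # is at least one value missing?
missing_count = ser.isnull().sum()   # how many values are missing?

print(has_missing, missing_count)
```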

(8) Filling missing values

Use the fillna method to fill missing values. fillna returns a new object and does not modify the original.

ser2=ser2.reindex(['a','c','h'])
ser2=ser2.fillna(99)
ser2
Out[125]:
a    11.0
c    33.0
h    99.0
dtype: float64
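
The fill value does not have to be a constant. As one possible choice, the sketch below fills with the mean of the existing values, which mean computes while skipping NaN; the names ser and filled are illustrative.

```python
import pandas as pd

ser = pd.Series([11.0, 33.0, float('nan')], index=['a', 'c', 'h'])

# mean() ignores NaN by default, so it is (11 + 33) / 2 = 22.0 here.
filled = ser.fillna(ser.mean())
print(filled)
```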

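5. Operations between Series

(1) Arithmetic between Series

The two Series are aligned on their index labels automatically and the arithmetic is applied to matching elements. Labels present in only one of them yield NaN in the result.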

ser4=pd.Series([11,22,44],index=['a','b','d'])
ser5=pd.Series([11,33,44],index=['a','c','d'])
ser4+ser5
Out[126]:
a    22.0
b     NaN
c     NaN
d    88.0
dtype: float64
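
When a label missing on one side should count as a default value instead of producing NaN, the add method accepts a fill_value parameter. A minimal sketch that rebuilds the two Series from above:

```python
import pandas as pd

ser4 = pd.Series([11, 22, 44], index=['a', 'b', 'd'])
ser5 = pd.Series([11, 33, 44], index=['a', 'c', 'd'])

# A label missing from one Series is treated as 0 before adding.
total = ser4.add(ser5, fill_value=0)
print(total)
```

The sub, mul and div methods take the same parameter.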

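(2) Concatenating Series

Use pd.concat to concatenate Series objects. The two Series do not need to share a dtype, and duplicate index labels are allowed. The result is a new object; the originals are not modified. Older pandas versions offered ser4.append(ser5) for the same purpose, but Series.append was deprecated in pandas 1.4 and removed in 2.0.

pd.concat([ser4,ser5])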
Out[127]:
a    11
b    22
d    44
a    11
c    33
d    44
dtype: int64
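
When the duplicate labels are unwanted, pd.concat with ignore_index=True concatenates the values and assigns a fresh 0 to N-1 index. A minimal sketch that rebuilds the two Series from above:

```python
import pandas as pd

ser4 = pd.Series([11, 22, 44], index=['a', 'b', 'd'])
ser5 = pd.Series([11, 33, 44], index=['a', 'c', 'd'])

# The original labels are discarded and replaced by 0..5.
combined = pd.concat([ser4, ser5], ignore_index=True)
print(combined)
```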

6. References and acknowledgments

[1] Python for Data Analysis (Chinese edition: 利用Python进行数据分析)
