Advanced Python (Part 2): Learning the Commonly Used Modules

Key points of this chapter:

Definition of Python modules
time & datetime modules
random module
os module
sys module
shutil module
ConfigParser module
shelve module
XML processing
re regular expressions

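I. Definition of Python modules

Anyone who has programmed in C knows that to use the sqrt function you must first pull in the math.h header with "#include <math.h>"; otherwise the call cannot be made correctly. How does Python handle this? Python has the concept of a module, which plays a role similar to a header file in C or a package in Java. To call sqrt in Python, for example, you must first bring in the math module with the import keyword. The rest of this section looks at how modules work.

There are three kinds of modules:

Custom (user-defined) modules
Built-in standard modules (the standard library)
Open-source (third-party) modules

Calling a module

Python uses the import keyword to bring in a module. To use the math module, put import math at the top of the file. A function in the module is then called like this:

module_name.function_name

Why is the module name required? Several modules may define functions with the same name, and if you called them by function name alone the interpreter could not tell which one you meant. So when a module is imported this way, every call must be qualified with the module name.

Sometimes only a few functions from a module are needed. In that case you can import just those functions, which can then be called without the module prefix:

from module_name import function_name1, function_name2, ...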
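As a minimal sketch of the two import forms described above, using the standard math module (the variable names are only illustrative):

```python
import math                 # form 1: import the whole module
from math import sqrt, pi   # form 2: import selected names only

# With "import math", every call is qualified with the module name.
root_qualified = math.sqrt(16)

# With "from math import ...", the names can be used directly.
root_direct = sqrt(16)

print(root_qualified, root_direct)  # 4.0 4.0
print(pi == math.pi)                # True: the same value reached two ways
```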

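Custom modules

In Python every .py file can serve as a module, and the module name is the file name without the .py extension.

Suppose there is a file test.py that defines a function add:

#test.py
def add(a, b):
    return a + b

Other files can then run import test and call test.add(a, b), or bring the function in directly with from test import add.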

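What happens when a module is imported

Start with an example. The file test.py contains:

#test.py
def display():
    print('hello world')

display()

test1.py imports the test module:

#test1.py
import test

Running test1.py prints "hello world". In other words, importing a module with import executes the code in that module's file once. Note that this happens only on the first import: the module is loaded once and then cached (in sys.modules), so later import statements reuse the loaded module, which saves both time and memory.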
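The run-once behaviour can be checked with a small experiment. The sketch below is only an illustration: the module name demo_once and the log file are hypothetical, and the module is written to a temporary directory so nothing else is touched.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module (hypothetical name: demo_once) in a temp directory.
# Its top-level code appends one line to load.log every time it is executed.
tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "load.log")
with open(os.path.join(tmp_dir, "demo_once.py"), "w") as f:
    f.write(
        "import os\n"
        "with open(os.path.join(os.path.dirname(__file__), 'load.log'), 'a') as log:\n"
        "    log.write('loaded\\n')\n"
    )

def count_loads():
    """Return how many times the module body has run so far."""
    with open(log_path) as log:
        return len(log.readlines())

sys.path.insert(0, tmp_dir)   # make the temp directory importable

import demo_once              # first import: the module body runs
import demo_once              # second import: served from sys.modules, body does not run
loads_after_two_imports = count_loads()

importlib.reload(demo_once)   # an explicit reload runs the body again
loads_after_reload = count_loads()

print(loads_after_two_imports, loads_after_reload)  # 1 2
```

If a module really must be executed again (for example during interactive development), importlib.reload() is the explicit way to ask for it.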

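II. time & datetime modules

Functions in the time module:
time.time(): returns the current time as a timestamp (seconds since the epoch, as a float)

time.localtime(): takes a timestamp and converts it into a time tuple (struct_time) in local time. With no argument it uses the value of time.time()

Fields of the tuple returned by time.localtime():
Index	Attribute	Meaning
0	tm_year	year
1	tm_mon	month
2	tm_mday	day of the month
3	tm_hour	hour
4	tm_min	minute
5	tm_sec	second
6	tm_wday	day of the week (0-6, Monday is 0)
7	tm_yday	day of the year (1-366)
8	tm_isdst	daylight saving time flag (1, 0 or -1)

time.mktime(): the inverse of time.localtime(); converts a local time tuple into a timestamp (the argument is required)
time.asctime(): formats a time tuple as a string such as "Sun Jul 28 03:35:26 2013". With no argument it uses time.localtime()
time.ctime(): converts a timestamp into the same string format as time.asctime(). With no argument it uses time.time()
time.gmtime(): converts a timestamp into a time tuple in UTC (China is UTC+8, so the result is 8 hours behind local time there). With no argument it uses time.time()
time.strftime(format[, t]): converts a time tuple into a formatted string. If the time tuple is omitted, time.localtime() is used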

For example, the timestamp in web server logs has the format time.strftime('%d/%b/%Y:%X').

Result:

>>> time.strftime('%d/%b/%Y:%X')
'21/Aug/2016:20:06:56'

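format directives:

Category	Directive	Meaning	Range (format)
Year	%y	year without century	00-99
	%Y	year with century	e.g. 2016
	%j	day of the year	001-366
Month	%m	month	01-12
	%b	locale's abbreviated month name	e.g. Aug
	%B	locale's full month name	e.g. August
Day	%d	day of the month	01-31
Hour	%H	hour (24-hour clock)	00-23
	%I	hour (12-hour clock)	01-12
Minute	%M	minute	00-59
Second	%S	second	00-61
Week	%U	week number of the year (Sunday as the first day of the week)	00-53
	%W	week number of the year (Monday as the first day of the week)	00-53
	%w	weekday as a number	0-6 (Sunday is 0)
Time zone	%Z	time zone name	platform-dependent; in China it names China Standard Time (UTC+8)
Other	%x	locale's date representation	e.g. 08/21/16 in the C locale
	%X	locale's time representation	e.g. 20:06:56
	%c	locale's date and time representation	e.g. Sun Aug 21 20:06:56 2016
	%%	a literal '%' character	%
	%p	locale's equivalent of AM or PM	AM or PM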

time.strptime(string, format): parses a time string according to the given format and returns a time tuple (struct_time).
For example:

>>> time.strptime('21/Aug/2016:20:06:56', '%d/%b/%Y:%X')
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=20, tm_min=6, tm_sec=56, tm_wday=6, tm_yday=234, tm_isdst=-1)
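The conversions described above (timestamp, time tuple, formatted string) can be chained into a round trip. This is a minimal sketch that works in UTC with a purely numeric format so the result does not depend on the local time zone or locale; calendar.timegm() is used as the UTC counterpart of time.mktime().

```python
import calendar
import time

FORMAT = '%Y-%m-%d %H:%M:%S'   # numeric-only format, independent of locale

# Timestamp 0 is the epoch; gmtime() turns it into a UTC time tuple.
epoch_tuple = time.gmtime(0)

# struct_time -> string
text = time.strftime(FORMAT, epoch_tuple)

# string -> struct_time
parsed = time.strptime(text, FORMAT)

# struct_time (UTC) -> timestamp
timestamp = calendar.timegm(parsed)

print(text)        # 1970-01-01 00:00:00
print(timestamp)   # 0
```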

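time.clock(): returns processor clock time and was typically used for performance testing and benchmarking; it is rarely needed in everyday code. It was deprecated in Python 3.3 and removed in Python 3.8; use time.perf_counter() (elapsed time) or time.process_time() (CPU time) instead.

time.sleep(): suspends execution for the given number of seconds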

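import time

print(time.time())  # print the timestamp
print(time.localtime())  # print the local time tuple
print(time.gmtime())  # print the UTC time tuple
print(time.ctime())  # print the time in asctime format
print(time.mktime(time.localtime()))  # convert a time tuple to a timestamp
print(time.asctime())  # print the formatted time
print(time.strftime('%d/%b/%Y:%X'))  # print the time in the given format

# parse a time string with its format into a time tuple
print(time.strptime('21/Aug/2016:20:06:56', '%d/%b/%Y:%X'))

# time.clock() was removed in Python 3.8; time.process_time() reports CPU time instead
print('%0.5f' % time.process_time())  # print processor time
for i in range(100000):
    pass
print('%0.5f' % time.process_time())  # print processor time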

Result (sample run; timestamps and timings will differ):

1471781504.4066072
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=20, tm_min=11, tm_sec=44, tm_wday=6, tm_yday=234, tm_isdst=0)
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=12, tm_min=11, tm_sec=44, tm_wday=6, tm_yday=234, tm_isdst=0)
Sun Aug 21 20:11:44 2016
1471781504.0
Sun Aug 21 20:11:44 2016
21/Aug/2016:20:11:44
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=20, tm_min=6, tm_sec=56, tm_wday=6, tm_yday=234, tm_isdst=-1)
0.00000
0.00337
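For timing code on current Python versions, a minimal sketch using time.perf_counter() and time.process_time() might look like this (busy_loop is a hypothetical helper that only exists to give the timers something to measure):

```python
import time

def busy_loop(n):
    """Do a fixed amount of trivial work so there is something to time."""
    total = 0
    for i in range(n):
        total += i
    return total

start_wall = time.perf_counter()   # high-resolution elapsed (wall-clock) time
start_cpu = time.process_time()    # CPU time used by this process

result = busy_loop(100000)

elapsed_wall = time.perf_counter() - start_wall
elapsed_cpu = time.process_time() - start_cpu

print('wall: %0.5f s, cpu: %0.5f s' % (elapsed_wall, elapsed_cpu))
```

perf_counter() includes time spent sleeping or waiting, while process_time() counts only CPU time, so the two can differ noticeably for I/O-bound code.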

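The datetime module
datetime.time(): creates a time object. Each field can be set explicitly and defaults to 0 (this class deals with the time of day only)

#coding:utf-8
import datetime

print(datetime.time())
t = datetime.time(1, 3, 5, 25)
print(t)
print(t.hour)  # hour
print(t.minute)  # minute
print(t.second)  # second
print(t.microsecond)  # microsecond
print(datetime.time.max)  # latest representable time of day
print(datetime.time.min)  # earliest representable time of day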

Result:

00:00:00
01:03:05.000025
1
3
5
25
23:59:59.999999
00:00:00

datetime.date(): creates a date object. The year, month and day must be supplied (this class deals with dates only)

import datetime

# set a date
t = datetime.date(2016, 8, 21)

# print the date and its time tuple
print(t.timetuple())  # date as a time tuple
print(t)
print(t.year)  # year
print(t.month)  # month
print(t.day)  # day

# get today's date
today = datetime.date.today()
print(today)
print(datetime.datetime.now())  # printed with microsecond precision

# get today's date as a time tuple
t1 = today.timetuple()
print(t1)

# print in ctime format (same as time.ctime())
# '%a %b %d %H:%M:%S %Y'
print(t.ctime())
print(today.ctime())

Result:

time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=234, tm_isdst=-1)
2016-08-21
2016
8
21
2016-08-21
2016-08-21 20:18:10.550607
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=21, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=234, tm_isdst=-1)
Sun Aug 21 00:00:00 2016
Sun Aug 21 00:00:00 2016
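The date and time objects shown above can also be combined and shifted. As a minimal sketch, reusing the same sample values, datetime.datetime.combine() merges the two and datetime.timedelta expresses a duration:

```python
import datetime

day = datetime.date(2016, 8, 21)
moment = datetime.time(1, 3, 5, 25)

# Merge a date and a time of day into one datetime object.
combined = datetime.datetime.combine(day, moment)

# timedelta represents a duration; adding it shifts a date or datetime.
ten_days_later = day + datetime.timedelta(days=10)
ninety_minutes_later = combined + datetime.timedelta(minutes=90)

print(combined.isoformat())                 # 2016-08-21T01:03:05.000025
print(ten_days_later)                       # 2016-08-31
print(ninety_minutes_later)                 # 2016-08-21 02:33:05.000025
print(combined.strftime('%Y-%m-%d %H:%M'))  # 2016-08-21 01:03
```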

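III. random module

Get a random float in the range [0.0, 1.0):

>>> import random
>>> print(random.random())
0.958520379647631

Get a random integer from 1 to 10 (both ends included):

>>> print(random.randint(1,10))
5

Get a random float between 0 and 2:

>>> print(random.uniform(0,2))
1.6072811216275604

Get a random number from the range that starts at 1, stops before 10 and uses a step of 4 (that is, one of 1, 5, 9):

>>> print(random.randrange(1,10,4))
9

Pick 3 elements from list a in random order (each element can be picked only once, so the number picked cannot exceed the length of the list):

>>> a=[1,2,3,4,5]
>>> print(random.sample(a,3))
[5, 2, 1]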
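When results need to be repeatable (for tests or debugging), the generator can be seeded first. The sketch below gathers the calls above in one place and adds random.choice() and random.shuffle(), which the text does not cover; the seed value is arbitrary.

```python
import random

random.seed(2016)   # fixed seed so the sequence is reproducible

a = [1, 2, 3, 4, 5]

f = random.random()               # float in [0.0, 1.0)
n = random.randint(1, 10)         # integer in 1..10, both ends included
u = random.uniform(0, 2)          # float between 0 and 2
r = random.randrange(1, 10, 4)    # one of 1, 5, 9
picked = random.sample(a, 3)      # 3 distinct elements of a
one = random.choice(a)            # a single element of a

shuffled = a[:]                   # copy, because shuffle() works in place
random.shuffle(shuffled)

print(f, n, u, r, picked, one, shuffled)
```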

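IV. os module

(1) Overview of the os module

The os module wraps common operating-system functionality. It is especially important if you want your program to be platform-independent.

(2) Common methods

1. os.name

A string identifying the platform in use: 'nt' on Windows, 'posix' on Linux/Unix.

2. os.getcwd()

Returns the current working directory, that is, the directory the Python script is currently working in.

3. os.listdir()

Returns a list of the names of the files and directories in the given directory.

>>> import os
>>> os.listdir(os.getcwd())
['.idea', 'day01', 'day02', 'day02_note', 'day03', 'day04', 'day04_atm', 'day05']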

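4. os.remove()

Deletes a file.

5. os.system()

Runs a shell command and returns its exit status.

>>> os.system('dir')
0
>>> os.system('cmd') # start a DOS (cmd) shell

6. os.sep is the path separator of the current operating system; use it instead of hard-coding a platform-specific separator.

7. os.linesep is the line terminator used by the current platform.

>>> os.linesep
'\r\n'            # Windows uses '\r\n'; Linux and macOS use '\n' (classic Mac OS 9 and earlier used '\r').
>>> os.sep
'\\'              # Windows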

8. os.path.split()

Splits a path into its directory part and its file name.

>>> os.path.split('C:\\Python35\\abc.txt')
('C:\\Python35', 'abc.txt')

9. os.path.isfile() and os.path.isdir() check whether the given path is an existing file or an existing directory, respectively.

>>> os.path.isdir(os.getcwd())
True
>>> os.path.isfile('a.txt')
False

10. os.path.exists() checks whether the given path actually exists.

>>> os.path.exists('C:\\Python25\\abc.txt')
False
>>> os.path.exists('C:\\Python25')
True

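11. os.path.abspath(name): returns the absolute path

12. os.path.normpath(path): normalizes the path string

13. os.path.getsize(name): returns the size of the file in bytes; for a directory the value is platform-dependent (often 0 on Windows) and is not the total size of its contents

14. os.path.splitext(): splits the file name from its extension

>>> os.path.splitext('a.txt')
('a', '.txt')

15. os.path.join(path, name): joins a directory with a file name or another directory

>>> os.path.join('c:\\Python','a.txt')
'c:\\Python\\a.txt'
>>> os.path.join('c:\\Python','f1')
'c:\\Python\\f1'

16. os.path.basename(path): returns the file name

>>> os.path.basename('a.txt')
'a.txt'
>>> os.path.basename('c:\\Python\\a.txt')
'a.txt'

17. os.path.dirname(path): returns the directory part of the path

>>> os.path.dirname('c:\\Python\\a.txt')
'c:\\Python'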
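The examples above use Windows paths. As a platform-independent sketch, the same functions can be exercised inside a temporary directory (the file name notes.txt is hypothetical):

```python
import os
import tempfile

base = tempfile.mkdtemp()                      # scratch directory
path = os.path.join(base, 'notes.txt')         # joined with the right separator

with open(path, 'w') as f:
    f.write('hello')

directory, filename = os.path.split(path)      # directory part and file name
stem, extension = os.path.splitext(filename)   # name and extension
size = os.path.getsize(path)                   # size in bytes
listing = os.listdir(base)

print(os.path.exists(path), os.path.isfile(path), os.path.isdir(base))  # True True True
print(filename, stem, extension)               # notes.txt notes .txt
print(size, listing)                           # 5 ['notes.txt']

os.remove(path)                                # delete the file
os.rmdir(base)                                 # delete the now-empty directory
print(os.path.exists(path))                    # False
```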

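V. sys module

sys.argv           list of command-line arguments; the first element is the path of the script itself
sys.exit(n)        exits the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxint         largest value of the int type (Python 2 only; Python 3 provides sys.maxsize instead)
sys.path           the module search path, initialised from the script's directory, the PYTHONPATH environment variable and installation defaults
sys.platform       name of the operating-system platform

sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]  # read one line from standard input and drop the trailing newline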
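A short sketch of how these attributes are typically used. The helper summarize() is hypothetical, and a literal list stands in for sys.argv so the example behaves the same however it is launched:

```python
import sys

def summarize(argv):
    """Split an argv-style list into the script name and its arguments."""
    return {'script': argv[0], 'args': argv[1:]}

# sys.argv holds the real command line; a literal list is used here instead.
info = summarize(['demo.py', '--verbose', 'input.txt'])

sys.stdout.write('script: %s\n' % info['script'])
sys.stdout.write('args: %s\n' % info['args'])
sys.stdout.write('python %d.%d on %s\n'
                 % (sys.version_info[0], sys.version_info[1], sys.platform))
sys.stdout.write('maxsize: %d\n' % sys.maxsize)   # Python 3 replacement for sys.maxint
```

In a real script the call would be summarize(sys.argv), and sys.exit(0) would end the program once the work is done.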

VI. shutil module
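The shutil module provides high-level file operations such as copying, moving and removing whole directory trees. A minimal sketch, assuming a scratch directory created with tempfile and hypothetical file names:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                          # scratch directory
src_dir = os.path.join(base, 'src')
os.mkdir(src_dir)
src_file = os.path.join(src_dir, 'a.txt')
with open(src_file, 'w') as f:
    f.write('data')

copy_file = shutil.copy(src_file, os.path.join(base, 'a_copy.txt'))   # copy one file
copy_dir = shutil.copytree(src_dir, os.path.join(base, 'src_copy'))   # copy a whole tree
moved = shutil.move(copy_file, os.path.join(base, 'a_moved.txt'))     # move / rename

names = sorted(os.listdir(base))
copied_names = os.listdir(copy_dir)
print(names)          # ['a_moved.txt', 'src', 'src_copy']
print(copied_names)   # ['a.txt']

shutil.rmtree(base)   # remove the directory tree recursively
print(os.path.exists(base))   # False
```

Note that shutil.rmtree() deletes everything under the given path without confirmation, so it is worth double-checking the argument before calling it.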

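VII. ConfigParser module

This module is used to generate and modify common configuration files. In Python 3.x it was renamed to configparser.

Many Linux configuration files use the INI-style format below. How can such a file be generated with Python?

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

The configparser module does the job: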

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'

config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here

config['DEFAULT']['ForwardX11'] = 'yes'

with open('example.ini', 'w') as configfile:
    config.write(configfile)
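Reading the configuration back works through the same mapping interface. The sketch below parses the sample text with read_string() so that it does not depend on example.ini existing; config.read('example.ini') would load the file instead.

```python
import configparser

SAMPLE = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

sections = config.sections()        # DEFAULT is not listed as a section
print(sections)                     # ['bitbucket.org', 'topsecret.server.com']

topsecret = config['topsecret.server.com']
port = topsecret.getint('Port')                   # 50022
forward = topsecret.getboolean('ForwardX11')      # False (overrides DEFAULT)

# Options missing from a section fall back to [DEFAULT];
# option names are case-insensitive.
bitbucket = config['bitbucket.org']
compression = bitbucket['compression']            # 'yes'
inherited = bitbucket.getboolean('ForwardX11')    # True

print(port, forward, compression, inherited)
```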

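VIII. shelve module

shelve is a simple data-persistence solution. The only function you normally need is open(), which takes a file name and returns a shelf object. You can store things in it and treat it much like a dictionary (keys must be strings, values can be any picklable object). When you have finished storing data, call close() to close it.

import shelve

d = shelve.open('shelve_test')  # open a shelf file

class Test(object):
    def __init__(self, n):
        self.n = n

t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2

d.close()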
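Reading the data back is symmetrical: reopen the shelf and index it like a dictionary. A minimal sketch, using a temporary directory and a hypothetical shelf name, with the shelf used as a context manager so that close() is called automatically:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'shelve_demo')

# Write: the shelf behaves like a dict whose keys are strings.
with shelve.open(path) as db:
    db['names'] = ['alex', 'rain', 'test']
    db['settings'] = {'retries': 3}

# Read: reopen the same shelf and look the values up again.
with shelve.open(path) as db:
    keys = sorted(db.keys())
    names = db['names']
    retries = db['settings']['retries']

print(keys)      # ['names', 'settings']
print(names)     # ['alex', 'rain', 'test']
print(retries)   # 3
```

Instances of custom classes can be stored the same way, but the class definition must be importable in the program that reads them back, because shelve relies on pickle.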

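IX. XML file processing

What is XML?

XML (eXtensible Markup Language) is used to mark up data and describe its structure; it is a meta-language that lets users define their own markup languages.

The XML file to parse (country.xml):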

<?xml version="1.0"?>
<data>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

Processing XML with xml.etree.ElementTree

ElementTree is designed for handling XML. Historically the standard library shipped two implementations: a pure-Python one, xml.etree.ElementTree, and a faster C one, xml.etree.cElementTree. Prefer the C implementation where it exists, because it is faster and uses less memory. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically, and xml.etree.cElementTree was removed in Python 3.9, in which case the try/except import below falls back to xml.etree.ElementTree.

#!/usr/bin/env python
#coding:utf-8

try:
  import xml.etree.cElementTree as ET
except ImportError:
  import xml.etree.ElementTree as ET
import sys

try:
  tree = ET.parse("country.xml")     # open the XML document
  #root = ET.fromstring(country_string) # parse XML from a string instead
  root = tree.getroot()         # get the root node
except Exception as e:
  print("Error:cannot parse file:country.xml.")
  sys.exit(1)

print(root.tag, "---", root.attrib)
for child in root:
  print(child.tag, "---", child.attrib)

print("*"*10)
print(root[0][1].text)   # access by index
print(root[0].tag, root[0].text)

print("*"*10)
for country in root.findall('country'): # find all country nodes under root
  rank = country.find('rank').text   # text of the rank child node
  name = country.get('name')      # value of the name attribute
  print(name, rank)

# modify the XML file
for country in root.findall('country'):
  rank = int(country.find('rank').text)
  if rank > 50:
    root.remove(country)

tree.write('output.xml')
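The same operations work on XML held in a string, which is convenient when no file is available. The sketch below uses a shortened copy of the country data and ET.fromstring(); the dictionary built from the neighbor elements is only an illustration.

```python
import xml.etree.ElementTree as ET

COUNTRY_XML = """<data>
  <country name="Singapore">
    <rank>4</rank>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>"""

root = ET.fromstring(COUNTRY_XML)   # parse from a string instead of a file

# Map each country name to the names of its neighbors.
neighbors = {
    country.get('name'): [n.get('name') for n in country.findall('neighbor')]
    for country in root.findall('country')
}
print(neighbors)

# Drop countries ranked worse than 50, then serialize the tree back to text.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root.findall('country')]
print(remaining)                                  # ['Singapore']
print(ET.tostring(root, encoding='unicode'))
```

findall() returns a list, so removing elements from root inside the loop does not disturb the iteration.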

Creating an XML file

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
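To check what was built without writing a file, the tree can be serialized to a string and parsed back. A minimal sketch with a smaller tree than the one above:

```python
import xml.etree.ElementTree as ET

namelist = ET.Element('namelist')
person = ET.SubElement(namelist, 'name', attrib={'enrolled': 'yes'})
age = ET.SubElement(person, 'age')
age.text = '19'

# Serialize to a string instead of writing test.xml ...
xml_text = ET.tostring(namelist, encoding='unicode')
print(xml_text)   # <namelist><name enrolled="yes"><age>19</age></name></namelist>

# ... and parse it back to confirm the structure survived.
parsed = ET.fromstring(xml_text)
enrolled = parsed.find('name').get('enrolled')
age_text = parsed.find('name/age').text
print(enrolled, age_text)   # yes 19
```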

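X. re regular expressions

Common regular expression symbols

'.'     by default matches any single character except \n; with the DOTALL flag it matches any character, including a newline
'^'     matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     matches the end of the string; with the MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*'     matches the preceding character 0 or more times, re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times, re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 0 or 1 times
'{m}'   matches the preceding character exactly m times
'{n,m}' matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches the pattern on the left or on the right of |, re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' grouping, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'\A'    matches only at the start of the string, re.search(r"\Aabc","alexabc") finds no match
'\Z'    matches only at the end of the string (similar to $, which additionally matches just before a trailing newline)
'\d'    matches a digit 0-9
'\D'    matches a non-digit
'\w'    matches a word character, [A-Za-z0-9_]
'\W'    matches a non-word character, [^A-Za-z0-9_]
'\s'    matches whitespace such as a space, \t, \n, \r, re.search(r"\s+","ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}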
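The examples in the table can be run together as a quick check. The sketch below uses raw strings for every pattern; the variable names are only illustrative.

```python
import re

# Quantifiers
star = re.findall(r"ab*", "cabb3abcbbac")             # ['abb', 'ab', 'a']
plus = re.findall(r"ab+", "ab+cd+abb+bba")            # ['ab', 'abb']
bounded = re.findall(r"ab{1,3}", "abb abc abbcbbb")   # ['abb', 'ab', 'abb']

# Alternation and grouping
either = re.search(r"abc|ABC", "ABCBabcCD").group()                  # 'ABC'
grouped = re.search(r"(abc){2}a(123|456)c", "abcabca456c").group()   # 'abcabca456c'

# Anchors
anchored = re.search(r"\Aabc", "alexabc")             # None: 'abc' is not at the start
multiline = re.search(r"^a", "\nabc\neee", flags=re.MULTILINE).group()   # 'a'

# Whitespace
blank = re.search(r"\s+", "ab\tc1\n3").group()        # '\t'

# Named groups
id_match = re.search(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
    "371481199306143242",
)
fields = id_match.groupdict()   # {'province': '3714', 'city': '81', 'birthday': '1993'}

print(star, plus, bounded, either, grouped, anchored, multiline, repr(blank), fields)
```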

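Main functions of the re module

Commonly used functions include compile, search, match, split, findall (finditer) and sub (subn).
compile
re.compile(pattern[, flags])
Purpose: compiles a regular expression pattern into a regular expression object
flags include:
re.I: ignore case
re.L: makes \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode
re.S: makes ' . ' match any character, including a newline (without this flag ' . ' does not match a newline)
re.U: makes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database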
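A small sketch of how the most common flags change matching (the sample text is made up for the illustration; flags are combined with |):

```python
import re

text = "Hello\nhello world"

ignore_case = re.findall(r"hello", text, flags=re.I)          # ['Hello', 'hello']
line_starts = re.findall(r"^h\w+", text, flags=re.I | re.M)   # ['Hello', 'hello']
no_dotall = re.search(r"Hello.hello", text)                   # None: '.' stops at the newline
dotall = re.search(r"Hello.hello", text, flags=re.S)          # matches across the newline

print(ignore_case, line_starts, no_dotall, dotall.group() == "Hello\nhello")
```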

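search
re.search(pattern, string[, flags])
pattern.search(string[, pos[, endpos]])   (method of a compiled pattern object)
Purpose: scans the string for the first location where the pattern matches and returns a match object, or None if no position matches.

match
re.match(pattern, string[, flags])
pattern.match(string[, pos[, endpos]])   (method of a compiled pattern object)
Purpose: match() tries to match only at the beginning of the string, that is, it reports only a match starting at position 0, whereas search() scans the whole string. To look for a match anywhere in the string, use search().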

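Here are a few examples:

#!/usr/bin/env python
import re

r1 = re.compile(r'world')

# 'world' is not at position 0, so match() fails
if r1.match('helloworld'):
    print('match succeeds')
else:
    print('match fails')

# search() scans the whole string, so it succeeds
if r1.search('helloworld'):
    print('search succeeds')
else:
    print('search fails')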
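The remaining functions from the list (split, findall, finditer, sub, subn) are not shown above. As a minimal sketch on a made-up timestamp string:

```python
import re

line = "2016-08-21 20:06:56"

parts = re.split(r"[-: ]", line)      # split on '-', ':' or a space
numbers = re.findall(r"\d+", line)    # every run of digits
spans = [m.span() for m in re.finditer(r"\d{4}", line)]   # positions of 4-digit runs
masked = re.sub(r"\d", "#", line)     # replace every digit

# subn() works like sub() but also returns the number of replacements.
swapped, count = re.subn(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", line)

print(parts)            # ['2016', '08', '21', '20', '06', '56']
print(numbers)          # ['2016', '08', '21', '20', '06', '56']
print(spans)            # [(0, 4)]
print(masked)           # ####-##-## ##:##:##
print(swapped, count)   # 21/08/2016 20:06:56 1
```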

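A note on the r prefix: r stands for raw. Ordinary string literals treat the backslash as an escape character, so '\n' is a newline and a literal backslash must be written as '\\'. To write a backslash followed by the letter n without the r prefix you need '\\n'; with a raw string it is simply r'\n', which is much clearer.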
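The difference is easy to see by comparing string lengths. A minimal sketch (the Windows-style path is only sample data):

```python
import re

plain = '\n'       # one character: a newline
escaped = '\\n'    # two characters: backslash + n
raw = r'\n'        # the same two characters, written as a raw string

print(len(plain), len(escaped), len(raw))   # 1 2 2
print(escaped == raw)                       # True

# Matching one literal backslash: the regex itself needs \\,
# so a normal string literal needs four backslashes.
path = 'C:\\Python35'                       # the text C:\Python35
normal_hits = re.findall('\\\\', path)
raw_hits = re.findall(r'\\', path)
print(normal_hits == raw_hits, len(raw_hits))   # True 1
```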
