The `fileinput` module lets you loop over the contents of one or more text files.
Using the fileinput module to loop over a text file

    import fileinput
    import sys

    for line in fileinput.input("samples/sample.txt"):
        sys.stdout.write("-> ")
        sys.stdout.write(line)

    -> We will perhaps eventually be writing only small
    -> modules which are identified by name as they are
    -> used to build larger ones, so that devices like
    -> indentation, rather than delimiters, might become
    -> feasible for expressing local structure in the
    -> source language.
    ->    -- Donald E. Knuth, December 1974
You can also use the `fileinput` module to get meta-information about the current line, including `isfirstline`, `filename`, and `lineno`.
Using the fileinput module to process multiple text files

    import fileinput
    import glob
    import string, sys

    for line in fileinput.input(glob.glob("samples/*.txt")):
        if fileinput.isfirstline(): # first in a file?
            sys.stderr.write("-- reading %s --\n" % fileinput.filename())
        sys.stdout.write(str(fileinput.lineno()) + " " + string.upper(line))

    -- reading samples\sample.txt --
    1 WE WILL PERHAPS EVENTUALLY BE WRITING ONLY SMALL
    2 MODULES WHICH ARE IDENTIFIED BY NAME AS THEY ARE
    3 USED TO BUILD LARGER ONES, SO THAT DEVICES LIKE
    4 INDENTATION, RATHER THAN DELIMITERS, MIGHT BECOME
    5 FEASIBLE FOR EXPRESSING LOCAL STRUCTURE IN THE
    6 SOURCE LANGUAGE.
    7    -- DONALD E. KNUTH, DECEMBER 1974
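The listing above is Python 2. As a rough Python 3 sketch of the same per-line bookkeeping, the snippet below creates two throwaway files (the names `a.txt` and `b.txt` are made up for illustration) and records what `filename`, `lineno`, and `isfirstline` report for each line. It also calls `filelineno`, which the text above does not mention; it is included only to contrast the cumulative counter with the per-file one.

```python
import fileinput
import os
import shutil
import tempfile

# Hypothetical sample data: two small files in a scratch directory.
workdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write(text)
    paths.append(path)

records = []
for line in fileinput.input(files=paths):
    records.append((
        os.path.basename(fileinput.filename()),  # file being read
        fileinput.lineno(),                      # cumulative across all files
        fileinput.filelineno(),                  # restarts at 1 in each file
        fileinput.isfirstline(),                 # True on a file's first line
    ))
fileinput.close()

shutil.rmtree(workdir)  # remove the scratch directory

for record in records:
    print(record)
```

Note that `lineno` keeps counting across file boundaries, which is why the second file starts at line 3 here.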
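Replacing text in files in place is just as easy. Set the `inplace` keyword argument to 1 when calling the `input` function, and the module takes care of the rest: while the loop runs, everything written to standard output goes into the file being processed.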
Using the fileinput module to convert CRLF to LF

    import fileinput, sys

    for line in fileinput.input(inplace=1):
        # convert Windows/DOS text files to Unix files
        if line[-2:] == "\r\n":
            line = line[:-2] + "\n"
        sys.stdout.write(line)
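The in-place mechanism is easier to see with a self-contained run. The following is a minimal Python 3 sketch, not a port of the listing above: since Python 3 text mode normally translates CRLF on read already, it strips trailing whitespace instead. The scratch file and its contents are assumptions made for the example.

```python
import fileinput
import os
import tempfile

# Hypothetical scratch file; a real script would take names from sys.argv.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("alpha  \nbeta\t\n")

# While the loop runs, print() writes into the file, not to the console.
with fileinput.input(files=[path], inplace=True) as stream:
    for line in stream:
        print(line.rstrip())

with open(path) as handle:
    result = handle.read()
os.remove(path)

print(repr(result))
```

Any line that is not written back inside the loop is dropped from the file, so a filter that forgets to print unchanged lines will silently delete them.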
The `shutil` utility module contains functions for copying files and directories.
Using shutil to copy files

    import shutil
    import os

    for file in os.listdir("."):
        if os.path.splitext(file)[1] == ".py":
            print file
            shutil.copy(file, os.path.join("backup", file))

    aifc-example-1.py
    anydbm-example-1.py
    array-example-1.py
    ...
The `copytree` function copies an entire directory tree (like `cp -r`), and the `rmtree` function removes an entire directory tree (like `rm -r`).
Using the shutil module to copy and remove directory trees

    import shutil
    import os

    SOURCE = "samples"
    BACKUP = "samples-bak"

    # create a backup directory
    shutil.copytree(SOURCE, BACKUP)

    print os.listdir(BACKUP)

    # remove it
    shutil.rmtree(BACKUP)

    print os.listdir(BACKUP)

    ['sample.wav', 'sample.jpg', 'sample.au', 'sample.msg', 'sample.tgz',
    ...
    Traceback (most recent call last):
      File "shutil-example-2.py", line 17, in ?
        print os.listdir(BACKUP)
    os.error: No such file or directory
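The traceback above comes from listing a directory that `rmtree` has just removed. A small Python 3 sketch of the same copy-then-remove round trip, run inside a scratch directory so it leaves nothing behind, might look like this. The directory and file names are hypothetical, and it checks for existence rather than provoking the error.

```python
import os
import shutil
import tempfile

# Hypothetical directory names inside a scratch area.
root = tempfile.mkdtemp()
source = os.path.join(root, "samples")
backup = os.path.join(root, "samples-bak")

os.makedirs(os.path.join(source, "nested"))
with open(os.path.join(source, "nested", "note.txt"), "w") as handle:
    handle.write("hello\n")

shutil.copytree(source, backup)        # the target must not exist yet
copied = sorted(os.listdir(backup))
nested_copy = os.path.exists(os.path.join(backup, "nested", "note.txt"))

shutil.rmtree(backup)                  # removes the whole tree
still_there = os.path.exists(backup)   # check instead of calling listdir

shutil.rmtree(root)                    # clean up the scratch area

print(copied, nested_copy, still_there)
```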
The `tempfile` module lets you quickly create uniquely named temporary files.
Using the tempfile module to create temporary files

    import tempfile
    import os

    tempfile = tempfile.mktemp()

    print "tempfile", "=>", tempfile

    file = open(tempfile, "w+b")
    file.write("*" * 1000)
    file.seek(0)
    print len(file.read()), "bytes"
    file.close()

    try:
        # must remove file when done
        os.remove(tempfile)
    except OSError:
        pass

    tempfile => C:\TEMP\~160-1
    1000 bytes
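One caveat about the listing above: `mktemp` only returns a name, so another process could create a file with that name before the script opens it, and current Python documentation deprecates it for that reason. As a hedged alternative, not taken from the original text, this Python 3 sketch uses `tempfile.mkstemp`, which creates the file and returns an open descriptor together with the path.

```python
import os
import tempfile

# mkstemp creates the file right away and returns (descriptor, path).
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "w+b") as handle:
        handle.write(b"*" * 1000)
        handle.seek(0)
        size = len(handle.read())
    print(size, "bytes")
finally:
    # As with mktemp, the caller must remove the file when done.
    os.remove(path)
```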
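The `TemporaryFile` function picks a suitable file name and opens the file. It also makes sure the file is removed when it is closed. (On Unix, you can remove an open file and have it disappear once it is closed. On other platforms, this is done through a special wrapper class.)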
Using the tempfile module to open temporary files

    import tempfile

    file = tempfile.TemporaryFile()

    for i in range(100):
        file.write("*" * 100)

    file.close() # removes the file!
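For comparison, here is a short Python 3 sketch of the same idea. It assumes the default binary mode of `TemporaryFile`, so it writes bytes rather than text, and it reads the data back before the `with` block closes (and thereby removes) the file.

```python
import tempfile

with tempfile.TemporaryFile() as handle:   # binary read/write mode by default
    for _ in range(100):
        handle.write(b"*" * 100)
    handle.seek(0)
    total = len(handle.read())

# Leaving the with block closed the file, which also removed it.
closed = handle.closed
print(total, closed)
```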
The `StringIO` module implements an in-memory file object. It can stand in for a standard file object in most places that expect one.
Using the StringIO module to read from an in-memory file

    import StringIO

    MESSAGE = "That man is depriving a village somewhere of a computer scientist."

    file = StringIO.StringIO(MESSAGE)

    print file.read()

    That man is depriving a village somewhere of a computer scientist.
The `StringIO` class implements all the methods of the built-in file object, plus a `getvalue` method that returns the string held internally.
Using the StringIO module to write to an in-memory file

    import StringIO

    file = StringIO.StringIO()
    file.write("This man is no ordinary man. ")
    file.write("This is Mr. F. G. Superman.")

    print file.getvalue()

    This man is no ordinary man. This is Mr. F. G. Superman.
Using the StringIO module to capture output

    import StringIO
    import string, sys

    stdout = sys.stdout

    sys.stdout = file = StringIO.StringIO()

    print """
    According to Gbaya folktales, trickery and guile
    are the best ways to defeat the python, king of
    snakes, which was hatched from a dragon at the
    world's start. -- National Geographic, May 1997
    """

    sys.stdout = stdout

    print string.upper(file.getvalue())

    ACCORDING TO GBAYA FOLKTALES, TRICKERY AND GUILE
    ARE THE BEST WAYS TO DEFEAT THE PYTHON, KING OF
    SNAKES, WHICH WAS HATCHED FROM A DRAGON AT THE
    WORLD'S START. -- NATIONAL GEOGRAPHIC, MAY 1997
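The listing above swaps `sys.stdout` by hand and has to remember to restore it. In Python 3 the in-memory file class lives in the `io` module, and `contextlib.redirect_stdout` can do the swap and the restore. The following is a minimal sketch of that variant, offered as an alternative rather than as what the original example does.

```python
import contextlib
import io

buffer = io.StringIO()

# Everything printed inside the block lands in the buffer; sys.stdout
# is restored when the block exits, even if an exception is raised.
with contextlib.redirect_stdout(buffer):
    print("trickery and guile")

captured = buffer.getvalue()
print(captured.upper(), end="")
```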
`cStringIO` is an optional module that provides a faster implementation of `StringIO`. It works in much the same way as `StringIO`, but it cannot be subclassed.
Using the cStringIO module

    import cStringIO

    MESSAGE = "That man is depriving a village somewhere of a computer scientist."

    file = cStringIO.StringIO(MESSAGE)

    print file.read()

    That man is depriving a village somewhere of a computer scientist.
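To make your code as fast as possible while keeping it compatible with older Python versions, you can use a small trick that falls back on the `StringIO` module when `cStringIO` is not available.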
Falling back on StringIO

    try:
        import cStringIO
        StringIO = cStringIO
    except ImportError:
        import StringIO

    print StringIO

    <module 'cStringIO' (built-in)>
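The same fallback pattern can be stretched to cover Python 3, where neither `cStringIO` nor `StringIO` exists as a module and the class is provided by `io` instead. The sketch below assumes you only need the `StringIO` class itself, so it imports the class rather than rebinding the module name as the listing above does.

```python
try:
    from cStringIO import StringIO      # Python 2, C implementation
except ImportError:
    try:
        from StringIO import StringIO   # Python 2, pure Python
    except ImportError:
        from io import StringIO         # Python 3

buffer = StringIO()
buffer.write("This man is no ordinary man. ")
buffer.write("This is Mr. F. G. Superman.")
value = buffer.getvalue()
print(value)
```

Whichever import succeeds, the rest of the code only sees the name `StringIO`, which is the point of the trick.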
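The `mmap` module provides an interface to the operating system's memory-mapping functions. A mapped region behaves much like a string object, but its data is read directly from the file.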
Using the mmap module

    import mmap
    import os

    filename = "samples/sample.txt"

    file = open(filename, "r+")
    size = os.path.getsize(filename)

    data = mmap.mmap(file.fileno(), size)

    # basics
    print data
    print len(data), size

    # use slicing to read from the file
    print repr(data[:10]), repr(data[:10])

    # or use the standard file interface
    print repr(data.read(10)), repr(data.read(10))

    <mmap object at 008A2A10>
    302 302
    'We will pe' 'We will pe'
    'We will pe' 'rhaps even'
On Windows, the file must be opened for both reading and writing (`r+`, `w+`, or `a+`); otherwise the `mmap` call fails.
Using string methods and regular expressions on a mapped region

    import mmap
    import os, string, re

    def mapfile(filename):
        file = open(filename, "r+")
        size = os.path.getsize(filename)
        return mmap.mmap(file.fileno(), size)

    data = mapfile("samples/sample.txt")

    # search
    index = data.find("small")
    print index, repr(data[index-5:index+15])

    # regular expressions work too!
    m = re.search("small", data)
    print m.start(), m.group()

    43 'only small\015\012modules '
    43 small
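In Python 3 a mapped region behaves like a bytes object rather than a text string, so search terms and patterns must be bytes as well. The sketch below repeats the search on a scratch file. The sample text is an assumption: it is only the opening of the quotation used earlier, written with a CRLF line ending to match the output above.

```python
import mmap
import os
import re
import tempfile

# Assumed sample text (CRLF line ending, as in the output above).
TEXT = (b"We will perhaps eventually be writing only small\r\n"
        b"modules which are identified by name")

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(TEXT)

with open(path, "r+b") as handle:
    # A length of 0 maps the whole file.
    with mmap.mmap(handle.fileno(), 0) as data:
        # search with a bytes method
        index = data.find(b"small")
        context = data[index - 5:index + 15]

        # regular expressions work too, given a bytes pattern
        match = re.search(b"small", data)
        found = (match.start(), match.group())
        del match  # the match object refers to the map; drop it before closing

os.remove(path)

print(index, repr(context))
print(found)
```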