Handling Very Large Files in Python

Date: 2021-10-29 20:12:05

Reference: http://www.nikhilgopal.com/2010/12/dealing-with-large-files-in-python.html, among others

Reading line by line:

1. Using the with keyword:

with open('somefile.txt', 'r') as f:
    for line in f:
        process_data(line)  # operate on one line at a time

This is similar to:


for line in open('somefile.txt'):
    process_data(line)
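In all of these snippets, process_data is an undefined placeholder for whatever per-line work is needed. A minimal stand-in, purely for illustration (the ERROR filter is made up):

def process_data(line):
    # Hypothetical per-line work: print lines that mention ERROR.
    if 'ERROR' in line:
        print(line.rstrip('\n'))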


2. Using the fileinput module:

import fileinput

for line in fileinput.input('somefile.txt'):
    process_data(line)
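fileinput can also take a list of file names, and falls back to standard input when none are given, so several large files can be streamed through as one sequence of lines. A small sketch (the file names are made up):

import fileinput

# Iterate over several files as one continuous stream of lines;
# fileinput.filename() and fileinput.filelineno() report the position.
for line in fileinput.input(['part1.txt', 'part2.txt']):
    process_data(line)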

3. Set up a buffer and control the buffer size precisely (both readlines and read work; a read-based variant is sketched after the snippet):

BUFFER = int(10E6)  # 10-megabyte buffer

with open('somefile.txt', 'r') as f:
    # readlines(hint) stops once roughly BUFFER characters of lines are read
    lines = f.readlines(BUFFER)
    while lines:
        for line in lines:
            process_data(line)
        lines = f.readlines(BUFFER)
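As the heading notes, plain read works as well as readlines; it returns fixed-size blocks instead of whole lines, so a block may end in the middle of a line. A sketch of that variant:

BUFFER = int(10E6)  # read about 10 MB at a time

with open('somefile.txt', 'r') as f:
    block = f.read(BUFFER)
    while block:
        process_data(block)  # note: a block can end mid-line
        block = f.read(BUFFER)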

4. Using yield, building on method 3:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
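On Python 3.8 and later, the assignment expression (:=) makes the same read loop compact enough that the helper generator is often unnecessary. A minimal sketch:

# Python 3.8+: ':=' assigns and tests in one step, so the loop
# ends as soon as read() returns an empty result.
with open('really_big_file.dat') as f:
    while piece := f.read(1024):
        process_data(piece)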

5. Using iter:

f = open('really_big_file.dat')

def read1k():
    return f.read(1024)

# iter(callable, sentinel) calls read1k() repeatedly until it
# returns the sentinel '' (end of file in text mode).
for piece in iter(read1k, ''):
    process_data(piece)
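One caveat: the '' sentinel only matches files opened in text mode. In binary mode ('rb'), read() returns bytes, so the sentinel must be b'':

# Binary mode: read() yields bytes, so the sentinel is b'', not ''.
with open('really_big_file.dat', 'rb') as f:
    for piece in iter(lambda: f.read(1024), b''):
        process_data(piece)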

Another example:

f = ...  # a file-like object, i.e. it supports read(size) and
         # returns the empty string '' when there is nothing left to read

def chunked(file, chunk_size):
    # Wrap read(chunk_size) in a zero-argument callable for iter().
    return iter(lambda: file.read(chunk_size), '')

for data in chunked(f, 65536):
    process_data(data)  # handle one 64 KB chunk
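The lambda can equally be written with functools.partial, which some find clearer; a sketch under the same assumptions about f:

from functools import partial

def chunked(file, chunk_size):
    # partial(file.read, chunk_size) builds the same zero-argument
    # callable as the lambda above.
    return iter(partial(file.read, chunk_size), '')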