Reference: http://www.nikhilgopal.com/2010/12/dealing-with-large-files-in-python.html and *
Reading line by line:
1. Use the with keyword:
with open('somefile.txt', 'r') as FILE:
    for line in FILE:
        process_data(line)  # operation on each line
Similar to:
for line in open('somefile.txt'):
    process_data(line)
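As a complete, runnable illustration of method 1 (a sketch; the file name and the character-counting logic are placeholders):

# Sketch of method 1: the with-block closes the file automatically,
# even if an exception occurs; iteration reads one line at a time.
# 'somefile.txt' is a placeholder path.
total_chars = 0
with open('somefile.txt', 'r') as f:
    for line in f:
        total_chars += len(line)
print(total_chars)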
2. Use the fileinput module:
import fileinput

for line in fileinput.input('somefile.txt'):
    process_data(line)  # operation on each line
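fileinput can also chain several files into one line stream and report the current file name and line number; a minimal sketch, with placeholder file names:

import fileinput

# Sketch: iterate over several files as one continuous line stream;
# the names 'a.txt' and 'b.txt' are placeholders.
for line in fileinput.input(['a.txt', 'b.txt']):
    print(fileinput.filename(), fileinput.filelineno(), line.rstrip())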
3. Build a buffer and control its size precisely (both readlines and read work):
BUFFER = int(10E6)  # 10 megabyte buffer
file = open('somefile.txt', 'r')
text = file.readlines(BUFFER)
while text != []:
    for t in text:
        process_data(t)  # operation on each line
    text = file.readlines(BUFFER)
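Since the heading notes that read() works too, here is the read() variant of the same loop; a sketch, with the buffer size and file name as placeholders and process_data standing in for real work:

# Sketch of the read() variant: pull fixed-size chunks until EOF,
# at which point read() returns an empty string.
BUFFER = int(10E6)  # 10 megabyte chunks; size is a placeholder
f = open('somefile.txt', 'r')
chunk = f.read(BUFFER)
while chunk:
    process_data(chunk)  # placeholder for real work
    chunk = f.read(BUFFER)
f.close()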
4. Use yield in combination with method 3:
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


f = open('really_big_file.dat')
for piece in read_in_chunks(f):
    process_data(piece)
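The same generator works for binary data with 'rb' mode, and pairs naturally with method 1's with-block; a sketch, where the chunk size is an arbitrary choice:

# Sketch: read_in_chunks (defined above) in binary mode under a
# with-block; 4096 is an arbitrary chunk size.
with open('really_big_file.dat', 'rb') as f:
    for piece in read_in_chunks(f, chunk_size=4096):
        process_data(piece)  # placeholder for real work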
5. Use iter:
f = open('really_big_file.dat')

def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)
Another example:
f = ...  # file-like object, i.e. supporting a read(size) function and
         # returning the empty string '' when there is nothing to read

def chunked(file, chunk_size):
    return iter(lambda: file.read(chunk_size), '')

for data in chunked(f, 65536):
    process_data(data)  # process the data
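One caveat with the sentinel form: in binary mode read() returns b'' at end of file, not '', so the sentinel must match. A sketch of the binary variant, using functools.partial in place of the lambda; the file name and chunk size are placeholders:

import functools

# Sketch: iter() with a b'' sentinel for a binary file;
# functools.partial plays the role of the lambda above.
with open('really_big_file.dat', 'rb') as f:
    for chunk in iter(functools.partial(f.read, 65536), b''):
        process_data(chunk)  # placeholder for real work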