How do I read a file in reverse order using Python? I want to read a file from the last line to the first line.
13 solutions
#1
55
for line in reversed(open("filename").readlines()):
    print line.rstrip()
And in Python 3:
for line in reversed(list(open("filename"))):
    print(line.rstrip())
#2
93
A correct, efficient answer written as a generator.
import os

def reverse_readline(filename, buf_size=8192):
    """a generator that returns the lines of a file in reverse order"""
    with open(filename) as fh:
        segment = None
        offset = 0
        fh.seek(0, os.SEEK_END)
        file_size = remaining_size = fh.tell()
        while remaining_size > 0:
            offset = min(file_size, offset + buf_size)
            fh.seek(file_size - offset)
            buffer = fh.read(min(remaining_size, buf_size))
            remaining_size -= buf_size
            lines = buffer.split('\n')
            # the first line of the buffer is probably not a complete line so
            # we'll save it and append it to the last line of the next buffer
            # we read
            if segment is not None:
                # if the previous chunk starts right from the beginning of line
                # do not concat the segment to the last line of new chunk
                # instead, yield the segment first
                if buffer[-1] != '\n':  # compare values with !=, not identity with "is not"
                    lines[-1] += segment
                else:
                    yield segment
            segment = lines[0]
            for index in range(len(lines) - 1, 0, -1):
                # an empty last piece only means the buffer ended with '\n';
                # empty pieces elsewhere are genuine blank lines and are kept
                if lines[index] or index != len(lines) - 1:
                    yield lines[index]
        # Don't yield None if the file was empty
        if segment is not None:
            yield segment
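The generator above opens the file in text mode, where read(n) counts characters while the seek offsets are byte positions; on files containing multi-byte characters a seek can land inside a character, and chunks can overlap. A minimal sketch of the same chunked idea working on bytes, decoding only complete lines, assuming UTF-8 text with "\n" line endings (the name reverse_readline_bytes is hypothetical, not part of the answer):

```python
import os
from typing import Iterator


def reverse_readline_bytes(path: str, buf_size: int = 8192,
                           encoding: str = "utf-8") -> Iterator[str]:
    """Yield the lines of a file from last to first, without their newlines."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        position = fh.tell()
        segment = None  # first (possibly partial) line of the previously read chunk
        first_chunk = True
        while position > 0:
            read_size = min(buf_size, position)
            position -= read_size
            fh.seek(position)
            buffer = fh.read(read_size)
            if first_chunk and buffer.endswith(b"\n"):
                buffer = buffer[:-1]  # a final newline does not start a new line
            first_chunk = False
            lines = buffer.split(b"\n")
            if segment is not None:
                lines[-1] += segment  # rejoin the line that spans the chunk boundary
            segment = lines[0]
            # every piece after the first is a complete line
            for line in reversed(lines[1:]):
                yield line.decode(encoding)  # decode whole lines only
        if segment is not None:
            yield segment.decode(encoding)
```

Splitting raw bytes on b"\n" is safe for UTF-8 because that byte never occurs inside a multi-byte sequence. A file containing "a\n\nb\n" yields "b", "" and "a", so blank lines are preserved.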
#3
14
How about something like this:
import os

def readlines_reverse(filename):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        position = qfile.tell()
        line = ''
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n":
                yield line[::-1]
                line = ''
            else:
                line += next_char
            position -= 1
        yield line[::-1]

if __name__ == '__main__':
    for qline in readlines_reverse(input()):
        print(qline)
Since the file is read character by character in reverse order, it will work even on very large files, as long as individual lines fit into memory.
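The snippet above seeks to arbitrary positions in a text-mode file, which in Python 3 can land inside a multi-byte character, and for a file ending in a newline it yields an empty string first. A minimal sketch of the same one-character-at-a-time idea on raw bytes, assuming UTF-8 and "\n" line endings (readlines_reverse_bytes is a hypothetical name):

```python
import os
from typing import Iterator


def readlines_reverse_bytes(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield lines from last to first, reading one byte at a time from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        if position == 0:
            return  # empty file: no lines at all
        f.seek(position - 1)
        if f.read(1) == b"\n":
            position -= 1  # the newline ending the last line does not start a new one
        line = bytearray()  # bytes of the current line, collected back to front
        while position > 0:
            position -= 1
            f.seek(position)
            byte = f.read(1)
            if byte == b"\n":
                yield bytes(reversed(line)).decode(encoding)
                line.clear()
            else:
                line += byte
        yield bytes(reversed(line)).decode(encoding)
```

Memory use is bounded by the longest line, as in the original, but one seek and one read per byte is slow; the chunked approaches in other answers trade a fixed buffer for far fewer system calls.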
#5
8
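import os
import re
import sys

def filerev(somefile, buffer=0x20000):
    # somefile must be opened in binary mode so that seek positions and
    # read sizes are both counted in bytes
    somefile.seek(0, os.SEEK_END)
    size = somefile.tell()
    lines = [b'']
    rem = size % buffer
    pos = max(0, (size // buffer - 1) * buffer)
    while pos >= 0:
        somefile.seek(pos, os.SEEK_SET)
        # append the (possibly partial) first line of the previously read chunk
        data = somefile.read(rem + buffer) + lines[0]
        rem = 0
        # each match is one line including its '\n'; the final match is empty
        lines = re.findall(rb'[^\n]*\n?', data)
        ix = len(lines) - 2
        while ix > 0:
            yield lines[ix]
            ix -= 1
        pos -= buffer
    else:
        yield lines[0]

with open(sys.argv[1], 'rb') as f:
    for line in filerev(f):
        sys.stdout.buffer.write(line)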
#6
7
You can also use the Python module file_read_backwards.
After installing it via pip install file_read_backwards (v1.2.1), you can read the entire file backwards (line by line) in a memory-efficient manner:
#!/usr/bin/env python2.7
from file_read_backwards import FileReadBackwards

with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for l in frb:
        print l
It supports "utf-8", "latin-1", and "ascii" encodings.
Python 3 is also supported. Further documentation can be found at http://file-read-backwards.readthedocs.io/en/latest/readme.html
#7
2
Here is my implementation. You can limit RAM usage by changing the "buffer" variable. There is a bug: the program prints an empty line at the beginning.
RAM usage can also grow if there is no newline for more than "buffer" bytes, because the "line_leak" variable keeps growing until a newline ("\n") is seen.
This also works for 16 GB files, which are larger than my total memory.
import os, sys

buffer = 1024*1024  # 1MB
f = open(sys.argv[1])
f.seek(0, os.SEEK_END)
filesize = f.tell()

division, remainder = divmod(filesize, buffer)
line_leak = ''

for chunk_counter in range(1, division + 2):
    if division - chunk_counter < 0:
        f.seek(0, os.SEEK_SET)
        chunk = f.read(remainder)
    else:
        # seek from the start: Python 3 text files reject nonzero SEEK_END offsets
        f.seek(filesize - buffer*chunk_counter, os.SEEK_SET)
        chunk = f.read(buffer)
    chunk_lines_reversed = list(reversed(chunk.split('\n')))
    if line_leak:  # add line_leak from previous chunk to beginning
        chunk_lines_reversed[0] += line_leak

    # after reversing, save the leaked line for the next chunk iteration
    line_leak = chunk_lines_reversed.pop()

    if chunk_lines_reversed:
        print("\n".join(chunk_lines_reversed))
    # print the last leaked line
    if division - chunk_counter < 0:
        print(line_leak)
#8
2
A simple function that writes a reversed copy of a file to a second file (Linux only):
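import os
import shlex

def tac(file1, file2):
    # quote the names so paths with spaces or shell metacharacters are passed safely;
    # prints the value returned by os.system (0 on success)
    print(os.system('tac %s > %s' % (shlex.quote(file1), shlex.quote(file2))))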
How to use it:
tac('ordered.csv', 'reversed.csv')
f = open('reversed.csv')
#9
1
Thanks for the answer, @srohde. It has a small bug: it checks for the newline character with the 'is' operator. I couldn't comment on the answer because I only have 1 reputation. I'd also like to manage opening the file outside the function, because that lets me embed it in my Luigi tasks.
The code I needed to change looks like this:
with open(filename) as fp:
    for line in fp:
        #print line, # contains new line
        print('>{}<'.format(line))
I'd like to change it to:
with open(filename) as fp:
    for line in reversed_fp_iter(fp, 4):
        #print line, # contains new line
        print('>{}<'.format(line))
Here is a modified version that takes a file handle and keeps the newlines:
import os

def reversed_fp_iter(fp, buf_size=8192):
    """a generator that returns the lines of a file in reverse order
    ref: https://*.com/a/23646049/8776239
    """
    segment = None  # holds possible incomplete segment at the beginning of the buffer
    offset = 0
    fp.seek(0, os.SEEK_END)
    file_size = remaining_size = fp.tell()
    while remaining_size > 0:
        offset = min(file_size, offset + buf_size)
        fp.seek(file_size - offset)
        buffer = fp.read(min(remaining_size, buf_size))
        remaining_size -= buf_size
        lines = buffer.splitlines(True)
        # the first line of the buffer is probably not a complete line so
        # we'll save it and append it to the last line of the next buffer
        # we read
        if segment is not None:
            # if the previous chunk starts right from the beginning of line
            # do not concat the segment to the last line of new chunk
            # instead, yield the segment first
            if buffer[-1] == '\n':
                #print 'buffer ends with newline'
                yield segment
            else:
                lines[-1] += segment
                #print 'enlarged last line to >{}<, len {}'.format(lines[-1], len(lines))
        segment = lines[0]
        for index in range(len(lines) - 1, 0, -1):
            if len(lines[index]):
                yield lines[index]
    # Don't yield None if the file was empty
    if segment is not None:
        yield segment
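A variant in the same spirit that also takes an already-open handle and keeps the newlines, but expects a binary handle so that seek offsets and read sizes are both byte counts. Instead of splitting each chunk, it finds line starts with bytes.rfind. This is a minimal sketch, assuming "\n" line endings; reversed_lines_keepends is a hypothetical name:

```python
import os
from typing import BinaryIO, Iterator


def reversed_lines_keepends(fp: BinaryIO, buf_size: int = 8192) -> Iterator[bytes]:
    """Yield the lines of an open binary file from last to first, keeping newlines."""
    fp.seek(0, os.SEEK_END)
    position = fp.tell()
    tail = b""  # bytes read but not yet yielded; always ends at a line boundary
    while position > 0:
        read_size = min(buf_size, position)
        position -= read_size
        fp.seek(position)
        tail = fp.read(read_size) + tail
        # A line is complete once the newline *before* it has been read, so the
        # search excludes the last byte (which may be the line's own newline).
        end = len(tail)
        cut = tail.rfind(b"\n", 0, end - 1)
        while cut != -1:
            yield tail[cut + 1:end]
            end = cut + 1
            cut = tail.rfind(b"\n", 0, end - 1)
        tail = tail[:end]  # keep only the still-incomplete first line
    if tail:
        yield tail  # the first line of the file
```

Usage mirrors the answer: with open(filename, "rb") as fp: for line in reversed_lines_keepends(fp): ... and it works equally on an io.BytesIO. A single line much longer than buf_size is re-concatenated on every read, so very long lines cost more than short ones.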
#10
0
def reverse_lines(filename):
    with open(filename) as f:
        return f.readlines()[::-1]
#11
0
Always use a "with" statement when working with files, as it closes the file for you even if an error occurs:
with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print line.rstrip('\n')
Or in Python 3:
with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print(line.rstrip('\n'))
#12
0
If you are concerned about file size or memory usage, one solution is to memory-map the file and scan backwards for newlines:
How to search for a string in text files?
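A minimal sketch of that approach with the standard mmap module, assuming "\n" line endings; reverse_lines_mmap is a hypothetical name. It yields bytes, leaving decoding to the caller, and skips empty files because mmap cannot map a zero-length file:

```python
import mmap
import os
from typing import Iterator


def reverse_lines_mmap(path: str) -> Iterator[bytes]:
    """Yield the lines of a file from last to first, without their newlines."""
    if os.path.getsize(path) == 0:
        return  # mmap refuses to map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        if mm[end - 1:end] == b"\n":
            end -= 1  # a trailing newline does not start another line
        while end >= 0:
            # Scan backwards for the newline before the current line;
            # rfind returns -1 when there is none, so start becomes 0.
            start = mm.rfind(b"\n", 0, end) + 1
            yield mm[start:end]
            end = start - 1  # step over that newline
```

The operating system pages the file in on demand, so only the regions being scanned need to be resident in memory.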
#13
-2
I had to do this some time ago and used the code below, which pipes through the shell. I'm afraid I no longer have the complete script. On a Unix-like operating system you can use "tac"; however, "tac" is not available on, for example, Mac OS X, where you can use "tail -r" instead. The snippet below checks which platform you're on and adjusts the command accordingly.
import sys

# We need a command to reverse the line order of the file. On Linux this
# is 'tac', on OSX it is 'tail -r'
# 'tac' is not supported on osx, 'tail -r' is not supported on linux.
# ('command' is the shell command built earlier in the original script.)
if sys.platform == "darwin":
    command += "|tail -r"
elif sys.platform.startswith("linux"):  # "linux2" on Python 2, "linux" on Python 3
    command += "|tac"
else:
    raise EnvironmentError('Platform %s not supported' % sys.platform)
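Since the full script is no longer available, here is a self-contained sketch of the same platform check that runs the tool through subprocess with an argument list instead of a shell pipe. It assumes "tail -r" on macOS and GNU "tac" on Linux, as described above; reversed_lines_via_tool is a hypothetical name:

```python
import subprocess
import sys
from typing import List


def reversed_lines_via_tool(path: str) -> List[str]:
    """Return the lines of path, last line first, using tac or tail -r."""
    if sys.platform == "darwin":
        cmd = ["tail", "-r", path]  # BSD tail on macOS can reverse a file
    elif sys.platform.startswith("linux"):
        cmd = ["tac", path]  # GNU coreutils
    else:
        raise OSError("Platform %s not supported" % sys.platform)
    # No shell is involved, so file names reach the tool unchanged;
    # check=True raises CalledProcessError if the tool exits with an error.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True,
                            universal_newlines=True)
    return result.stdout.splitlines()
```

Note that the whole output is collected in memory here; for very large files, pass an open file as stdout instead of subprocess.PIPE.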
#4
8
for line in reversed(open("file").readlines()):
    print line.rstrip()
If you are on Linux, you can use the tac command:
$ tac file
You can find two recipes on ActiveState here and here.