How to read two lines at a time from a file using Python

I am coding a python script that parses a text file. The format of this text file is such that each element in the file uses two lines and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like to do something like:

f = open(filename, "r")
for line in f:
    line1 = line
    line2 = f.readline()

f.close()

But this breaks saying that:

ValueError: Mixing iteration and read methods would lose data

What is the most “pythonic” way to iterate over a list in chunks?

14 answers

#1

Similar question here. You can't mix iteration and readline so you need to use one or the other.

while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line2: break  # EOF
    ...

#2

import itertools
with open('a') as f:
    for line1,line2 in itertools.izip_longest(*[f]*2):
        print(line1,line2)

izip_longest returns an iterator, so it should work well even if the file is very large.

If there are an odd number of lines, then line2 gets the value None on the last iteration.

izip_longest is in itertools if you have python 2.6 or later. If you use a prior version, you can pick up a python implementation of izip_longest here. In Python3, itertools.izip_longest is renamed itertools.zip_longest.
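
For Python 3, a minimal sketch of the same idea might look like the following. The helper name pairs is hypothetical, and io.StringIO stands in for a real file only so that the example is self-contained.

```python
import io
from itertools import zip_longest

def pairs(lines):
    """Yield (line1, line2) tuples; line2 is None if the input has an odd number of lines."""
    it = iter(lines)
    # Passing the same iterator twice makes zip_longest pull two successive items per tuple.
    return zip_longest(it, it)

# io.StringIO is an assumption standing in for an open text file.
sample = io.StringIO("a\nb\nc\n")
for line1, line2 in pairs(sample):
    print(repr(line1), repr(line2))
```

With a real file, the call would be pairs(f) inside a with open(...) as f block.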

In the comments, it has been asked if this solution reads the whole file first, and then iterates over the file a second time. I believe that it does not. The with open('a') as f line opens a file handle, but does not read the file. f is an iterator, so its contents are not read until requested. izip_longest takes iterators as arguments, and returns an iterator.

izip_longest is indeed fed the same iterator, f, twice. But what ends up happening is that f.next() (or next(f) in Python3) is called on the first argument and then on the second argument. Since next() is being called on the same underlying iterator, successive lines are yielded. This is very different than reading in the whole file. Indeed the purpose of using iterators is precisely to avoid reading in the whole file.
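
The shared-iterator behaviour described above can be checked by hand. This is a small Python 3 sketch, with io.StringIO as an assumed stand-in for an open file; the variable names are illustrative only.

```python
import io

f = io.StringIO("one\ntwo\nthree\nfour\n")  # stand-in for an open text file

args = [f] * 2                   # two references to the same object, not two copies
same_object = args[0] is args[1]

first = next(args[0])            # consumes "one\n"
second = next(args[1])           # same iterator, so this continues with "two\n"

remaining = f.read()             # everything that has not been requested yet
```

Because both arguments are the same object, each call to next() advances the single underlying iterator by one line.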

I therefore believe the solution works as desired -- the file is only read once by the for-loop.

To corroborate this, I ran the izip_longest solution versus a solution using f.readlines(). I put a raw_input() at the end to pause the scripts, and ran ps axuw on each:

% ps axuw | grep izip_longest_method.py

unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/izip_longest_method.py bigfile

% ps axuw | grep readlines_method.py

unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

The readlines clearly reads in the whole file at once. Since the izip_longest_method uses much less memory, I think it is safe to conclude it is not reading in the whole file at once.
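
A similar comparison can be approximated from inside Python 3 with the standard tracemalloc module instead of ps. This is only a sketch and not the measurement shown above: the helper names and the temporary test file are assumptions, and the absolute numbers will differ between platforms.

```python
import os
import tempfile
import tracemalloc
from itertools import zip_longest

def peak_bytes(func, path):
    """Run func(path) and return the peak memory traced by tracemalloc."""
    tracemalloc.start()
    func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def read_lazily(path):
    # Pairs are produced one at a time from the file iterator.
    with open(path) as f:
        for line1, line2 in zip_longest(f, f):
            pass

def read_eagerly(path):
    # readlines() loads every line into a list before pairing.
    with open(path) as f:
        lines = f.readlines()
    for line1, line2 in zip(lines[0::2], lines[1::2]):
        pass

# Build a throwaway test file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(50000))
    path = tmp.name

try:
    lazy_peak = peak_bytes(read_lazily, path)
    eager_peak = peak_bytes(read_eagerly, path)
finally:
    os.remove(path)

print("lazy peak: ", lazy_peak)
print("eager peak:", eager_peak)
```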

#3

Use f.next(), e.g.:

f=open("file")
for line in f:
    print line
    nextline=f.next()
    print "next line", nextline
    ....
f.close()

#4

I would proceed in a similar way as ghostdog74, only with the try outside and a few modifications:

try:
    with open(filename) as f:
        for line1 in f:
            line2 = f.next()
            # process line1 and line2 here
except StopIteration:
    print "(End)" # do whatever you need to do with line1 alone

This keeps the code simple and yet robust. Using with closes the file if something else goes wrong, and also releases the resource once the file is exhausted and the loop exits.

Note that with needs 2.6, or 2.5 with the with_statement feature enabled.
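
In Python 3, file objects no longer have a next() method, so the built-in next() is used instead. A rough port of the pattern above might look like this; the function name read_pairs is hypothetical, and io.StringIO stands in for a real file.

```python
import io

def read_pairs(f):
    """Return (pairs, leftover): the complete pairs plus the unmatched last line, or None."""
    pairs = []
    line1 = None
    try:
        for line1 in f:
            line2 = next(f)        # raises StopIteration if there is no second line
            pairs.append((line1, line2))
            line1 = None           # pair complete, nothing pending
    except StopIteration:
        pass                       # line1 still holds the unmatched line
    return pairs, line1

pairs, leftover = read_pairs(io.StringIO("a\nb\nc\n"))
print(pairs, repr(leftover))
```

The StopIteration raised inside the loop body is not swallowed by the for statement, which is why the try block has to sit outside the loop.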

#5

Works for even and odd-length files. It just ignores the unmatched last line.

f=file("file")

lines = f.readlines()
for even, odd in zip(lines[0::2], lines[1::2]):
    print "even : ", even
    print "odd : ", odd
    print "end cycle"
f.close()

If you have large files, this is not the correct approach. You are loading all the file in memory with readlines(). I once wrote a class that read the file saving the fseek position of each start of line. This allows you to get specific lines without having all the file in memory, and you can also go forward and backward.

I paste it here. The license is public domain, meaning you can do what you want with it. Please note that this class was written 6 years ago and I haven't touched or checked it since. I think it's not even file compliant. Caveat emptor. Also, note that this is overkill for your problem. I'm not claiming you should definitely go this way, but I had this code and I'm happy to share it in case you need more complex access.

import string
import re

class FileReader:
    """ 
    Similar to file class, but allows to access smoothly the lines 
    as when using readlines(), with no memory payload, going back and forth,
    finding regexps and so on.
    """
    def __init__(self,filename): # fold>>
        self.__file=file(filename,"r")
        self.__currentPos=-1
        # get file length
        self.__file.seek(0,0)
        counter=0
        line=self.__file.readline()
        while line != '':
            counter = counter + 1
            line=self.__file.readline()
        self.__length = counter
        # collect an index of filedescriptor positions against
        # the line number, to enhance search
        self.__file.seek(0,0)
        self.__lineToFseek = []

        while True:
            cur=self.__file.tell()
            line=self.__file.readline()
            # if it's not null the cur is valid for
            # identifying a line, so store
            self.__lineToFseek.append(cur)
            if line == '':
                break
    # <<fold
    def __len__(self): # fold>>
        """
        member function for the operator len()
        returns the file length
        FIXME: better get it once when opening file
        """
        return self.__length
        # <<fold
    def __getitem__(self,key): # fold>>
        """ 
        gives the "key" line. The syntax is

        import FileReader
        f=FileReader.FileReader("a_file")
        line=f[2]

        to get the second line from the file. The internal
        pointer is set to the key line
        """

        mylen = self.__len__()
        if key < 0:
            self.__currentPos = -1
            return ''
        elif key > mylen:
            self.__currentPos = mylen
            return ''

        self.__file.seek(self.__lineToFseek[key],0)
        counter=0
        line = self.__file.readline()
        self.__currentPos = key
        return line
        # <<fold
    def next(self): # fold>>
        if self.isAtEOF():
            raise StopIteration
        return self.readline()
    # <<fold
    def __iter__(self): # fold>>
        return self
    # <<fold
    def readline(self): # fold>>
        """
        read a line forward from the current cursor position.
        returns the line or an empty string when at EOF
        """
        return self.__getitem__(self.__currentPos+1)
        # <<fold
    def readbackline(self): # fold>>
        """
        read a line backward from the current cursor position.
        returns the line or an empty string when at Beginning of
        file.
        """
        return self.__getitem__(self.__currentPos-1)
        # <<fold
    def currentLine(self): # fold>>
        """
        gives the line at the current cursor position
        """
        return self.__getitem__(self.__currentPos)
        # <<fold
    def currentPos(self): # fold>>
        """ 
        return the current position (line) in the file
        or -1 if the cursor is at the beginning of the file
        or len(self) if it's at the end of file
        """
        return self.__currentPos
        # <<fold
    def toBOF(self): # fold>>
        """
        go to beginning of file
        """
        self.__getitem__(-1)
        # <<fold
    def toEOF(self): # fold>>
        """
        go to end of file
        """
        self.__getitem__(self.__len__())
        # <<fold
    def toPos(self,key): # fold>>
        """
        go to the specified line
        """
        self.__getitem__(key)
        # <<fold
    def isAtEOF(self): # fold>>
        return self.__currentPos == self.__len__()
        # <<fold
    def isAtBOF(self): # fold>>
        return self.__currentPos == -1
        # <<fold
    def isAtPos(self,key): # fold>>
        return self.__currentPos == key
        # <<fold

    def findString(self, thestring, count=1, backward=0): # fold>>
        """
        find the count occurrence of the string str in the file
        and return the line catched. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        For example, to search for the first occurrence of "hello"
        starting from the beginning of the file do:

        import FileReader
        f=FileReader.FileReader("a_file")
        f.toBOF()
        f.findString("hello",1,0)

        To search the second occurrence string from the end of the
        file in backward movement do:

        f.toEOF()
        f.findString("hello",2,1)

        to search the first occurrence from a given (or current) position
        say line 150, going forward in the file 

        f.toPos(150)
        f.findString("hello",1,0)

        return the string where the occurrence is found, or an empty string
        if nothing is found. The internal counter is placed at the corresponding
        line number, if the string was found. In other case, it's set at BOF
        if the search was backward, and at EOF if the search was forward.

        NB: the current line is never evaluated. This is a feature, since
        we can so traverse occurrences with a

        line=f.findString("hello")
        while line != '':
            line=f.findString("hello")

        instead of playing with a readline every time to skip the current
        line.
        """
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return ''
            if string.find(line,thestring) != -1 :
                if count == internalcounter:
                    return line
                else:
                    internalcounter = internalcounter + 1
                    # <<fold
    def findRegexp(self, theregexp, count=1, backward=0): # fold>>
        """
        find the count occurrence of the regexp in the file
        and return the line catched. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        You need to pass a regexp string as theregexp.
        returns a tuple. The first element is the matched line. The subsequent elements
        contains the matched groups, if any.
        If no match returns None
        """
        rx=re.compile(theregexp)
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return None
            m=rx.search(line)
            if m != None :
                if count == internalcounter:
                    return (line,)+m.groups()
                else:
                    internalcounter = internalcounter + 1
    # <<fold
    def skipLines(self,key): # fold>>
        """
        skip a given number of lines. Key can be negative to skip
        backward. Return the last line read.
        Please note that skipLines(1) is equivalent to readline()
        skipLines(-1) is equivalent to readbackline() and skipLines(0)
        is equivalent to currentLine()
        """
        return self.__getitem__(self.__currentPos+key)
    # <<fold
    def occurrences(self,thestring,backward=0): # fold>>
        """
        count how many occurrences of str are found from the current
        position (current line excluded... see skipLines()) to the
        begin (or end) of file.
        returns a list of positions where each occurrence is found,
        in the same order found reading the file.
        Leaves unaltered the cursor position.
        """
        curpos=self.currentPos()
        list = []
        line = self.findString(thestring,1,backward)
        while line != '':
            list.append(self.currentPos())
            line = self.findString(thestring,1,backward)
        self.toPos(curpos)
        return list
        # <<fold
    def close(self): # fold>>
        self.__file.close()
    # <<fold
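
The class above targets Python 2 (it relies on file() and string.find). For readers on Python 3, the core idea of recording the byte offset of each line start can be sketched much more briefly. The class name LineIndex and the UTF-8 decoding are assumptions made for this example, not part of the original code.

```python
import os
import tempfile

class LineIndex:
    """Random access to lines by number, keeping only byte offsets in memory."""

    def __init__(self, path):
        # Binary mode, so tell() and seek() deal in plain byte offsets.
        self._f = open(path, "rb")
        self._offsets = []
        while True:
            pos = self._f.tell()
            if not self._f.readline():   # empty bytes means EOF
                break
            self._offsets.append(pos)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, n):
        # Jump straight to the recorded start of line n and read just that line.
        self._f.seek(self._offsets[n])
        return self._f.readline().decode("utf-8")

    def close(self):
        self._f.close()

# Demo on a throwaway file (hypothetical content).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    path = tmp.name

index = LineIndex(path)
try:
    count = len(index)
    third, first = index[2], index[0]   # forward and backward access
finally:
    index.close()
    os.remove(path)
```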

#6

How about this one? Does anybody see a problem with it?

f=open('file_name')

for line,line2 in zip(f,f):
  print line,line2
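
Two things to watch for with this approach: in Python 2, zip() builds the complete list of pairs before the loop starts, so the whole file ends up in memory (itertools.izip avoids that), and zip stops at the shortest input, so an unmatched last line is silently dropped. A small Python 3 sketch, where zip is lazy, with io.StringIO as an assumed stand-in for the file:

```python
import io

f = io.StringIO("a\nb\nc\n")   # three lines: one complete pair plus one extra line

pairs = list(zip(f, f))        # the same iterator twice yields successive lines
leftover = f.read()            # the extra line was consumed by zip and discarded
```

Here pairs contains only the first two lines, and nothing is left to read afterwards.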

#7

file_name = 'your_file_name'
file_open = open(file_name, 'r')

def handler(line_one, line_two):
    print(line_one, line_two)

while file_open:
    try:
        one = file_open.next()
        two = file_open.next() 
        handler(one, two)
    except(StopIteration):
        file_open.close()
        break

#8

def readnumlines(file, num=2):
    f = iter(file)
    while True:
        lines = [None] * num
        for i in range(num):
            try:
                lines[i] = f.next()
            except StopIteration: # EOF or not enough lines available
                return
        yield lines

# use like this
f = open("thefile.txt", "r")
for line1, line2 in readnumlines(f):
    pass  # do something with line1 and line2

# or, for N lines at a time (pseudocode; N and lineN are placeholders):
# for line1, line2, line3, ..., lineN in readnumlines(f, N):
    # do something with N lines
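
In Python 3, f.next() becomes next(f). A comparable generator can also be sketched with itertools.islice; the name read_chunks is hypothetical, and io.StringIO stands in for a file.

```python
import io
from itertools import islice

def read_chunks(lines, num=2):
    """Yield lists of num consecutive lines; an incomplete final chunk is dropped."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, num))   # take up to num items from the iterator
        if len(chunk) < num:            # EOF or not enough lines left
            return
        yield chunk

for line1, line2 in read_chunks(io.StringIO("1\n2\n3\n4\n5\n")):
    print(repr(line1), repr(line2))
```

Like the original, this drops a trailing chunk that has fewer than num lines.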

#9

f = open(filename, "r")
for line in f:
    line1 = line
    f.next()

f.close()

With this, you read every second line of the file; the line returned by f.next() is skipped. If you like, you can also check the state of f before calling f.next().

#10

My idea is to create a generator that reads two lines from the file at a time and returns them as a 2-tuple. This means you can then iterate over the results.

from cStringIO import StringIO

def read_2_lines(src):   
    while True:
        line1 = src.readline()
        if not line1: break
        line2 = src.readline()
        if not line2: break
        yield (line1, line2)


data = StringIO("line1\nline2\nline3\nline4\n")
for read in read_2_lines(data):
    print read

If you have an odd number of lines, it won't work perfectly, but this should give you a good outline.
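
One possible way to handle the odd case, sketched for Python 3 (where cStringIO is replaced by io.StringIO), is to yield the last line together with an empty string. The function name read_2_lines_padded is hypothetical.

```python
import io

def read_2_lines_padded(src):
    """Yield (line1, line2); line2 is '' when the input has an odd number of lines."""
    while True:
        line1 = src.readline()
        if not line1:
            break                    # EOF before a new pair started
        line2 = src.readline()       # returns '' at EOF
        yield (line1, line2)

data = io.StringIO("line1\nline2\nline3\n")
for pair in read_2_lines_padded(data):
    print(pair)
```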

#11

I worked on a similar problem last month. I tried a while loop with f.readline() as well as f.readlines(). My data file is not huge, so I finally chose f.readlines(), which gives me more control over the index; otherwise I would have to use f.seek() to move the file pointer back and forth.

My case is more complicated than the OP's: the number of lines to be parsed each time varies in my data file, so I have to check a few conditions before I can parse the data.

Another problem I found with f.seek() is that it doesn't handle UTF-8 very well when I use codecs.open('', 'r', 'utf-8'). I'm not exactly sure what the culprit is; eventually I gave up on this approach.

#12

A simple little reader. It pulls lines in pairs and returns them as a tuple as you iterate over the object. You can close it manually, or it will close itself when it falls out of scope.

class doublereader:
    def __init__(self,filename):
        self.f = open(filename, 'r')
    def __iter__(self):
        return self
    def next(self):
        return self.f.next(), self.f.next()
    def close(self):
        if not self.f.closed:
            self.f.close()
    def __del__(self):
        self.close()

#example usage one
r = doublereader(r"C:\file.txt")
for a, h in r:
    print "x:%s\ny:%s" % (a,h)
r.close()

#example usage two
for x,y in doublereader(r"C:\file.txt"):
    print "x:%s\ny:%s" % (x,y)
#closes itself as soon as the loop goes out of scope
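
For Python 3, the iterator method is named __next__, and a context manager is a more predictable way to close the file than __del__. The following is a sketch under those assumptions; unlike the class above, this hypothetical DoubleReader takes an already open file object so that the example can run on io.StringIO.

```python
import io

class DoubleReader:
    """Iterate over a file object two lines at a time."""

    def __init__(self, fileobj):
        self.f = fileobj

    def __iter__(self):
        return self

    def __next__(self):
        # If either call runs out of lines, StopIteration ends the iteration,
        # so an unmatched last line is dropped.
        return next(self.f), next(self.f)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.f.close()
        return False   # do not suppress exceptions

with DoubleReader(io.StringIO("a\nb\nc\nd\ne\n")) as reader:
    result = list(reader)
closed_after = reader.f.closed
```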

#13

If the file is of reasonable size, another approach is to use a list comprehension to read the entire file into a list of 2-tuples:

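filename = '/path/to/file/name'

with open(filename, 'r') as f:
    # next(f, '') is used instead of f.readline(): mixing readline() with iteration
    # raises ValueError on Python 2. It yields '' if the file has an odd number of lines.
    list_of_2tuples = [(line, next(f, '')) for line in f]

for (line1, line2) in list_of_2tuples:  # Work with them in pairs.
    print('%s :: %s' % (line1, line2))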

#14

This Python code will print the first two lines:

import linecache
filename = "ooxx.txt"
# linecache.getline uses 1-based line numbers
print(linecache.getline(filename, 1))
print(linecache.getline(filename, 2))
