How do I read a file line by line in Python?

Time: 2022-06-25 20:44:14

In pre-historic times (Python 1.4) we did:

fp = open('filename.txt')
while 1:
    line = fp.readline()
    if not line:
        break
    print line

After Python 2.1, we did:

for line in open('filename.txt').xreadlines():
    print line

Then we got the convenient iterator protocol in Python 2.3, and could do:

for line in open('filename.txt'):
    print line

I've seen some examples using the more verbose:

with open('filename.txt') as fp:
    for line in fp:
        print line

Is this the preferred method going forward?

[edit] I get that the with statement ensures closing of the file... but why isn't that included in the iterator protocol for file objects?

4 solutions

#1


180  

There is exactly one reason why the following is preferred:

with open('filename.txt') as fp:
    for line in fp:
        print line

We are all spoiled by CPython's relatively deterministic reference-counting scheme for garbage collection. Other, hypothetical implementations of Python will not necessarily close the file "quickly enough" without the with block if they use some other scheme to reclaim memory.

In such an implementation, you might get a "too many files open" error from the OS if your code opens files faster than the garbage collector calls finalizers on orphaned file handles. The usual workaround is to trigger the GC immediately, but this is a nasty hack and it has to be done by every function that could encounter the error, including those in libraries. What a nightmare.

Or you could just use the with block.
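
As a rough illustration in Python 3 syntax (the snippets in this thread are Python 2), the sketch below shows that the file is closed as soon as the with block exits, without waiting for any garbage collector. The helper name count_lines and the temporary file are assumptions made up for this example.

```python
import os
import tempfile

def count_lines(path):
    # The with block closes fp as soon as the block exits, regardless of
    # how the interpreter reclaims memory.
    with open(path) as fp:
        n = sum(1 for _ in fp)
    # fp is still in scope here, but the underlying file is already closed.
    return n, fp.closed

# Create a small temporary file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("one\ntwo\nthree\n")

line_count, was_closed = count_lines(path)
os.remove(path)
```

Because the close happens at a known point, calling a function like this in a tight loop should not pile up open handles on any implementation.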

Bonus Question

(Stop reading now if you are only interested in the objective aspects of the question.)

Why isn't that included in the iterator protocol for file objects?

This is a subjective question about API design, so I have a subjective answer in two parts.

On a gut level, this feels wrong, because it makes the iterator protocol do two separate things (iterate over lines and close the file handle), and it's often a bad idea to make a simple-looking function do two actions. In this case, it feels especially bad because iterators relate in a quasi-functional, value-based way to the contents of a file, but managing file handles is a completely separate task. Squashing both, invisibly, into one action is surprising to humans who read the code and makes it more difficult to reason about program behavior.

Other languages have essentially come to the same conclusion. Haskell briefly flirted with so-called "lazy IO", which lets you iterate over a file and have it closed automatically when you reach the end of the stream, but lazy IO is almost universally discouraged in Haskell these days, and Haskell users have mostly moved to more explicit resource management such as Conduit, which behaves more like the with block in Python.

On a technical level, there are some things you may want to do with a file handle in Python which would not work as well if iteration closed the file handle. For example, suppose I need to iterate over the file twice:

with open('filename.txt') as fp:
    for line in fp:
        ...
    fp.seek(0)
    for line in fp:
        ...

While this is a less common use case, consider the fact that I might have just added the three lines of code at the bottom to an existing code base which originally had the top three lines. If iteration closed the file, I wouldn't be able to do that. So keeping iteration and resource management separate makes it easier to compose chunks of code into a larger, working Python program.
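
To make the point concrete, here is a small Python 3 sketch of what could go wrong if iteration did close the file. ClosingLines is a hypothetical wrapper invented for this illustration (real file objects do not behave this way), and io.StringIO stands in for a file on disk.

```python
import io

class ClosingLines:
    """Hypothetical wrapper: iterating to the end closes the file."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        for line in self.fp:
            yield line
        self.fp.close()  # the behaviour the question asks about

fp = io.StringIO("a\nb\n")   # in-memory stand-in for a real file
lines = ClosingLines(fp)

first_pass = list(lines)     # works, but closes fp as a side effect
try:
    fp.seek(0)               # rewinding for a second pass...
    second_pass = list(lines)
except ValueError:           # ...fails, because fp is already closed
    second_pass = None
```

With the real protocol, where iteration leaves the handle alone, the seek(0) and the second loop work as expected.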

Composability is one of the most important usability features of a language or API.

#2


18  

Yes,

with open('filename.txt') as fp:
    for line in fp:
        print line

is the way to go.

It is not more verbose. It is safer.
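
One way to see the safety benefit, sketched in Python 3: the with block closes the file even when the loop body raises. The 10-character limit and the sample text are arbitrary assumptions for the demo, and io.StringIO stands in for a real file.

```python
import io

fp = io.StringIO("short\nthis line is far too long\n")  # stand-in for open(...)

try:
    with fp:
        for line in fp:
            # Simulate a processing error part-way through the file.
            if len(line.rstrip("\n")) > 10:
                raise ValueError("line too long")
except ValueError:
    pass

# The exception escaped the with block, yet the file was still closed.
closed_after_error = fp.closed
```

Without the with block, the same error would leave the file open until something else closed or reclaimed it.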

#3


3  

If you're turned off by the extra line, you can use a wrapper function like so:

def with_iter(iterable):
    # "it" avoids shadowing the built-in iter().
    with iterable as it:
        for item in it:
            yield item

for line in with_iter(open('...')):
    ...

In Python 3.3, the yield from statement makes this even shorter:

def with_iter(iterable):
    # "it" avoids shadowing the built-in iter().
    with iterable as it:
        yield from it
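
One caveat worth keeping in mind with this kind of wrapper: the with block inside the generator only exits once the generator is exhausted or closed. The Python 3 sketch below, which uses io.StringIO as a stand-in for open('...'), suggests that breaking out of the loop early leaves the file open until the generator itself is closed.

```python
import io

def with_iter(resource):
    # Same idea as the wrapper above, written for Python 3.3+.
    with resource as it:
        yield from it

fp = io.StringIO("a\nb\nc\n")    # stand-in for open('...')
gen = with_iter(fp)

for line in gen:
    break                        # stop early; the generator stays suspended

closed_after_break = fp.closed   # the with block has not exited yet
gen.close()                      # raises GeneratorExit inside the generator
closed_after_close = fp.closed   # the with block has now exited
```

So the wrapper saves a line, but on early exit it brings back the question of when the file actually gets closed, which is what the explicit with block settles.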

#4


-2  

f = open('test.txt','r')
for line in f.xreadlines():
    print line
f.close()
