How do I read every line of a file in Python and store each line as an element in a list?
I want to read the file line by line and append each line to the end of the list.
35 answers
#1
1606
with open(fname) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
I'm guessing that you meant list and not array.
#2
765
See Input and Output:
with open('filename') as f:
    lines = f.readlines()
or with stripping the newline character:
lines = [line.rstrip('\n') for line in open('filename')]
Editor's note: This answer's original whitespace-stripping command, line.strip(), as implied by Janus Troelsen's comment, would remove all leading and trailing whitespace, not just the trailing \n.
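The difference the editor's note describes can be seen in a minimal sketch. The sample lines are made up for illustration and stand in for what f.readlines() would return:

```python
# Hypothetical sample lines, as they might come out of f.readlines().
raw_lines = ["  indented text  \n", "plain\n", "last line without newline"]

# rstrip('\n') removes only trailing newline characters.
newline_stripped = [line.rstrip('\n') for line in raw_lines]

# strip() removes all leading and trailing whitespace, including indentation.
fully_stripped = [line.strip() for line in raw_lines]

print(newline_stripped)
print(fully_stripped)
```

If leading or trailing spaces carry meaning in your file, rstrip('\n') is the safer of the two.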
#3
355
This is more explicit than necessary, but does what you want.
with open("file.txt", "r") as ins:
    array = []
    for line in ins:
        array.append(line)
#4
201
This will yield an "array" of lines from the file.
lines = tuple(open(filename, 'r'))
#5
148
If you want the \n included:
with open(fname) as f:
    content = f.readlines()
If you do not want \n included:
with open(fname) as f:
    content = f.read().splitlines()
#6
89
You could simply do the following, as has been suggested:
with open('/your/path/file') as f:
    my_lines = f.readlines()
Note that this approach has 2 downsides:
1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.
2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).
A better approach for the general case would be the following:
with open('/your/path/file') as f:
    for line in f:
        process(line)
Where you define your process function any way you want. For example:
def process(line):
    if 'save the world' in line.lower():
        superman.save_the_world()
(The implementation of the Superman class is left as an exercise for you).
This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
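For a runnable version of the one-pass idea, here is a minimal sketch. The helper name matching_lines, the keyword, and the temporary sample file are all assumptions made for illustration; they are not part of the answer above:

```python
import os
import tempfile

def matching_lines(path, keyword):
    """Collect only the lines containing keyword, reading one line at a time."""
    matches = []
    with open(path) as f:
        for line in f:  # the file object yields one line per iteration
            if keyword in line.lower():
                matches.append(line.rstrip('\n'))
    return matches

# Build a small throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Save the world\nnothing here\nplease save the world again\n")
    sample_path = tmp.name

result = matching_lines(sample_path, 'save the world')
os.remove(sample_path)
print(result)
```

Only the matching lines are kept in memory, so the list stays small even when the file is not.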
#7
59
If you don't care about closing the file, this one-liner works:
lines = open('file.txt').read().split("\n")
The traditional way:
fp = open('file.txt')  # Open file in read mode
lines = fp.read().split("\n")  # Create a list containing all lines
fp.close()  # Close file
Using with (recommended):
with open('file.txt') as fp:
    lines = fp.read().split("\n")
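One detail worth checking with split("\n"): if the file ends with a newline, the result has an extra empty string at the end, which splitlines() does not produce. A minimal sketch, using a made-up string in place of the file contents:

```python
# Hypothetical file contents that end with a newline, as most text files do.
text = "line 1\nline 2\nline 3\n"

with_split = text.split("\n")
with_splitlines = text.splitlines()

print(with_split)       # the final newline produces an extra empty string
print(with_splitlines)
```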
#8
36
This should encapsulate the open command.
array = []
with open("file.txt", "r") as f:
    for line in f:
        array.append(line)
#9
35
Data into list
Assume that we have a text file with our data like in the following lines:
Text file content:
line 1
line 2
line 3
- Open the cmd in the same directory (right click the mouse and choose cmd or PowerShell)
- Run python and in the interpreter write:
The Python script
>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.strip() for l in file]
...
>>> x
['line 1', 'line 2', 'line 3']
Using append
x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())
Or...
>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']
Or...
>>> y = [x.rstrip() for x in open("myfile.txt")]
>>> y
['line 1', 'line 2', 'line 3']
#10
31
Clean and Pythonic Way of Reading the Lines of a File Into a List
First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:
infile = open('my_file.txt', 'r')  # Open the file for reading.
data = infile.read()  # Read the contents of the file.
infile.close()  # Close the file since we're done using it.
Instead, I prefer the below method of opening files for both reading and writing as it is very clean, and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading, and assigning it to the variable 'infile.' Once the code within this statement has finished running, the file will be automatically closed.
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
    data = infile.read()  # Read the contents of the file into memory.
Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
The Final Product:
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
    data = infile.read()  # Read the contents of the file into memory.
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
Testing Our Code:
- Contents of the text file:
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
- Print statements for testing purposes (Python 2 syntax):
print my_list  # Print the list.
# Print each line in the list.
for line in my_list:
    print line
# Print the fourth element in this list.
print my_list[3]
- Output (different-looking because of unicode characters):
['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,', 'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea frumoas\xc3\xa3 fat\xc3\xa3.']
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
O prea frumoasã fatã.
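In Python 3 the escaped bytes shown above should not appear, because text files are decoded to str when read. A minimal sketch, assuming the file is UTF-8 encoded; the temporary file and its two lines are stand-ins for my_file.txt:

```python
import os
import tempfile

poem = "A fost odatã ca-n povesti,\nA fost ca niciodatã,\n"

# Write a throwaway UTF-8 file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as tmp:
    tmp.write(poem)
    sample_path = tmp.name

# Passing encoding explicitly avoids depending on the platform default.
with open(sample_path, 'r', encoding='utf-8') as infile:
    my_list = infile.read().splitlines()

os.remove(sample_path)
print(my_list)
```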
#11
27
I'd do it like this.
lines = []
with open("myfile.txt") as f:
    for line in f:
        lines.append(line)
#12
26
To read a file into a list you need to do three things:
- Open the file
- Read the file
- Store the contents as list
Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:
lst = list(open(filename))
However I'll add some more explanation.
Opening the file
I assume that you want to open a specific file and you don't deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open, it takes one mandatory argument and two optional ones in Python 2.7:
- Filename
- Mode
- Buffering (I'll ignore this argument in this answer)
The filename should be a string that represents the path to the file. For example:
open('afile')  # opens the file named afile in the current working directory
open('adir/afile')  # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')  # absolute path (linux)
Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc, etc. are hidden by default when viewed in the explorer.
The second argument is the mode, it's r by default which means "read-only". That's exactly what you need in your case.
But in case you actually want to create a file and/or write to a file you'll need a different argument here. There is an excellent answer if you want an overview.
For reading a file you can omit the mode or pass it in explicitly:
open(filename)
open(filename, 'r')
Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:
open(filename, 'rb')
In Python 2 on other platforms the 'b' (binary mode) is simply ignored. In Python 3, 'rb' returns bytes instead of str on every platform.
Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file-handle).
While you could use:
f = open(filename)
# ... do stuff with f
f.close()
That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:
f = open(filename)
# nothing in between!
try:
    pass  # do stuff with f
finally:
    f.close()
However Python provides context managers that have a prettier syntax (but for open it's almost identical to the try and finally above):
with open(filename) as f:
    pass  # do stuff with f
# The file is always closed after the with-scope ends.
The last approach is the recommended approach to open a file in Python!
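To check that the with statement really closes the file when an exception interrupts the block, here is a minimal sketch. The temporary file and the simulated ValueError are assumptions made for illustration:

```python
import os
import tempfile

# Throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("some text\n")
    sample_path = tmp.name

try:
    with open(sample_path) as f:
        raise ValueError("simulated failure while working with f")
except ValueError:
    pass

# f is still bound here, but the with-block has already closed it.
closed_after_error = f.closed
os.remove(sample_path)
print(closed_after_error)
```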
Reading the file
Okay, you've opened the file, now how to read it?
The open function returns a file object and it supports Python's iteration protocol. Each iteration will give you a line:
with open(filename) as f:
    for line in f:
        print(line)
This will print each line of the file. Note however that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support - otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don't want that you could simply remove the last character (or the last two characters on Windows):
with open(filename) as f:
    for line in f:
        print(line[:-1])
But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and if so remove it:
with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)
But you could simply remove all whitespaces (including the \n character) from the end of the string, this will also remove all other trailing whitespaces so you have to be careful if these are important:
with open(filename) as f:
    for line in f:
        print(line.rstrip())
However if the lines end with \r\n (Windows "newlines") that .rstrip() will also take care of the \r!
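The newline handling can be observed directly in Python 3. This is a minimal sketch under the assumption of a file with Windows-style line endings; the temporary file is created only for the demonstration, and newline='' is the open parameter that turns off newline translation:

```python
import os
import tempfile

# Throwaway file with Windows-style line endings, written as raw bytes.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\n")
    sample_path = tmp.name

# Default text mode translates '\r\n' to '\n' while reading.
with open(sample_path) as f:
    translated = list(f)

# newline='' disables that translation, so the '\r' stays visible.
with open(sample_path, newline='') as f:
    untranslated = list(f)

# rstrip() without arguments removes both '\r' and '\n'.
cleaned = [line.rstrip() for line in untranslated]

os.remove(sample_path)
print(translated, untranslated, cleaned)
```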
Store the contents as list
Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function:
with open(filename) as f:
    lst = list(f)
In case you want to strip the trailing newlines you could use a list comprehension instead:
with open(filename) as f:
    lst = [line.rstrip() for line in f]
Or even simpler: The .readlines() method of the file object by default returns a list of the lines:
with open(filename) as f:
    lst = f.readlines()
This will also include the trailing newline characters, if you don't want them I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.
There's an additional option to get the desired output, however it's rather "suboptimal": read the complete file in a string and then split on newlines:
with open(filename) as f:
    lst = f.read().split('\n')
or:
with open(filename) as f:
    lst = f.read().splitlines()
These take care of the trailing newlines automatically because the split character isn't included. However they are not ideal because you keep the file as string and as a list of lines in memory!
Summary
- Use with open(...) as f when opening files because you don't need to take care of closing the file yourself and it closes the file even if some exception happens.
- file objects support the iteration protocol so reading a file line-by-line is as simple as for line in the_file_object:.
- Always browse the documentation for the available functions/classes. Most of the time there's a perfect match for the task or at least one or two good ones. The obvious choice in this case would be readlines() but if you want to process the lines before storing them in the list I would recommend a simple list-comprehension.
#13
20
Here's one more option, using a list comprehension on the file:
lines = [line.rstrip() for line in open('file.txt')]
This should be a more efficient way, as most of the work is done inside the Python interpreter.
#14
20
Another option is numpy.genfromtxt, for example:
import numpy as np
data = np.genfromtxt("yourfile.dat", delimiter="\n")
This will make data a NumPy array with as many rows as are in your file.
#15
18
If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:
# reader.py
import fileinput
content = []
for line in fileinput.input():
    content.append(line.strip())
fileinput.close()
Pass files to it like so:
$ python reader.py textfile.txt
Read more here: http://docs.python.org/2/library/fileinput.html
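fileinput can also be given an explicit list of files instead of reading sys.argv. A minimal Python 3 sketch, where the two temporary files are stand-ins for files you would name on the command line:

```python
import fileinput
import os
import tempfile

# Two throwaway files standing in for files named on the command line.
paths = []
for text in ("a1\na2\n", "b1\n"):
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(text)
        paths.append(tmp.name)

content = []
# fileinput.input accepts an explicit list of files instead of sys.argv[1:].
with fileinput.input(files=paths) as stream:
    for line in stream:
        content.append(line.strip())

for p in paths:
    os.remove(p)
print(content)
```

The lines of all files end up in one list, in the order the files were given.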
#16
15
The simplest way to do it
A simple way is to:
- Read the whole file as a string
- Split the string line by line
In one line, that would give:
lines = open('C:/path/file.txt').read().splitlines()
#17
14
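f = open("your_file.txt", 'r')
out = f.readlines()  # returns a list of all the lines
Now variable out is a list (array) of what you want. You could either do:
for line in out:
    print(line)
or, instead of calling readlines(), iterate over the file object directly:
for line in f:
    print(line)
Either way you'll get the same results. Note that once readlines() has consumed the file, iterating over f again yields nothing more.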
#18
13
Just use the splitlines() method. Here is an example.
inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print lst
# print(lst) # for python 3
In the output you will have the list of lines.
#19
13
A real easy way:
with open(file) as g:
    stuff = g.readlines()
If you want to make it a fully-fledged program, type this in:
file = raw_input("Enter EXACT file name: ")
with open(file) as g:
    stuff = g.readlines()
print(stuff)
exit = raw_input("Press enter when you are done.")
For some reason, it doesn't read .py files properly.
#20
10
You can just open your file for reading using:
file1 = open("filename", "r")
# And for reading use
lines = file1.readlines()
file1.close()
The list lines will contain all your lines as individual elements, and you can access a specific line using lines[linenumber - 1], as Python starts counting from 0.
#21
9
If you are faced with a very large file and want to read it faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at file level.
buffersize = 2**16
with open(path) as f:
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
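The same loop can be wrapped in a generator so it is easy to try out. This is a sketch only: read_in_batches is a hypothetical helper name, the sample file is temporary, and the hint of 100 is deliberately tiny so that several batches are produced:

```python
import os
import tempfile

def read_in_batches(path, buffersize=2**16):
    """Yield lists of lines; readlines stops a batch once it reaches the hint size."""
    with open(path) as f:
        while True:
            lines_buffer = f.readlines(buffersize)
            if not lines_buffer:
                break
            yield lines_buffer

# Throwaway file with 1000 short lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("".join("row %d\n" % i for i in range(1000)))
    sample_path = tmp.name

# A deliberately tiny hint, so the file is split into several batches.
batches = list(read_in_batches(sample_path, buffersize=100))
all_lines = [line for batch in batches for line in batch]
os.remove(sample_path)
print(len(batches), len(all_lines))
```

Whether batching is actually faster than plain line iteration depends on the file and the platform, so it is worth measuring before relying on it.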
#22
8
Read and write text files with Python 2 and Python 3; it works with Unicode
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Define data
lines = [' A first string ',
         'A Unicode sample: €',
         'German: äöüß']
# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))
# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]
print(lines == read_lines)
Things to notice:
- with is a so-called context manager. It makes sure that the opened file is closed again.
- All solutions here which simply make .strip() or .rstrip() will fail to reproduce the lines as they also strip the white space.
Common file endings
.txt
More advanced file writing / reading
- CSV: Super simple format (read & write)
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- YAML: YAML is a superset of JSON, but easier to read (read & write, comparison of JSON and YAML)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.
#23
7
To my knowledge Python doesn't have a native array data structure. But it does support the list data structure which is much simpler to use than an array.
array = []  # declaring a list with name '**array**'
with open(PATH, 'r') as reader:
    for line in reader:
        array.append(line)
#24
5
Use this:
import pandas as pd
data = pd.read_csv(filename)  # You can also add parameters such as header, sep, etc.
array = data.values
data is a DataFrame, and its values attribute gives you an ndarray. You can also get a list by using array.tolist().
#25
4
You can easily do it by the following piece of code:
lines = open(filePath).readlines()
#26
4
You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.
import numpy
data = numpy.loadtxt(filename, delimiter="\n")
#27
3
Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files (read_text itself was added in Python 3.5), as follows:
from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()
(The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file).
pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in in one go, it's a good choice.
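If the file's encoding matters, read_text accepts an encoding argument. A minimal sketch, assuming UTF-8; the temporary directory and the file contents are made up so the example runs on its own:

```python
import tempfile
from pathlib import Path

# Throwaway directory and file, used only for this illustration.
with tempfile.TemporaryDirectory() as tmpdir:
    p = Path(tmpdir) / 'my_text_file'
    p.write_text('foo\nbar\nbaz\n', encoding='utf-8')

    # read_text opens, reads, and closes the file in one call.
    lines = p.read_text(encoding='utf-8').splitlines()

print(lines)
```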
#28
2
Command line version
#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
filename = os.path.join(dname, sys.argv[1])  # join with a path separator
arr = open(filename).read().split("\n")
print(arr)
Run with:
python3 somefile.py input_file_name.txt
#29
2
I like to use the following. Reading the lines immediately.
contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())
Or using list comprehension:
contents = [line.strip() for line in open(filepath, 'r').readlines()]
#30
0
Outline and Summary
With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:
- list(fileinput.input(filename))
- using with path.open() as f, call f.readlines()
- list(f)
- path.read_text().splitlines()
- path.read_text().splitlines(keepends=True)
- iterate over fileinput.input or f and list.append each line one at a time
- pass f to a bound list.extend method
- use f in a list comprehension
I explain the use-case for each below.
In Python, how do I read a file line-by-line?
This is an excellent question. First, let's create some sample data:
from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')
File objects are lazy iterators, so just iterate over it.
filename = 'filename'
with open(filename) as f:
    for line in f:
        line  # do something with the line
Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:
import fileinput
for line in fileinput.input(filename):
    line  # process the line
or for multiple files, pass it a list of filenames:
for line in fileinput.input([filename]*2):
    line  # process the line
Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I'll use the slightly more terse fileinput.input(filename) where apropos from here.
In Python, how do I read a file line-by-line into a list?
Ah but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list:
list(fileinput.input(filename))
Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).
You can get to this file object two ways. One way is to pass the filename to the open builtin:
filename = 'filename'
with open(filename) as f:
    f.readlines()
or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):
from pathlib import Path
path = Path(filename)
with path.open() as f:
    f.readlines()
list will also consume the file iterator and return a list - a quite direct method as well:
with path.open() as f:
    list(f)
If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:
path.read_text().splitlines()
If you want to keep the newlines, pass keepends=True:
path.read_text().splitlines(keepends=True)
I want to read the file line by line and append each line to the end of the list.
Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.
Using list.append would allow you to filter or operate on each line before you append it:
line_list = []
for line in fileinput.input(filename):
    line_list.append(line)
line_list
Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:
line_list = []
line_list.extend(fileinput.input(filename))
line_list
Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:
[line for line in fileinput.input(filename)]
Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:
list(fileinput.input(filename))
Conclusion
You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.
That is, prefer fileinput.input or with path.open() as f.
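One way to keep the iteration lazy while still cleaning up each line is a small generator. This is a sketch under assumptions: stripped_nonblank_lines is a hypothetical helper, and the temporary file only mimics the sample data used earlier (with an extra blank line added):

```python
import os
import tempfile

def stripped_nonblank_lines(path):
    """Lazily yield each non-blank line without its trailing newline."""
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if line:
                yield line

# Throwaway sample file containing one blank line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("foo\n\nbar\nbaz")
    sample_path = tmp.name

# Nothing is read until the generator is consumed; max() walks it once.
longest = max(stripped_nonblank_lines(sample_path), key=len)

# Only build a list when one is really needed.
line_list = list(stripped_nonblank_lines(sample_path))
os.remove(sample_path)
print(longest, line_list)
```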
#1
1606
with open(fname) as f:
content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
I'm guessing that you meant list
and not array.
我猜你指的是列表而不是数组。
#2
765
See Input and Ouput:
输入和输出:
with open('filename') as f:
lines = f.readlines()
or with stripping the newline character:
或用剥离换行字符:
lines = [line.rstrip('\n') for line in open('filename')]
Editor's note: This answer's original whitespace-stripping command, line.strip()
, as implied by Janus Troelsen's comment, would remove all leading and trailing whitespace, not just the trailing \n
.
编者注:这个答案是原始的whitespace-命令,line.strip(),正如Janus Troelsen的评论所暗示的那样,它将删除所有的前导和尾随空格,而不只是拖尾的\n。
#3
355
This is more explicit than necessary, but does what you want.
这比必要的更明确,但可以实现您想要的。
with open("file.txt", "r") as ins:
array = []
for line in ins:
array.append(line)
#4
201
This will yield an "array" of lines from the file.
这将产生来自文件的“数组”。
lines = tuple(open(filename, 'r'))
#5
148
If you want the \n
included:
如果你想要包含\n:
with open(fname) as f:
content = f.readlines()
If you do not want \n
included:
如果你不想要,包括:
with open(fname) as f:
content = f.read().splitlines()
#6
89
You could simply do the following, as has been suggested:
您可以简单地按照建议的方式进行以下操作:
with open('/your/path/file') as f:
my_lines = f.readlines()
Note that this approach has 2 downsides:
注意,这种方法有两个缺点:
1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.
1)将所有的行存储在内存中。在一般情况下,这是一个非常糟糕的想法。文件可能非常大,您可能会耗尽内存。即使它不是很大,也只是浪费内存。
2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).
这是不允许处理每一行当你读它们。因此,如果您在此之后处理您的行,它是无效的(需要两个传递而不是一个)。
A better approach for the general case would be the following:
对一般情况的一个更好的办法是:
with open('/your/path/file') as f:
for line in f:
process(line)
Where you define your process function any way you want. For example:
你可以任意定义你的过程函数。例如:
def process(line):
if 'save the world' in line.lower():
superman.save_the_world()
(The implementation of the Superman
class is left as an exercise for you).
(超人类的实现留给您作为练习)。
This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
这对任何文件大小都适用,您只需通过1次就可以检查文件。这通常是通用解析器的工作方式。
#7
59
If you don't care about closing the file, this one-liner works:
如果您不关心关闭文件,这一行代码可以工作:
lines = open('file.txt').read().split("\n")
The traditional way:
传统的方法:
fp = open('file.txt') # Open file on read mode
lines = fp.read().split("\n") # Create a list containing all lines
fp.close() # Close file
Using with
(recommended):
使用(推荐):
with open('file.txt') as fp:
lines = fp.read().split("\n")
#8
36
This should encapsulate the open command.
这应该封装open命令。
array = []
with open("file.txt", "r") as f:
for line in f:
array.append(line)
#9
35
Data into list
数据列表
Assume that we have a text file with our data like in the following lines:
假设我们有一个文本文件,其数据如下所示:
Text file content:
line 1
line 2
line 3
- Open the cmd in the same directory (right click the mouse and choose cmd or PowerShell)
- 在同一目录中打开cmd(右键单击鼠标,选择cmd或PowerShell)
- Run
python
and in the interpreter write: - 运行python并在解释器中写入:
The Python script
>>> with open("myfile.txt", encoding="utf-8") as file:
... x = [l.strip() for l in file]
>>> x
['line 1','line 2','line 3']
Using append
x = []
with open("myfile.txt") as file:
for l in file:
x.append(l.strip())
Or...
>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']
Or...
>>> y = [x.rstrip() for x in open("my_file.txt")]
>>> y
['line 1','line 2','line 3']
#10
31
Clean and Pythonic Way of Reading the Lines of a File Into a List
将文件的行读入列表的干净且python化的方式
First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:
首先,您应该专注于打开文件并以一种高效且python化的方式阅读其内容。下面是我个人不喜欢的一个例子:
infile = open('my_file.txt', 'r') # Open the file for reading.
data = infile.read() # Read the contents of the file.
infile.close() # Close the file since we're done using it.
Instead, I prefer the below method of opening files for both reading and writing as it is very clean, and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading, and assigning it to the variable 'infile.' Once the code within this statement has finished running, the file will be automatically closed.
相反,我更喜欢下面的打开文件的方法,因为它是非常干净的,并且不需要额外的步骤来关闭文件一旦你使用它。在下面的语句中,我们打开文件进行读取,并将其分配给变量'infile。一旦语句中的代码运行完毕,文件将自动关闭。
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
data = infile.read() # Read the contents of the file into memory.
Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:
现在我们需要将这些数据集中到Python列表中,因为它们是可迭代的、高效的和灵活的。在您的示例中,期望的目标是将文本文件的每一行都放到一个单独的元素中。为此,我们将使用splitlines()方法如下:
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
The Final Product:
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
    data = infile.read()  # Read the contents of the file into memory.
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
Testing Our Code:
- Contents of the text file:
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
- Print statements for testing purposes:
print my_list  # Print the list.
# Print each line in the list.
for line in my_list:
    print line
# Print the fourth element in this list.
print my_list[3]
- Output (different-looking because of unicode characters):
['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,', 'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea frumoas\xc3\xa3 fat\xc3\xa3.']
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
O prea frumoasã fatã.
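The \xc3\xa3 escapes in the printed list come from Python 2 treating the file contents as raw bytes. As a rough sketch of the same approach on Python 3, assuming the file is UTF-8 encoded (the temporary file and its two sample lines are only there to make the example self-contained), you can name the encoding when opening:

```python
import os
import tempfile

# Hypothetical sample text; any UTF-8 content would do.
sample = "A fost odatã ca-n povesti,\nA fost ca niciodatã,\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Read it back as text, stating the encoding instead of relying on the default.
with open(path, "r", encoding="utf-8") as infile:
    my_list = infile.read().splitlines()

os.remove(path)
print(my_list)
```

With an explicit encoding the list holds decoded text, so the accented characters print as themselves rather than as byte escapes.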
#11
27
I'd do it like this.
lines = []
with open("myfile.txt") as f:
    for line in f:
        lines.append(line)
#12
26
To read a file into a list you need to do three things:
- Open the file
- Read the file
- Store the contents as a list
Fortunately Python makes it very easy to do these things, so the shortest way to read a file into a list is:
lst = list(open(filename))
However, I'll add some more explanation.
Opening the file
I assume that you want to open a specific file and you don't deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open; it takes one mandatory argument and two optional ones in Python 2.7:
- Filename
- Mode
- Buffering (I'll ignore this argument in this answer)
The filename should be a string that represents the path to the file. For example:
open('afile') # opens the file named afile in the current working directory
open('adir/afile') # relative path (relative to the current working directory)
open('C:/users/aname/afile') # absolute path (windows)
open('/usr/local/afile') # absolute path (linux)
Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc are hidden by default when viewed in the Explorer.
The second argument is the mode; it's r by default, which means "read-only". That's exactly what you need in your case.
But in case you actually want to create a file and/or write to a file you'll need a different argument here. There is an excellent answer if you want an overview.
For reading a file you can omit the mode or pass it in explicitly:
open(filename)
open(filename, 'r')
Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:
open(filename, 'rb')
In Python 2 on other platforms the 'b' (binary mode) is simply ignored. In Python 3 it selects binary mode on every platform, so reads return bytes instead of str.
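To make the text/binary distinction concrete, here is a minimal Python 3 sketch. It assumes a throwaway temporary file (the path and its contents are made up for illustration) that contains Windows-style \r\n line endings:

```python
import os
import tempfile

# Hypothetical sample: two lines with Windows-style line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"first\r\nsecond\r\n")

# Binary mode: bytes objects, line endings left untouched.
with open(path, "rb") as f:
    raw_lines = f.readlines()

# Text mode with default newline handling: str objects, "\r\n" becomes "\n".
with open(path, "r") as f:
    text_lines = f.readlines()

os.remove(path)
print(raw_lines)
print(text_lines)
```

For reading lines of text into a list, the default text mode is usually what you want; binary mode is for data you intend to decode or parse yourself.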
Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file-handle).
While you could use:
f = open(filename)
# ... do stuff with f
f.close()
That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:
f = open(filename)
# nothing in between!
try:
    # do stuff with f
    pass
finally:
    f.close()
However Python provides context managers that have a prettier syntax (but for open it's almost identical to the try and finally above):
with open(filename) as f:
    # do stuff with f
    pass
# The file is always closed after the with-scope ends.
The last approach is the recommended approach to open a file in Python!
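If you want to convince yourself that the with statement really closes the file when an exception occurs, a small sketch like the following works. The temporary file and the simulated ValueError are assumptions made purely for the demonstration:

```python
import os
import tempfile

# Create a small temporary file to open (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("first\nsecond\n")

handle = None
try:
    with open(path) as f:
        handle = f  # keep a reference so the object can be inspected later
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with-block closed the file even though the body raised.
closed_after_error = handle.closed
os.remove(path)
print(closed_after_error)
```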
Reading the file
Okay, you've opened the file, now how to read it?
The open function returns a file object and it supports Python's iteration protocol. Each iteration will give you a line:
with open(filename) as f:
    for line in f:
        print(line)
This will print each line of the file. Note however that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support - otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don't want that you could simply remove the last character (or the last two characters on Windows):
with open(filename) as f:
    for line in f:
        print(line[:-1])
But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and if so remove it:
with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)
But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove all other trailing whitespace, so you have to be careful if that is important:
with open(filename) as f:
    for line in f:
        print(line.rstrip())
However if the lines end with \r\n (Windows "newlines") that .rstrip() will also take care of the \r!
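A quick way to see how the stripping variants differ is to try them on one string. This is only a sketch, and the sample string is made up:

```python
# A line with trailing spaces and a Windows-style line ending (made-up sample).
raw = "value with trailing spaces  \r\n"

only_newline = raw.rstrip("\n")    # removes "\n" only; the "\r" stays
line_endings = raw.rstrip("\r\n")  # removes trailing "\r" and "\n" characters
all_whitespace = raw.rstrip()      # removes every trailing whitespace character

print(repr(only_newline))
print(repr(line_endings))
print(repr(all_whitespace))
```

Passing the characters to rstrip explicitly is the safer choice when trailing spaces or tabs are part of the data.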
Store the contents as list
Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function:
with open(filename) as f:
    lst = list(f)
In case you want to strip the trailing newlines you could use a list comprehension instead:
with open(filename) as f:
    lst = [line.rstrip() for line in f]
Or even simpler: The .readlines() method of the file object by default returns a list of the lines:
with open(filename) as f:
    lst = f.readlines()
This will also include the trailing newline characters. If you don't want them I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.
There's an additional option to get the desired output, however it's rather "suboptimal": read the complete file in a string and then split on newlines:
with open(filename) as f:
    lst = f.read().split('\n')
or:
with open(filename) as f:
    lst = f.read().splitlines()
These take care of the newlines automatically because the split character isn't included. Note that split('\n') leaves an empty string as the last element when the file ends with a newline, while splitlines() does not. However they are not ideal because you keep the file as string and as a list of lines in memory!
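The two splitting calls are not quite interchangeable. A small sketch with a made-up string that ends in a newline, as most text files do:

```python
# Contents of a typical two-line file, including the final newline (made-up sample).
text = "line 1\nline 2\n"

by_split = text.split("\n")        # keeps an empty string after the final "\n"
by_splitlines = text.splitlines()  # no trailing empty element

print(by_split)
print(by_splitlines)
```

Keep in mind that str.splitlines() also breaks on other line-boundary characters such as \r and form feed, so split('\n') can still be the right tool when only \n should count as a separator.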
Summary
- Use with open(...) as f when opening files because you don't need to take care of closing the file yourself and it closes the file even if some exception happens.
- file objects support the iteration protocol so reading a file line-by-line is as simple as for line in the_file_object:.
- Always browse the documentation for the available functions/classes. Most of the time there's a perfect match for the task or at least one or two good ones. The obvious choice in this case would be readlines() but if you want to process the lines before storing them in the list I would recommend a simple list-comprehension.
#13
20
Here's one more option, using a list comprehension on the file:
lines = [line.rstrip() for line in open('file.txt')]
This should be a more efficient way, as most of the work is done inside the Python interpreter.
#14
20
Another option is numpy.genfromtxt, for example:
import numpy as np
data = np.genfromtxt("yourfile.dat", delimiter="\n")
This will make data a NumPy array with as many rows as are in your file.
#15
18
If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:
# reader.py
import fileinput
content = []
for line in fileinput.input():
    content.append(line.strip())
fileinput.close()
Pass files to it like so:
$ python reader.py textfile.txt
Read more here: http://docs.python.org/2/library/fileinput.html
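If you would rather not depend on the command line, fileinput.input also accepts the file names directly and, on Python 3, can be used as a context manager. A minimal sketch, assuming a temporary file with made-up contents:

```python
import fileinput
import os
import tempfile

# Throwaway input file (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("alpha\nbeta\n")

# Passing files= explicitly avoids reading sys.argv; the with-block
# closes the input sequence when the loop is done.
with fileinput.input(files=[path]) as stream:
    content = [line.rstrip("\n") for line in stream]

os.remove(path)
print(content)
```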
#16
15
The simplest way to do it
A simple way is to:
- Read the whole file as a string
- Split the string line by line
In one line, that would give:
lines = open('C:/path/file.txt').read().splitlines()
#17
14
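f = open("your_file.txt", 'r')
out = f.readlines()  # reads all lines of the file into the list out
Now the variable out is a list (array) of what you want. You could either do:
for line in out:
    print(line)
or, instead of calling readlines() first, iterate over the file object directly:
for line in f:
    print(line)
and you'll get the same results. Once readlines() has consumed the file, though, looping over f again yields nothing unless you rewind it with f.seek(0).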
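The point about the file object being consumed is easy to check. A short sketch, using a temporary file with made-up contents:

```python
import os
import tempfile

# Throwaway input file (hypothetical content).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\ntwo\n")

with open(path) as f:
    out = f.readlines()  # consumes the whole file
    leftover = list(f)   # nothing left to iterate over
    f.seek(0)            # rewind to the beginning
    again = list(f)      # now the lines are available again

os.remove(path)
print(out, leftover, again)
```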
#18
13
Just use the splitlines() function. Here is an example.
inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print lst
# print(lst) # for python 3
In the output you will have the list of lines.
#19
13
A real easy way:
with open(file) as g:
    stuff = g.readlines()
If you want to make it a fully-fledged program, type this in:
file = raw_input("Enter EXACT file name: ")  # raw_input is Python 2; use input() in Python 3
with open(file) as g:
    stuff = g.readlines()
print(stuff)
exit = raw_input("Press enter when you are done.")
For some reason, it doesn't read .py files properly.
#20
10
You can just open your file for reading using:
file1 = open("filename","r")
# And for reading use
lines = file1.readlines()
file1.close()
The list lines will contain all your lines as individual elements, and you can access a specific line using lines[linenumber - 1], as Python starts its counting from 0.
#21
9
If you are faced with a very large file and want to read it faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at file level.
buffersize = 2**16
with open(path) as f:
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
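The same loop can be wrapped in a generator so the calling code stays a plain for loop. This is a sketch only: iter_lines_in_chunks is a hypothetical helper name, and the temporary file with 1000 short rows exists just to exercise it:

```python
import os
import tempfile


def iter_lines_in_chunks(path, hint=2**16):
    """Yield lines one by one, fetching them in batches via readlines(hint)."""
    with open(path) as f:
        while True:
            # Whole lines totalling approximately `hint` bytes/characters.
            batch = f.readlines(hint)
            if not batch:
                break
            yield from batch


# Build a throwaway file with 1000 short lines (hypothetical data).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.writelines("row %d\n" % i for i in range(1000))

# A deliberately small hint forces many batches; every line is still seen once.
total = sum(1 for _ in iter_lines_in_chunks(path, hint=64))
os.remove(path)
print(total)
```

Whether this is actually faster than iterating over the file directly depends on the Python version and the data, so it is worth measuring before adopting it.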
#22
8
Read and write text files with Python 2 and Python 3; it works with Unicode
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Define data
lines = [' A first string ',
         'A Unicode sample: €',
         'German: äöüß']
# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))
# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]
print(lines == read_lines)
Things to notice:
- with is a so-called context manager. It makes sure that the opened file is closed again.
- All solutions here which simply call .strip() or .rstrip() will fail to reproduce the lines as they also strip the white space.
Common file endings
.txt
More advanced file writing / reading
- CSV: Super simple format (read & write)
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- YAML: YAML is a superset of JSON, but easier to read (read & write, comparison of JSON and YAML)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.
#23
7
To my knowledge Python doesn't have a native array data structure. But it does support the list data structure which is much simpler to use than an array.
array = []  # declaring a list with name 'array'
with open(PATH, 'r') as reader:
    for line in reader:
        array.append(line)
#24
5
Use this:
import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values
data is a DataFrame, and its values attribute gives you an ndarray. You can also get a list by using array.tolist().
#25
4
You can easily do it by the following piece of code:
lines = open(filePath).readlines()
#26
4
You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.
import numpy
data = numpy.loadtxt(filename, delimiter="\n")
#27
3
Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files, as follows:
from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()
(The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file).
pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.
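For a version you can run as-is, here is a sketch that creates its own input in a temporary directory. The file name and contents are made up, and the encoding is passed explicitly on the assumption that the file is UTF-8:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical input file created just for this example.
    p = Path(tmpdir) / "my_text_file.txt"
    p.write_text("foo\nbar\nbaz\n", encoding="utf-8")

    # One call reads, decodes and closes; splitlines() makes the list.
    lines = p.read_text(encoding="utf-8").splitlines()
    # keepends=True keeps the newline at the end of each element.
    lines_with_endings = p.read_text(encoding="utf-8").splitlines(keepends=True)

print(lines)
print(lines_with_endings)
```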
#28
2
Command line version
#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
# os.path.join puts a path separator between the directory and the file name.
filename = os.path.join(dname, sys.argv[1])
arr = open(filename).read().split("\n")
print(arr)
Run with:
python3 somefile.py input_file_name.txt
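The core of that script can also be written as a small function, which makes it easier to reuse and to close the file reliably. This is a sketch under assumptions: read_lines is a hypothetical helper, and the temporary directory stands in for the script's own directory:

```python
import os
import tempfile


def read_lines(base_dir, name):
    """Return the lines of base_dir/name without their line endings."""
    path = os.path.join(base_dir, name)  # inserts the separator for us
    with open(path) as f:                # closed automatically after reading
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as base_dir:
    # Hypothetical input file standing in for the one named on the command line.
    with open(os.path.join(base_dir, "input_file_name.txt"), "w") as f:
        f.write("a\nb\n")
    arr = read_lines(base_dir, "input_file_name.txt")

print(arr)
```

In a real script you would pass os.path.dirname(os.path.abspath(__file__)) and sys.argv[1] as the two arguments.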
#29
2
I like to use the following. Reading the lines immediately.
contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())
Or using list comprehension:
contents = [line.strip() for line in open(filepath, 'r').readlines()]
#30
0
Outline and Summary
With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:
- list(fileinput.input(filename))
- using with path.open() as f, call f.readlines()
- list(f)
- path.read_text().splitlines()
- path.read_text().splitlines(keepends=True)
- iterate over fileinput.input or f and list.append each line one at a time
- pass f to a bound list.extend method
- use f in a list comprehension
I explain the use-case for each below.
In Python, how do I read a file line-by-line?
This is an excellent question. First, let's create some sample data:
from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')
File objects are lazy iterators, so just iterate over them.
filename = 'filename'
with open(filename) as f:
    for line in f:
        line  # do something with the line
Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:
import fileinput
for line in fileinput.input(filename):
    line  # process the line
or for multiple files, pass it a list of filenames:
for line in fileinput.input([filename]*2):
    line  # process the line
Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I'll use the slightly more terse fileinput.input(filename) where apropos from here.
In Python, how do I read a file line-by-line into a list?
Ah but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list:
list(fileinput.input(filename))
Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).
You can get to this file object two ways. One way is to pass the filename to the open builtin:
filename = 'filename'
with open(filename) as f:
    f.readlines()
or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):
from pathlib import Path
path = Path(filename)
with path.open() as f:
    f.readlines()
list will also consume the file iterator and return a list - a quite direct method as well:
with path.open() as f:
    list(f)
If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:
path.read_text().splitlines()
If you want to keep the newlines, pass keepends=True:
path.read_text().splitlines(keepends=True)
I want to read the file line by line and append each line to the end of the list.
Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.
Using list.append would allow you to filter or operate on each line before you append it:
line_list = []
for line in fileinput.input(filename):
    line_list.append(line)
line_list
Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:
line_list = []
line_list.extend(fileinput.input(filename))
line_list
Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:
[line for line in fileinput.input(filename)]
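To illustrate the "map and filter inside it" part, here is a sketch that strips each line and drops blank lines and comment lines. It uses a plain open() on a temporary file rather than fileinput.input, and the file contents and the "#" comment convention are assumptions for the example:

```python
import os
import tempfile

# Throwaway input with a blank line, padded text and a comment (made-up data).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("foo\n\n  bar  \n# comment\nbaz")

with open(path) as f:
    # Map: strip whitespace. Filter: skip blank lines and lines starting with "#".
    cleaned = [line.strip() for line in f
               if line.strip() and not line.startswith("#")]

os.remove(path)
print(cleaned)
```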
Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:
list(fileinput.input(filename))
Conclusion
You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.
That is, prefer fileinput.input or with path.open() as f.
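As a sketch of what lazy processing can look like, the following computes two summary values without ever building a list of lines. The temporary file reuses the foo/bar/baz sample data from above, and the two statistics are arbitrary choices for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "filename"
    path.write_text("foo\nbar\nbaz")

    # Each generator expression pulls one line at a time from the file.
    with path.open() as f:
        line_count = sum(1 for _ in f)
    with path.open() as f:
        longest = max((len(line.rstrip("\n")) for line in f), default=0)

print(line_count, longest)
```

Only one line is held in memory at a time, which is what makes this approach suitable for files too large to load whole.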