Reading a large file line by line vs. storing its lines in an array

Date: 2022-01-09 15:45:25

I have a large file of 100,000 lines. I can either read each line and process it as I go, or store all the lines in an array and then process them. I would prefer the array for the extra features it gives me, but I'm concerned about the memory needed to hold that many lines, and whether it's worth it.

2 answers

#1


3  

There are two functions you should familiarize yourself with.

The first is file(), which reads an entire file into an array, with each line as an array element. This is good for shorter files, and probably isn't what you want to be using on a 100k line file. This function handles its own file management, so you don't need to explicitly open and close the file yourself.
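As a rough illustration of the file() approach, here is a minimal sketch. The sample file is a throwaway created only so the snippet runs on its own, and the FILE_IGNORE_NEW_LINES flag is an optional choice rather than something the answer prescribes.

```php
<?php
// Hypothetical sample file, created here only so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "alpha\nbeta\ngamma\n");

// file() opens, reads, and closes the file itself.
// FILE_IGNORE_NEW_LINES strips the trailing newline from each element.
$lines = file($path, FILE_IGNORE_NEW_LINES);
if ($lines === false) {
    throw new RuntimeException("Could not read $path");
}

// Having every line in memory is what buys the array features:
// random access, count(), sorting, and so on.
echo count($lines), " lines, last one: ", $lines[count($lines) - 1], "\n";
echo "peak memory: ", memory_get_peak_usage(), " bytes\n";

unlink($path);
```

Running this against the real file and checking the reported peak memory is one way to judge whether the whole-file array is affordable.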

The second is fgets(), which reads a file one line at a time. You can loop for as long as there are more lines to process, and run your line processing inside the loop. You'll need to use fopen() to get a handle on the file, and you may want to track the file pointer yourself for recovery management (so you won't have to restart processing from scratch if something goes sideways and the script fails).
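A minimal sketch of the fgets() loop with a simple resume point might look like the following. The sample file, the `.pos` checkpoint file, and the strtoupper() "processing" are all assumptions made for illustration; ftell() and fseek() are one way to track the file pointer, not the only one.

```php
<?php
// Hypothetical sample input and checkpoint file, created so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'lines');
$checkpoint = $path . '.pos';
file_put_contents($path, "alpha\nbeta\ngamma\n");

$handle = fopen($path, 'r');
if ($handle === false) {
    throw new RuntimeException("Could not open $path");
}

// Resume from a saved byte offset if an earlier run left one behind.
if (is_file($checkpoint)) {
    fseek($handle, (int) file_get_contents($checkpoint));
}

$processed = [];
// fgets() returns false at end of file (or on error), which ends the loop.
while (($line = fgets($handle)) !== false) {
    $processed[] = strtoupper(rtrim($line, "\r\n")); // stand-in for real processing
    // Record the offset where the next unread line starts.
    file_put_contents($checkpoint, (string) ftell($handle));
}

fclose($handle);
unlink($checkpoint); // finished cleanly, so no resume point is needed
unlink($path);

print_r($processed);
```

Writing the checkpoint after every line keeps the example short; on a 100,000-line file it would probably make more sense to save the offset every few thousand lines.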

Hopefully that's enough to get you started.

#2


1  

How about a combination of the two? Read 1000 lines into an array, process it, delete the array, then read 1000 more, etc. Monitor memory usage and adjust how many you read into an array at a time.
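The batching idea could be sketched roughly as below. The readInBatches() helper is a hypothetical name introduced here, the generator is just one convenient way to structure it, and the 2,500-line sample file exists only so the snippet is runnable.

```php
<?php
// Hypothetical helper: yields arrays of up to $batchSize lines from $path.
function readInBatches(string $path, int $batchSize): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }
    try {
        $batch = [];
        while (($line = fgets($handle)) !== false) {
            $batch[] = rtrim($line, "\r\n");
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = []; // drop the old lines before reading more
            }
        }
        if ($batch !== []) {
            yield $batch; // final, possibly shorter batch
        }
    } finally {
        fclose($handle);
    }
}

// Sample file with 2,500 lines so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, implode("\n", range(1, 2500)) . "\n");

$batchSizes = [];
foreach (readInBatches($path, 1000) as $batch) {
    $batchSizes[] = count($batch); // stand-in for real processing
    echo count($batch), " lines, memory: ", memory_get_usage(), " bytes\n";
}
unlink($path);
```

Watching the memory_get_usage() figure while varying the second argument is a simple way to tune the batch size, as the answer suggests.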
