I have a text file containing a very long list of items. I want to sort them alphabetically, but I do not want to load the whole file into memory (RAM).
I tried loading the entire contents of the file into an array and sorting it as I normally would, but the system complains that there is not enough memory.
Thanks, Mohammad
4 Solutions
#1
7
You'll need to read up on external sorting. The basic approach is a divide-and-conquer routine such as merge sort: read and sort one portion of the file, then read and sort the next portion, and so on, writing each sorted portion back to disk. When you reach the end of the file, you merge the sorted portions together.
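A minimal sketch of that idea in Python, in case it helps. The function name `external_sort` and the `lines_per_chunk` parameter are made up for illustration. It assumes the file is UTF-8 text with one item per line, that a single chunk fits in memory, and that the number of chunks stays below the OS limit on open files. Lines are compared by code point, so uppercase letters sort before lowercase ones.

```python
import heapq
import os
import tempfile
from itertools import islice


def _key(line):
    # Compare lines without their trailing newline.
    return line.rstrip("\n")


def external_sort(src_path, dst_path, lines_per_chunk=100_000):
    """Sort the lines of src_path into dst_path without loading the whole file."""
    chunk_paths = []
    try:
        # Phase 1: read a bounded number of lines, sort them, write a sorted run.
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                # A final line without "\n" still needs one before it is written.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort(key=_key)
                fd, path = tempfile.mkstemp(suffix=".run", text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: k-way merge; heapq.merge holds only one line per run at a time.
        runs = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*runs, key=_key))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary sorted runs whether or not the merge succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Calling `external_sort("items.txt", "items_sorted.txt")` (both file names are placeholders) would write the sorted list to the second file. Peak memory is governed by `lines_per_chunk`, so lower it if the chunks themselves are too big.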
#2
4
Maybe STXXL (the Standard Template Library for Extra Large Data Sets) helps.
Among other things, STXXL offers external sorting.
#3
0
You don't have to hold the whole file in memory. If this is a task you don't have to do very often, you can write an application that sorts it very slowly, rescanning the file once for every line it writes. Something like this (pseudo):
set<int> linesProcessed;                 // indices of lines already written
for (int i = 0; i < lineCount; i++)
{
    string alphabeticalFirstLine;
    int lineIndex = -1;                  // -1 means no candidate found yet
    int j = 0;                           // index of the current line in oldFile
    foreach line in oldFile
    {
        if (linesProcessed does not contain j &&
            (lineIndex == -1 || line is before alphabeticalFirstLine))
        {
            alphabeticalFirstLine = line;
            lineIndex = j;
        }
        j++;
    }
    write alphabeticalFirstLine to newFile;
    linesProcessed.add(lineIndex);
}
clear linesProcessed;
delete oldFile;
rename newFile to oldFile;
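For what it's worth, here is a runnable Python sketch of the same slow, repeated-scan idea. It is a variant rather than a literal translation: instead of remembering the indices of processed lines, it remembers only the last line written and counts duplicates, so memory use stays at a couple of lines. The function name `selection_sort_file` is hypothetical, and the sketch assumes UTF-8 text with one item per line and a `dst_path` different from `src_path`.

```python
def selection_sort_file(src_path, dst_path):
    """Sort lines using one full scan of src_path per distinct line."""
    last = None  # most recently written line; None before the first pass
    with open(dst_path, "w", encoding="utf-8") as dst:
        while True:
            smallest, count = None, 0
            with open(src_path, "r", encoding="utf-8") as src:
                for raw in src:
                    line = raw.rstrip("\n")
                    if last is not None and line <= last:
                        continue  # already written in an earlier pass
                    if smallest is None or line < smallest:
                        smallest, count = line, 1
                    elif line == smallest:
                        count += 1  # duplicate of the current candidate
            if smallest is None:
                break  # every line has been written
            for _ in range(count):
                dst.write(smallest + "\n")
            last = smallest
```

The running time is one pass over the input per distinct line, so this is only reasonable for a one-off job on a file that is large but not enormous.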
#4
0
If you are using a Unix-like OS, you can use the sort command. It takes care of memory consumption for you (GNU sort, for instance, spills sorted runs to temporary files when the input does not fit in memory). For example, something like "cat large_file | sort" will do the job.
Or you can write your own external sort, or use one from a library. Tell us what language you are using and maybe someone can point you to the exact library to use.
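If the sorting has to be triggered from a program rather than from a shell, one option is simply to call the system command. A minimal Python sketch, assuming a POSIX-style `sort` is on the PATH; the wrapper name `sort_with_system_sort` is made up for illustration. The `-o` option names the output file, and `LC_ALL=C` is set so the ordering is plain byte-wise rather than locale-dependent.

```python
import os
import shutil
import subprocess


def sort_with_system_sort(src_path, dst_path):
    """Sort src_path into dst_path by delegating to the system `sort` command."""
    if shutil.which("sort") is None:
        raise RuntimeError("no `sort` command found on PATH")
    # LC_ALL=C asks for byte-wise ordering instead of locale collation.
    env = dict(os.environ, LC_ALL="C")
    # check=True raises CalledProcessError if sort exits with an error.
    subprocess.run(["sort", "-o", dst_path, src_path], check=True, env=env)
```

Drop the `LC_ALL` override if locale-aware alphabetical order is what you actually want.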