How do I sort a file that contains a very long list of items?

Time: 2021-06-25 20:07:29

I have a text file that contains a very long list of items. I want to sort them alphabetically, but I do not want to load the whole file into memory (RAM).

I tried loading the entire contents of the file into an array and sorting it the way I normally would, but the system complains that there is not enough memory.

Thanks, Mohammad

4 Solutions

#1


7  

You'll need to read up on external sorting. The basic approach is a divide-and-conquer routine such as merge sort: read a portion of the file that fits in memory, sort it, and write it out; then do the same for the next portion, and so on. When you reach the end of the file, merge the sorted portions together.
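The approach described above can be sketched roughly as follows in C++. This is only a minimal illustration, not a tuned implementation: the function names externalSort and writeRun, the maxLinesInMemory parameter, and the ".runN" temporary file naming are all assumptions made for the example. It compares lines byte by byte with std::string's operator<, so "alphabetical" here means plain byte order, and it opens every run file at once during the merge, which may hit the open-file limit if there are very many runs.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Sort the buffered lines, write them to a temporary run file, and
// return that file's path. The buffer is emptied for reuse.
// (Hypothetical helper; the ".runN" naming is an assumption.)
static std::string writeRun(std::vector<std::string>& lines,
                            const std::string& prefix, std::size_t runNo)
{
    std::sort(lines.begin(), lines.end());
    std::string path = prefix + ".run" + std::to_string(runNo);
    std::ofstream out(path);
    for (const std::string& l : lines) out << l << '\n';
    lines.clear();
    return path;
}

// Sort the lines of inputPath into outputPath while buffering at most
// about maxLinesInMemory lines at a time (hypothetical tuning parameter).
void externalSort(const std::string& inputPath, const std::string& outputPath,
                  std::size_t maxLinesInMemory)
{
    // Phase 1: cut the input into sorted runs stored on disk.
    std::vector<std::string> runPaths;
    {
        std::ifstream in(inputPath);
        std::vector<std::string> chunk;
        std::string line;
        while (std::getline(in, line)) {
            chunk.push_back(line);
            if (chunk.size() >= maxLinesInMemory)
                runPaths.push_back(writeRun(chunk, outputPath, runPaths.size()));
        }
        if (!chunk.empty())
            runPaths.push_back(writeRun(chunk, outputPath, runPaths.size()));
    }

    // Phase 2: k-way merge. The min-heap holds one pending line per run.
    std::vector<std::ifstream> runs;
    for (const std::string& p : runPaths) runs.emplace_back(p);

    using Entry = std::pair<std::string, std::size_t>;  // (line, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::string line;
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (std::getline(runs[i], line)) heap.emplace(line, i);

    std::ofstream out(outputPath);
    while (!heap.empty()) {
        Entry smallest = heap.top();
        heap.pop();
        out << smallest.first << '\n';
        // Refill the heap from the run that just supplied a line.
        if (std::getline(runs[smallest.second], line))
            heap.emplace(line, smallest.second);
    }

    // Close and delete the temporary run files.
    runs.clear();
    for (const std::string& p : runPaths) std::remove(p.c_str());
}
```

A call such as externalSort("items.txt", "items_sorted.txt", 1000000) would then keep roughly a million lines in memory per run; the file names and the limit are placeholders to adjust to the available RAM.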

#2


4  

Maybe STXXL (Standard Template Library for Extra Large Data Sets) helps.

STXXL offers external sorting, among other things.

#3


0  

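You don't have to hold the whole file in memory. If this is a task you don't have to do very often, you can write an application that sorts it very slowly. Something like this (a rough C++ sketch):

#include <cstdio>
#include <fstream>
#include <string>
#include <unordered_set>

// Selection sort on disk: each pass rescans oldFile for the smallest line
// that has not been written yet (O(n^2) reads). Only the indices of the
// lines already written are kept in memory, never the lines themselves.
void slowSort(const std::string& oldFile, const std::string& newFile)
{
    std::unordered_set<long> linesProcessed;
    std::ofstream out(newFile);
    for (;;)
    {
        std::ifstream in(oldFile);
        std::string line, alphabeticalFirstLine;
        long lineIndex = -1, i = 0;
        for (; std::getline(in, line); i++)
        {
            if (linesProcessed.count(i)) continue;   // already written
            if (lineIndex < 0 || line < alphabeticalFirstLine)
            {
                alphabeticalFirstLine = line;
                lineIndex = i;
            }
        }
        if (lineIndex < 0) break;                    // every line has been written
        out << alphabeticalFirstLine << '\n';
        linesProcessed.insert(lineIndex);
    }
    out.close();
    std::remove(oldFile.c_str());                    // delete oldFile
    std::rename(newFile.c_str(), oldFile.c_str());   // rename newFile to oldFile
}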

#4


0  

If you are using a Unix-like OS, you can use the sort command. It takes care of memory consumption for you. For example, something like "cat large_file | sort" will do the job.

Or you can write your own, or use an external sort from a library. Tell us which language you are using and maybe someone can point you to the exact library to use.
