Fastest way to create a large file in C++?

Date: 2021-12-16 21:33:36

Create a flat text file of around 50 to 100 MB in C++, in which the line 'Added first line' is written 4 million times.
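As a baseline for comparing the answers, here is a minimal sketch of the obvious approach, assuming plain standard C++ and a hypothetical helper name `write_lines_ofstream`. It writes the line through `std::ofstream`, which buffers internally. With the 17-byte line "Added first line\n", 4,000,000 lines come to 68,000,000 bytes (about 65 MiB), which is inside the requested range.

```cpp
#include <fstream>
#include <string>

// Baseline sketch (hypothetical helper): write `line` `count` times through
// std::ofstream. The stream buffers the small writes internally.
// Returns false if opening, writing, or closing the file fails.
bool write_lines_ofstream(const std::string& path, const std::string& line, long count) {
    std::ofstream out(path, std::ios::binary);  // binary: no newline translation
    if (!out) return false;
    for (long i = 0; i < count; ++i)
        out.write(line.data(), static_cast<std::streamsize>(line.size()));
    out.close();         // flushes; sets failbit on failure
    return !out.fail();
}
```

For the question's task this would be called as `write_lines_ofstream("bigfile.txt", "Added first line\n", 4000000)`.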

5 answers

#1 (score: 15)

Using old-style file I/O:

  1. fopen the file for writing.
  2. fseek to the desired file size minus 1.
  3. fwrite a single byte.
  4. fclose the file.
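The four steps above can be sketched with C stdio as follows. This is a minimal sketch, assuming a hypothetical helper name `create_file_by_seek` and a size that fits in a `long` (100 MB does). Note that it only produces a file of the right size: the skipped region reads back as zero bytes (on many filesystems it is stored as a sparse "hole"), so the requested text still has to be written afterwards.

```cpp
#include <cstdio>

// Hypothetical helper: create `path` with exactly `size` bytes (size >= 1)
// by seeking to offset size-1 and writing one byte there.
bool create_file_by_seek(const char* path, long size) {
    if (size < 1) return false;
    std::FILE* f = std::fopen(path, "wb");  // create or truncate
    if (f == nullptr) return false;
    const char zero = 0;
    bool ok = std::fseek(f, size - 1, SEEK_SET) == 0 &&  // move past the end
              std::fwrite(&zero, 1, 1, f) == 1;          // the single byte
    // fclose flushes the buffered byte, so its failure counts as an error too.
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```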

#2 (score: 11)

The fastest way to create a file of a certain size is to create a zero-length file using creat() or open() and then change its size using chsize() (a DOS/Windows C runtime function; the POSIX equivalent is ftruncate()). This extends the file without writing any data: the new bytes read back as zeros, and on many filesystems no disk blocks are actually allocated until something is written to them. It's very fast since no buffer writing needs to take place.
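On a POSIX system the same idea looks roughly like this. It is a minimal sketch, assuming POSIX `open()`/`ftruncate()` (substituted for `chsize()`, which is not POSIX) and a hypothetical helper name `create_file_by_truncate`.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: create an empty file, then set its length with
// ftruncate(). The extended region reads back as zero bytes; typically no
// data is written, which is why this is fast.
bool create_file_by_truncate(const char* path, off_t size) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = ftruncate(fd, size) == 0;
    if (close(fd) != 0) ok = false;
    return ok;
}
```

If the disk space should really be reserved rather than left as a sparse hole, POSIX also provides posix_fallocate(), at the cost of the filesystem doing more work up front.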

#3 (score: 2)

Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, how about this? It fills the file with "abcdefghabcdefgh...".

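#include <stdio.h>

int main()
{
   const int FILE_SIZE = 50000;   // size in KB (~50 MB)
   const int BUFFER_SIZE = 1024;  // write 1 KB per call
   char buffer[BUFFER_SIZE];
   int i;

   // Fill the buffer with "abcdefghabcdefgh..."
   for (i = 0; i < BUFFER_SIZE; i++)
      buffer[i] = (char)(i % 8 + 'a');

   FILE *pFile = fopen("somefile.txt", "w");
   if (pFile == NULL)
      return 1;

   // fwrite copies the raw bytes: no format-string interpretation
   // and no need for a '\0' terminator.
   for (i = 0; i < FILE_SIZE; i++)
      fwrite(buffer, 1, BUFFER_SIZE, pFile);

   fclose(pFile);

   return 0;
}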

#4 (score: 1)

You haven't mentioned the OS, but I'll assume creat/open/close/write are available.

For truly efficient writing, assuming (say) a 4K page and disk block size and a repeated string:

  1. Open the file.
  2. Allocate a buffer of 4K × (number of chars in your repeated string) bytes, ideally aligned to a page boundary.
  3. Print the repeated string into that memory 4K times, filling the blocks precisely.
  4. Use write() to write the buffer out to disk as many times as necessary. You may wish to write a partial piece at the end to get the file size to come out right.
  5. Close the file.

This bypasses the buffering of fopen() and friends, which is both good and bad: their buffering makes them reasonably fast, but they still won't be as efficient as this approach, which avoids the overhead of copying data through an intermediate buffer.

This can easily be written in C or C++, but it does assume you're going to use POSIX calls rather than iostream or stdio for efficiency's sake, so it's outside the core library specification.
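The five steps above can be sketched with POSIX calls as follows. This is a minimal sketch, assuming a 4096-byte page and block size (as the answer does), `posix_memalign()` for the page-aligned buffer, and a hypothetical helper name `write_repeated_lines`. Because the buffer holds 4096 copies of the line, its length is exactly `line_len` pages, so every full write covers whole blocks.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Write all `len` bytes, retrying on short writes and EINTR.
static bool write_all(int fd, const char* p, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Hypothetical helper: write `count` copies of `line` to `path` with write().
bool write_repeated_lines(const char* path, const char* line, size_t count) {
    const size_t kPage = 4096;                  // assumed page/block size
    const size_t line_len = std::strlen(line);
    if (line_len == 0) return false;
    const size_t buf_len = line_len * kPage;    // exactly line_len pages

    void* mem = nullptr;
    if (posix_memalign(&mem, kPage, buf_len) != 0) return false;  // page-aligned
    char* buf = static_cast<char*>(mem);
    for (size_t i = 0; i < kPage; ++i)          // fill the blocks precisely
        std::memcpy(buf + i * line_len, line, line_len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { std::free(mem); return false; }

    bool ok = true;
    size_t remaining = count;
    while (ok && remaining > 0) {
        // Full buffers first; the last write is the partial piece.
        size_t copies = remaining < kPage ? remaining : kPage;
        ok = write_all(fd, buf, copies * line_len);
        remaining -= copies;
    }
    if (close(fd) != 0) ok = false;
    std::free(mem);
    return ok;
}
```

For the question's task, `write_repeated_lines("bigfile.txt", "Added first line\n", 4000000)` would produce a file of 4,000,000 × 17 = 68,000,000 bytes. Whether this actually beats a well-buffered stdio loop depends on the OS and filesystem, so it is worth measuring.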

#5 (score: 0)

Fastest way to create a large file in C++? OK. I assume "fastest" means the approach that takes the shortest run time.

Create a flat text file of around 50 to 100 MB in C++, in which the line 'Added first line' is written 4 million times.

Preallocate the file using old-style file I/O:

  1. fopen the file for writing.
  2. fseek to the desired file size minus 1.
  3. fwrite a single byte.
  4. fclose the file.

For this task:

  1. Create a string containing "Added first line\n" a thousand times, and find its length.
  2. Preallocate the file as above, seeking to (string length × 4000) - 1, so that the file ends up exactly string length × 4000 bytes long.
  3. Open the file for read/write without truncating it (fopen mode "r+").
  4. Loop 4000 times, writing the string to the file (1000 × 4000 = 4 million lines).
  5. Close the file.

That's my best guess. I'm sure there are many ways to do it.
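The preallocate-then-write plan above can be sketched with C stdio as follows. This is a minimal sketch, assuming a hypothetical helper name `write_lines_preallocated`, with defaults matching the answer (1000 lines per chunk, 4000 chunks). It reopens the file in "r+b" mode so the preallocated length is not truncated away.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: build a chunk of `lines_per_chunk` lines, preallocate
// the file to its final size, then overwrite it with the chunk `repeats` times.
bool write_lines_preallocated(const char* path,
                              const std::string& line = "Added first line\n",
                              long lines_per_chunk = 1000, long repeats = 4000) {
    std::string chunk;
    chunk.reserve(line.size() * lines_per_chunk);
    for (long i = 0; i < lines_per_chunk; ++i) chunk += line;
    const long total = static_cast<long>(chunk.size()) * repeats;  // final size
    if (total < 1) return false;

    // Preallocate: seek to the last byte, write one byte, close.
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const char zero = 0;
    bool ok = std::fseek(f, total - 1, SEEK_SET) == 0 &&
              std::fwrite(&zero, 1, 1, f) == 1;
    if (std::fclose(f) != 0 || !ok) return false;

    // Reopen for read/write (no truncation) and overwrite from the start.
    f = std::fopen(path, "r+b");
    if (!f) return false;
    for (long i = 0; ok && i < repeats; ++i)
        ok = std::fwrite(chunk.data(), 1, chunk.size(), f) == chunk.size();
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```

With the defaults the chunk is 17,000 bytes and the file is 68,000,000 bytes. Whether preallocating first saves any time over simply appending is filesystem-dependent, so it is worth timing both.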
