The most efficient way to write data into a file

Date: 2022-06-01 17:51:54

I want to write 2 TB of data into one file; in the future it might be a petabyte.

The data is composed entirely of '1' characters. For example, 2 TB of data consisting of "1111111111111......11111" (each byte is the character '1').

Here is my approach:

File.open("data",File::RDWR||File::CREAT) do |file|
  2*1024*1024*1024*1024.times do
  file.write('1')
  end
end

That means file.write is called once per byte, roughly 2.2 trillion times. From Ruby's point of view, is there a better way to implement this?

4 Answers

#1


7  

You have a few problems:

  1. File::RDWR||File::CREAT always evaluates to File::RDWR, because || short-circuits on the first truthy operand. You mean File::RDWR|File::CREAT (bitwise | rather than logical ||).

  2. 2*1024*1024*1024*1024.times do runs the loop only 1024 times, because the .times call binds more tightly than *; the loop's return value (1024) is then multiplied by the factors on the left and thrown away. You mean (2*1024*1024*1024*1024).times do. Both pitfalls are demonstrated in the irb check below.
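
A minimal irb check of both pitfalls (the values in the comments assume Linux/x86 and may differ on other platforms):

File::RDWR || File::CREAT   # => 2: || short-circuits on the truthy RDWR, CREAT is dropped
File::RDWR | File::CREAT    # => 66: both flag bits set, which is what open(2) expects

2 * 1024.times { }          # => 2048: the block runs 1024 times, #times returns 1024,
                            #    and only then is the result multiplied by 2
(2 * 1024).times { }        # the intended form: the block runs 2048 times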

Regarding your question, I get a significant speedup by writing 1024 bytes at a time:

File.open("data",File::RDWR|File::CREAT) do |file|
  buf = "1" * 1024
  (2*1024*1024*1024).times do
    file.write(buf)
  end
end

You might experiment and find a buffer size that works even better than 1024.
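
If you want to experiment, here is a minimal benchmarking sketch; the buffer sizes and the 64 MB test volume are arbitrary choices for illustration, not recommendations:

require 'benchmark'

total = 64 * 1024 * 1024                  # benchmark with 64 MB, not the full 2 TB
[1024, 64 * 1024, 1024 * 1024].each do |size|
  buf = "1" * size
  secs = Benchmark.realtime do
    File.open("bench_#{size}", "w") do |file|
      (total / size).times { file.write(buf) }
    end
  end
  puts "buffer of #{size} bytes: #{secs.round(2)}s"
end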

#2


0  

I don't know which OS you are using, but the fastest approach would be to use a system copy to concatenate files into one big file; you can script that. An example: if you start with a string like "1" and echo it to a file

echo "1" > file1

you can concatenate this file with itself a number of times into a new file. On Windows you have to use the /b parameter for a binary copy to do that.

copy /b file1+file1 file2

gives you a file2 of 12 bytes (including the CR)

copy file2+file2 file1

gives you 24 bytes, and so on.

I will leave the math (and the fun of doing this in Ruby) to you, but you will reach your target size quickly enough, probably faster than the accepted answer.
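
For what it's worth, a sketch of that doubling idea in Ruby, assuming a 1 MB seed (an arbitrary choice) and the 2 TB target from the question; IO.copy_stream hands the bulk copying to the OS, and the read is limited to the pre-append size, so the file can be appended to itself:

target = 2 * 1024**4                       # 2 TB, as in the question
File.write("data", "1" * (1024 * 1024))    # 1 MB seed of '1' bytes
until File.size("data") >= target
  size = File.size("data")
  File.open("data", "a") do |out|
    IO.copy_stream("data", out, size)      # append one copy of the current contents
  end
end

Each pass doubles the file, so the 1 MB seed reaches 2 TB after 21 doublings.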

#3


0  

A related answer: if you want to write binary zeros of any size, just do this using the dd command (Linux/Mac):

dd if=/dev/zero of=output_file bs=128K count=8000

bs is the block size (the number of bytes to read/write at a time) and count is the number of blocks. The line above writes 128K * 8000 = 1,048,576,000 bytes (about 1 gigabyte) of zeros into output_file in just 10 seconds on my machine:

1048576000 bytes (1.0 GB) copied, 10.275 s, 102 MB/s
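
Scaled to the 2 TB in the question, that would be bs=128K count=16777216 (2*1024**4 / 131072 = 16,777,216 blocks); at the ~102 MB/s measured above, expect it to take roughly six hours.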

Could be inspiring to someone!

#4


-2  

The data is all ones? Then there is no need to write the ones; just write the number of ones.

file.write( 2*1024*1024*1024*1024 )

Simple, yes?
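
Taken seriously, this is run-length encoding in its degenerate case of a single run. A hypothetical decoder that expands the stored count back into '1' bytes, streaming in chunks as in answer #1, might look like:

count = Integer(File.read("data"))        # the stored run length
chunk = "1" * (64 * 1024)                 # hypothetical 64 KB expansion buffer
File.open("expanded", "w") do |out|
  (count / chunk.size).times { out.write(chunk) }
  out.write("1" * (count % chunk.size))   # remainder if count isn't a chunk multiple
end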
