Linux: When to use scatter/gather IO (readv, writev) vs. a large buffer with fread

Date: 2022-12-31 08:30:56

In scatter and gather (i.e. readv and writev), Linux reads into multiple buffers and writes from multiple buffers.

Say I have a vector of 3 buffers: I can use readv, or I can use a single buffer whose size is the combined size of the 3 buffers and call fread.

Hence, I am confused: For which cases should scatter/gather be used and when should a single large buffer be used?

1 Answer

#1


86  

The main conveniences offered by readv and writev are:

  1. They allow working with non-contiguous blocks of data, i.e. the buffers need not be part of one array and can be allocated separately.
  2. The I/O is 'atomic', i.e. if you do a writev, all the elements in the vector are written in one contiguous operation, and writes done by other processes will not occur in between them. (A short write is still possible, so the return value should be checked.)
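The read direction works the same way: one readv call fills several separately allocated buffers in order. Here is a minimal sketch, assuming a hypothetical record made of a struct header followed by a struct body. Neither type nor the read_record helper appears in the original answer; they are only for illustration.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical record layout, for illustration only. */
struct header { char magic[4]; };
struct body   { char payload[8]; };

/* Fill two separately allocated objects with a single readv call.
 * Returns the number of bytes read, or -1 on error (errno is set).
 * A short read is possible, so the caller should compare the result
 * with sizeof *h + sizeof *b. */
ssize_t read_record(int fd, struct header *h, struct body *b)
{
    struct iovec iov[2];

    iov[0].iov_base = h;          /* first bytes of the input land here */
    iov[0].iov_len  = sizeof *h;
    iov[1].iov_base = b;          /* the following bytes land here */
    iov[1].iov_len  = sizeof *b;

    return readv(fd, iov, 2);
}
```

With fread into one large buffer you would get the same bytes, but you would then have to copy or cast the pieces out of that buffer yourself.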

For example, say your data is naturally segmented and comes from different sources:

struct foo *my_foo;
struct bar *my_bar;
struct baz *my_baz;

my_foo = get_my_foo();
my_bar = get_my_bar();
my_baz = get_my_baz();

Now, the three 'buffers' do not form one big contiguous block. But you want to write them contiguously into a file, for whatever reason (for example, they are fields in the header of a file format).

If you use write you have to choose between:

  1. Copying them into one block of memory using, say, memcpy (overhead), followed by a single write call. Then the write will be atomic.
  2. Making three separate calls to write (overhead). Also, write calls from other processes can be interspersed between these writes (not atomic).
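The first option could be sketched roughly as follows. The write_joined helper and its parameter names are hypothetical, not part of the original answer; the point is only to show the extra allocation and copying that a single write requires when the pieces live in separate places.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy three separate pieces into one temporary block, then issue
 * a single write. Returns the number of bytes written, or -1 on
 * failure (allocation failure or write error). */
ssize_t write_joined(int fd,
                     const void *a, size_t a_len,
                     const void *b, size_t b_len,
                     const void *c, size_t c_len)
{
    size_t total = a_len + b_len + c_len;
    if (total == 0)
        return 0;                 /* nothing to write */

    char *tmp = malloc(total);    /* the extra buffer writev avoids */
    if (tmp == NULL)
        return -1;

    memcpy(tmp, a, a_len);
    memcpy(tmp + a_len, b, b_len);
    memcpy(tmp + a_len + b_len, c, c_len);

    ssize_t n = write(fd, tmp, total);
    free(tmp);
    return n;
}
```

The second option is simply three write calls in a row, which avoids the copy but costs three system calls and leaves room for other writers in between.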

If you use writev instead, it's all good:

  1. You make exactly one system call, and need no memcpy to build a single buffer from the three.
  2. Also, the three buffers are written atomically, as one block write, i.e. if other processes also write, their writes will not come in between the writes of the three vectors.

So you would do something like:

struct iovec iov[3];

iov[0].iov_base = my_foo;
iov[0].iov_len = sizeof (struct foo);
iov[1].iov_base = my_bar;
iov[1].iov_len = sizeof (struct bar);
iov[2].iov_base = my_baz;
iov[2].iov_len = sizeof (struct baz);

/* returns the total number of bytes written, or -1 on error */
ssize_t bytes_written = writev (fd, iov, 3);
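For completeness, here is a self-contained version of the same idea with the headers it needs and a check of the return value. The definitions of struct foo, struct bar and struct baz and the write_header helper are assumptions made up for this sketch; the original answer does not define them.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical header pieces, for illustration only. */
struct foo { int id; };
struct bar { char tag[4]; };
struct baz { short version; };

/* Write the three pieces with one writev call.
 * Returns 0 if every byte was written, -1 on error or short write. */
int write_header(int fd, const struct foo *f,
                 const struct bar *b, const struct baz *z)
{
    struct iovec iov[3];
    size_t want = sizeof *f + sizeof *b + sizeof *z;

    /* iov_base is a plain void *, so const is cast away here;
     * writev only reads from these buffers. */
    iov[0].iov_base = (void *)f;
    iov[0].iov_len  = sizeof *f;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = sizeof *b;
    iov[2].iov_base = (void *)z;
    iov[2].iov_len  = sizeof *z;

    ssize_t n = writev(fd, iov, 3);
    return (n >= 0 && (size_t)n == want) ? 0 : -1;
}
```

Treating a short write as a failure keeps the sketch small; real code might instead retry with the remaining bytes.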

Sources:

  1. http://pubs.opengroup.org/onlinepubs/009604499/functions/writev.html
  2. http://linux.die.net/man/2/readv
