How does read(2) work in Linux C?

Time: 2022-09-14 15:20:50

According to the man page, we can specify the number of bytes we want to read from a file descriptor.

But inside read's implementation, how many read requests are created to carry out one read?

For example, if I want to read 4MB, will it create a single 4MB request, or will it split the read into multiple smaller requests, such as 4KB per request?

5 answers

#1


  • read(2) is a system call: the C library wrapper loads the arguments into registers and traps into the kernel. On 32-bit x86 the wrapper enters through the vDSO's __kernel_vsyscall; on x86-64 it executes the syscall instruction directly. (In very old times the entry was the int 0x80 software interrupt, but nowadays there are faster ways of dispatching system calls.)

  • Inside the kernel the call is first handled by the VFS (virtual file system). The VFS provides a common interface for open files (struct file) and the inodes behind them, and a common way of interfacing with the underlying file system.

  • The VFS dispatches to the underlying file system (run mount(8) without arguments to list the mount points and the file system used at each). See http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture16.pdf for more information.

  • The file system can cache data (on Linux usually through the kernel's page cache), so the number of disk reads depends on what is already in the cache, on how the file system allocated blocks for this particular file, and on how the file is divided into disk blocks. Those are all questions for the particular file system.

  • If you want to do your own caching, open the file with the O_DIRECT flag; the kernel then makes an effort not to use the cache. However, the buffer address, the file offset, and the length of every read must be aligned, typically to the logical block size of the underlying device (often 512 bytes), so that your buffer can be transferred via DMA to and from the backing store. See http://www.quora.com/Why-does-O_DIRECT-require-I-O-to-be-512-byte-aligned
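To make the O_DIRECT alignment rule concrete, here is a minimal sketch in C. The helper names round_up and read_direct are made up for illustration, and the alignment value passed in (for example 512) is an assumption; the real requirement depends on the device and file system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: read the first `want` bytes of `path` (rounded up to
 * a multiple of `align`) while bypassing the page cache. `align` is an
 * assumed block size; 512 is a common value but not guaranteed.
 * On success returns the byte count and stores a buffer in *out that the
 * caller must free(); returns -1 on error.
 */
ssize_t read_direct(const char *path, size_t want, size_t align, void **out)
{
    void *buf = NULL;
    size_t len = round_up(want, align);        /* length is a multiple of align */

    if (posix_memalign(&buf, align, len) != 0) /* buffer address is aligned too */
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {                              /* EINVAL if O_DIRECT is unsupported */
        free(buf);
        return -1;
    }

    ssize_t n = pread(fd, buf, len, 0);        /* offset 0 is trivially aligned */
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

A call such as read_direct("/path/to/file", 4u << 20, 512, &buf) would request 4MB in one aligned read; the path here is only a placeholder.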

#2


If there is data available, read will return as much data as is immediately available and will fit in the buffer, without waiting. If there's no data available, it will wait until there is some and return what it can without waiting more.


How much that is depends on what the file descriptor refers to. If it refers to a socket, that will be whatever is in the socket's receive buffer. If it is a regular file, the cache does not limit the result: data that is not yet in the page cache is read from disk before read returns, so you normally get the full count unless you hit end of file.
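For a pipe or socket on Linux you can ask how many bytes are currently queued, which is roughly what a blocking read would hand back without waiting. A small sketch, assuming the descriptor is a pipe or socket; the helper name bytes_available is made up for illustration:

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the number of bytes queued for reading on fd, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;

    assert(fd >= 0);
    if (ioctl(fd, FIONREAD, &n) < 0)   /* FIONREAD: bytes waiting to be read */
        return -1;
    return n;
}
```

If 5 bytes are queued on a pipe, a read asking for 4096 bytes returns 5 right away rather than waiting for the rest.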

#3


When you call read, it makes just one request to fill the buffer. If it cannot fill the whole buffer (there is no more data, or the data has not arrived yet, as with sockets), it returns the number of bytes it actually wrote into your buffer.

As the manual says:


RETURN VALUE

Upon successful completion, these functions shall return a non-negative integer indicating the number of bytes actually read. Otherwise, the functions shall return −1 and set errno to indicate the error.

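Because read may return fewer bytes than requested, callers that need the whole amount usually loop. A minimal sketch of that common pattern; the name read_full is hypothetical and not part of any library:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling read() until `count` bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes stored in buf
 * (less than count only at end of file), or -1 on error with errno set.
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)    /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        if (n == 0)                /* end of file: nothing more will come */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Each iteration is still a single read request from user space; the loop only exists because any one of those requests may come back short.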

#4


It depends on how deep you go.


The C library just passes the size you gave it straight to the kernel in one read() system call, so at that level it's just one request.

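To illustrate how thin the C library layer is, the wrapper can be approximated with the generic syscall(2) function. This is only a sketch of the idea, not glibc's actual implementation (which also handles details such as thread cancellation); raw_read is a made-up name:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for the syscall() declaration */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue exactly one read system call, passing the size through unchanged. */
ssize_t raw_read(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

Whether count is 4KB or 4MB, user space enters the kernel once; any splitting happens below that boundary.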

Inside the kernel, for an ordinary file in standard buffered mode, the 4MB you requested is going to be copied from multiple pagecache pages (typically 4kB each), which are unlikely to be contiguous in physical memory. Any of the file data which isn't already in the pagecache is going to have to be read from disk first. The file might not be stored contiguously on disk, so that 4MB could result in multiple requests to the underlying block device.
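The page arithmetic behind that paragraph can be worked out directly. A small sketch, assuming a fixed page size passed in by the caller (4096 is common, but the real value comes from sysconf(_SC_PAGESIZE)); pages_spanned is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Number of page-sized units touched by a read of len bytes at offset. */
size_t pages_spanned(size_t offset, size_t len, size_t page_size)
{
    assert(page_size != 0);
    if (len == 0)
        return 0;

    size_t first = offset / page_size;              /* page holding the first byte */
    size_t last = (offset + len - 1) / page_size;   /* page holding the last byte */
    return last - first + 1;
}
```

With 4096-byte pages, a 4MB read at offset 0 touches 1024 pages, and the same read starting at offset 1 touches 1025.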

#5


There's really no one right answer, other than "however many are necessary at whatever layer the request winds up going to". Typically, a single request is passed to the kernel. That may result in no further requests to lower layers, because all the data is already in memory. But if the data has to be read from, say, a software RAID, requests may have to be issued to multiple physical devices to satisfy it.

I don't think you can really give a better answer than "whatever the implementer thought was the best way".
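As an illustration of the software RAID case, here is a sketch of a simplified, textbook RAID-0 layout. It is not how the Linux md driver is actually written, and the chunk size and disk count are assumptions chosen by the caller; the struct and function names are made up:

```c
#include <assert.h>

/* Where a logical byte offset lands in a simplified RAID-0 array. */
struct stripe_loc {
    unsigned disk;                  /* index of the member disk */
    unsigned long long disk_offset; /* byte offset on that disk */
};

struct stripe_loc raid0_locate(unsigned long long offset,
                               unsigned long long chunk, unsigned ndisks)
{
    struct stripe_loc loc;

    assert(chunk != 0 && ndisks != 0);
    unsigned long long chunk_no = offset / chunk;   /* which chunk overall */
    loc.disk = (unsigned)(chunk_no % ndisks);       /* chunks rotate over disks */
    loc.disk_offset = (chunk_no / ndisks) * chunk + offset % chunk;
    return loc;
}
```

Under these assumptions, with 512kB chunks on 4 disks, a 4MB read starting at offset 0 covers chunks 0 through 7, so every disk receives requests for two chunks.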
