Reading files in Linux without filling the disk cache

Date: 2021-05-31 21:43:30

I have a C program that runs only once a week and reads a large number of files, each of them only once. Since Linux caches everything that is read, these files fill up the cache needlessly, and this slows the system down a lot unless it has an SSD.

So how do I open and read from a file without filling up the disk cache?

Note:

By disk caching I mean that when you read a file twice, the second time it's read from RAM, not from disk. I.e. data once read from the disk is left in RAM, so subsequent reads of the same file will not need to reread the data from disk.

2 Answers

#1


6  

You can use posix_fadvise() with the POSIX_FADV_DONTNEED advice to request that the system free the pages you've already read.
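
Here is a minimal sketch of how that could look, assuming the file is read sequentially in one pass. The helper name read_and_drop and the 64 KiB buffer size are made up for illustration. Keep in mind that the call is only advice: the kernel may ignore it, and it drops the file's cached pages even if another process had them cached before you started.

```c
#define _XOPEN_SOURCE 600   /* makes glibc declare posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read a file once, then advise the kernel that its
 * cached pages are no longer needed.
 * Returns the number of bytes read, or -1 on error. */
long read_and_drop(const char *path)
{
    char buf[64 * 1024];    /* illustrative buffer size */
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... process buf[0 .. n) here ... */
        total += n;
    }

    /* offset 0 with len 0 covers the whole file. posix_fadvise() returns
     * the error number directly instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return n < 0 ? -1 : total;
}
```

For very large files you could also call posix_fadvise() on each range right after processing it, so the cache never fills up in the first place.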

#2


7  

I believe passing O_DIRECT to open() should help:

O_DIRECT (Since Linux 2.4.10)

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

There are further detailed notes on O_DIRECT towards the bottom of the man page, including a fun quote from Linus.
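
For what it's worth, a rough sketch of reading with O_DIRECT might look like the code below. It rests on a few assumptions that are not in the man page excerpt above: the helper name read_direct is made up, 4096 bytes is only a guess at an alignment that satisfies the device and filesystem, and the chunk size is arbitrary. O_DIRECT generally requires the buffer address, the file offset and the transfer length to be suitably aligned, and some filesystems refuse the flag at open() time.

```c
#define _GNU_SOURCE         /* glibc only exposes O_DIRECT with this */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGNMENT 4096          /* assumed alignment, not queried from the device */
#define CHUNK     (256 * 1024)  /* arbitrary size, a multiple of ALIGNMENT */

/* Hypothetical helper: read a regular file while bypassing the page cache.
 * Returns the number of bytes read, or -1 on error. */
long read_direct(const char *path)
{
    void *buf = NULL;
    long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;  /* may fail if the filesystem does not support O_DIRECT */

    /* The user space buffer has to be aligned as well. */
    if (posix_memalign(&buf, ALIGNMENT, CHUNK) != 0) {
        close(fd);
        return -1;
    }

    while ((n = read(fd, buf, CHUNK)) > 0) {
        /* ... process n bytes from buf here ... */
        total += n;
        if (n < CHUNK)
            break;  /* short read on a regular file: end of file reached,
                       and the next offset would no longer be aligned */
    }

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

If open() fails here, falling back to a normal buffered read combined with posix_fadvise() is probably the more portable choice.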
