Does the Linux filesystem cache files efficiently?

Time: 2021-07-05 20:39:09

I'm creating a web application running on a Linux server. The application constantly accesses a 250K file: it loads it into memory, reads it, and sends some information back to the user. Since this file is read all the time, my client suggests using something like memcache to cache it in memory, presumably because that would make read operations faster.

However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?


I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.

5 answers

#1


18  

Yes, if you do not modify the file each time you open it.


Linux keeps the file's data in the page cache in memory (mapped copy-on-write if you map it privately), so "loading" the file again should be very fast: the data is served from memory rather than from disk (for a mapped file, a page table update at worst).
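One rough way to see this is to read the same file twice and time both reads. The following is only a sketch: the 250K temporary file and the `timed_read` helper are stand-ins invented for illustration, and because the file has just been written it is already in the page cache, so both reads will usually be fast. On a real server the interesting comparison is the first read after a reboot against later reads.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file and return (data, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return data, time.perf_counter() - start


# Hypothetical stand-in for the 250K file from the question.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(250 * 1024))
    sample_path = tmp.name

first, t_first = timed_read(sample_path)
second, t_second = timed_read(sample_path)
print(f"first read:  {t_first * 1e6:.0f} us")
print(f"second read: {t_second * 1e6:.0f} us")

os.remove(sample_path)
```

The timings vary by machine, so treat them as an indication rather than a benchmark.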

Edit: As cdhowie points out, there is no single 'Linux filesystem'. However, I believe the relevant code lives in Linux's memory management and is therefore independent of the filesystem in question. If you're curious, you can read about the handling of vm_area_struct objects in the Linux source, mainly in linux/mm/mmap.c.

#2


3  

As people have mentioned, mmap is a good solution here.


But a single 250K file is very small. You might want to read it in once at startup and put it in some sort of in-memory structure that matches what you want to send back to the user. For example, if it is a text file, an array of lines might be a good choice.
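A minimal sketch of that idea, assuming the file is UTF-8 text and that requests ask for a line by number. The `load_lines` and `handle_request` names, and the three-line sample file, are hypothetical and only there to make the example runnable.

```python
import os
import tempfile


def load_lines(path):
    """Read a text file once and return its lines without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Hypothetical sample data standing in for the real file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    data_path = tmp.name

# Done once at application startup; request handlers then only touch LINES.
LINES = load_lines(data_path)
os.remove(data_path)


def handle_request(line_number):
    """Return the requested line (0-based), or None if it is out of range."""
    if 0 <= line_number < len(LINES):
        return LINES[line_number]
    return None


print(handle_request(1))
```

With this layout the file is parsed once, so each request costs only a list lookup; the trade-off is that the process has to reload the data if the file changes.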

#3


2  

Yes, definitely. The kernel will keep the contents of accessed files in memory (in its page cache) indefinitely, unless something else needs the memory.

You can control this behaviour (to some extent) with the posix_fadvise system call. See its man page, posix_fadvise(2), for more details.
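As an illustration, Python exposes this call as `os.posix_fadvise` on Unix systems. The sketch below hints that the whole file will be needed soon; the `advise_willneed` helper and the temporary file are made up for the example, and the call is only advice that the kernel is free to ignore.

```python
import os
import tempfile


def advise_willneed(path):
    """Ask the kernel to read the whole file into the page cache ahead of use."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)


# Hypothetical stand-in for the 250K file from the question.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (250 * 1024))
    sample_path = tmp.name

advise_willneed(sample_path)
with open(sample_path, "rb") as f:
    data = f.read()

os.remove(sample_path)
print(len(data))
```

For a file that is read constantly, the hint is unlikely to matter much because the data will normally stay cached anyway.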

A read() system call will still normally need to copy the data from the page cache into your buffer, so if you see a real bottleneck there, consider using mmap(), which can avoid the copy by mapping the cached pages directly into the process's address space.
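A small sketch of the mmap() approach using Python's standard `mmap` module. The file contents and variable names are invented for the example; only the bytes that are sliced out get copied into Python objects, while searches run against the mapping itself.

```python
import mmap
import os
import tempfile

# Hypothetical sample file: a header line, 1024 filler bytes, a footer line.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header\n" + b"x" * 1024 + b"\nfooter\n")
    mapped_path = tmp.name

with open(mapped_path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first_line = mm[: mm.find(b"\n")]  # slicing copies only these bytes
        footer_at = mm.rfind(b"footer")    # search without reading into a buffer
        size = len(mm)

os.remove(mapped_path)
print(first_line, footer_at, size)
```

Whether this beats a plain read() for a 250K file depends on the access pattern, so it is worth measuring before switching.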

#4


2  

The file should be cached, but consider setting the noatime option on the mount; otherwise a read may cause the file's access time to be updated, which adds inode writes to disk (the cached file data itself is not invalidated by this).
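To check which access-time mode a mount uses, you can look at its options in /proc/self/mounts. The sketch below parses one such line; the `mount_options` and `atime_mode` helpers and the sample line are hypothetical, and on a real system you would feed in the lines read from /proc/self/mounts instead.

```python
def mount_options(mounts_line):
    """Return the set of options from one /proc/self/mounts style line."""
    fields = mounts_line.split()
    # Fields: device, mount point, filesystem type, options, dump, pass.
    return set(fields[3].split(","))


def atime_mode(options):
    """Classify access-time behaviour from a set of mount options."""
    if "noatime" in options:
        return "noatime"
    if "relatime" in options:
        return "relatime"
    return "strictatime or default"


# Hypothetical sample line; real values come from /proc/self/mounts.
sample_line = "/dev/sda1 / ext4 rw,noatime,errors=remount-ro 0 0"
options = mount_options(sample_line)
print(atime_mode(options))
```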

#5


1  

I guess putting that file on a RAM disk (tmpfs) would give enough of an advantage without big modifications, unless you are really serious about response times measured in microseconds.
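A sketch of staging the file on a RAM-backed directory at startup. It assumes /dev/shm is a tmpfs mount, which is common on Linux distributions but not guaranteed; the `stage_in_ram` helper is hypothetical, and the demo passes an ordinary temporary directory in place of the RAM disk so that it runs anywhere.

```python
import os
import shutil
import tempfile


def stage_in_ram(src_path, ram_dir="/dev/shm"):
    """Copy src_path into ram_dir and return the path of the copy."""
    dest = os.path.join(ram_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dest)
    return dest


# Demo with ordinary temp directories standing in for the real locations.
work_dir = tempfile.mkdtemp()
fake_ram_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "data.bin")
with open(src, "wb") as f:
    f.write(b"payload")

staged = stage_in_ram(src, ram_dir=fake_ram_dir)
with open(staged, "rb") as f:
    staged_data = f.read()

shutil.rmtree(work_dir)
shutil.rmtree(fake_ram_dir)
print(os.path.basename(staged), staged_data)
```

The application then opens the staged path instead of the original; files on tmpfs disappear on reboot, so the copy has to be repeated at each startup.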
