I have the following questions about handling files and mapping them with mmap:
- We know that when we create a file and write to it, we are writing to memory either way. So why map the file into memory with mmap and then write?
- If the reason is the protection that mmap offers (PROT_NONE, PROT_READ, PROT_WRITE), then the same level of protection can also be achieved with the file open flags (O_RDONLY, O_RDWR, etc.). So why mmap?
- Is there any special advantage in mapping a file into memory and then using it, rather than just creating a file and writing to it?
- Finally, suppose we mmap a file into memory. If we write to the memory location returned by mmap, is the data simultaneously written to the file as well?
Kindly help me with answers to all of these questions.
Thanks a lot in advance.
*EDIT: Sharing files between threads*
As far as I know, if we share a file between two threads (not processes), it is advisable to mmap it into memory and then use it, rather than using the file directly.
But we know that using a file means it is surely in main memory already, so why do the threads need to mmap it again?
4 answers
#1
14
A memory mapped file is partially or wholly mapped into memory (RAM), whereas data you write to an ordinary file is first written to memory and then flushed to disk. A memory mapped file is brought in from disk and placed into memory explicitly for reading and/or writing. The mapping stays in place until you unmap it.
Access to disk is slower. Once data you have written to a file has been flushed to disk, it may no longer reside in RAM, which means that the next time you need the file you might have to fetch it from disk (slow). With a memory mapped file, the pages you work with are in RAM, and you can access them faster than when they are on disk.
Also, memory mapped files are often used as an IPC mechanism, so two or more processes can easily share the same file and read from and write to it (using the necessary synchronization mechanisms).
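As a rough illustration of the IPC use, here is a minimal sketch in Python, whose mmap module wraps the same mmap call. It assumes a Unix-like system with fork. The function name share_via_mapping is made up for this example, and waiting for the child to exit stands in for a real synchronization mechanism.

```python
import mmap
import os
import tempfile


def share_via_mapping(message: bytes) -> bytes:
    """A child process stores `message` in a shared file mapping; the parent reads it.

    Hypothetical helper for illustration; `message` must be non-empty.
    """
    with tempfile.TemporaryFile() as f:
        # The file must be at least as long as the region being mapped.
        f.truncate(len(message))
        with mmap.mmap(f.fileno(), len(message),
                       flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as shared:
            pid = os.fork()
            if pid == 0:
                # Child: the MAP_SHARED mapping is inherited across fork.
                try:
                    shared[:] = message
                finally:
                    os._exit(0)
            # Parent: waiting for the child is the only synchronization used here.
            os.waitpid(pid, 0)
            return shared[:]


if __name__ == "__main__":
    print(share_via_mapping(b"hello from the child"))
```

With MAP_PRIVATE instead of MAP_SHARED the child's store would stay in the child's own copy of the page, so the flag matters for this use.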
When you need to read a file often, and the file is quite large, it can be advantageous to map it into memory so that you have faster access to it than having to open it and fetch it from disk each time.
EDIT:
That depends on your needs. When you have a file that needs to be accessed very frequently by different threads, I'm not sure that memory mapping the file is necessarily a good idea, because you'll need to synchronize access to the mmap'ed file if you want to write to the same places from different threads. If that happens very often, it could become a point of resource contention.
If you are just reading from the file, then this might be a good solution, because you don't really need to synchronize access when multiple threads only read. The moment you start writing, you do have to use synchronization mechanisms.
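For the read-only case, a small sketch of what this could look like in Python (the mmap module wraps the same call). The name parallel_checksum and its parameters are invented for the example. Each thread reads only its own slice, and nothing writes to the mapping, so no lock is taken.

```python
import mmap
import tempfile
import threading


def parallel_checksum(data: bytes, workers: int = 4) -> int:
    """Sum all bytes of a file by letting several threads read one read-only mapping.

    Hypothetical helper for illustration; `data` must be non-empty.
    """
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()  # make sure the bytes are in the file before mapping it
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            chunk = -(-len(view) // workers)  # ceiling division
            partial = [0] * workers

            def work(i: int) -> None:
                # Slicing does not touch the mapping object's shared file
                # position (unlike read()/seek()), so no lock is needed.
                partial[i] = sum(view[i * chunk:(i + 1) * chunk])

            threads = [threading.Thread(target=work, args=(i,))
                       for i in range(workers)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            return sum(partial)


if __name__ == "__main__":
    sample = bytes(range(256)) * 100
    print(parallel_checksum(sample) == sum(sample))
```

If any thread stored into the mapping, the slices would need to be guarded, which is the contention concern raised above.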
If you have to write to the file, I suggest that each thread do its own file access in a thread-local way, just as you would with any other file. This reduces the need for thread synchronization and the likelihood of bugs that are hard to find and debug.
#2
2
1) You misunderstand the write(2) system call. write() does not write to disk; it just copies the buffer contents into the OS buffer chain and marks those buffers as dirty. One of the OS threads (bdflush, IIRC) will later pick up these buffers, write them to disk, and fiddle with some flags. With mmap, you access the OS buffer directly (but if you alter its contents, it will be marked dirty, too).
2) This is not about protection. It is about setting flags in the page table entries.
3) You avoid double buffering. Also, you can address the file in terms of characters instead of blocks, which is sometimes more practical.
4) It's the system buffers (hooked into your address space) that you have been using. The system may or may not have written parts of them to disk.
5) If the threads belong to the same process and share the page tables and address space, yes.
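To make point 2 a bit more concrete, here is a hedged sketch in Python (its mmap module exposes the same PROT_* and MAP_* constants). It shows how the open flags of the descriptor limit which mapping protections the kernel will grant. The function protection_demo and the dictionary keys are made up for the example. Note that Python itself refuses a store to a read-only mapping by raising TypeError, whereas a C program would take a SIGSEGV from the page protection.

```python
import mmap
import os
import tempfile


def protection_demo() -> dict:
    """Probe how O_RDONLY interacts with PROT_READ / PROT_WRITE (illustrative only)."""
    results = {}
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"read-only data")
        f.flush()

        fd = os.open(f.name, os.O_RDONLY)
        try:
            # A shared writable mapping needs a descriptor opened O_RDWR;
            # with O_RDONLY, mmap fails with EACCES.
            try:
                mmap.mmap(fd, 0, flags=mmap.MAP_SHARED,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE).close()
                results["writable_map_on_rdonly_fd"] = "allowed"
            except PermissionError:
                results["writable_map_on_rdonly_fd"] = "refused"

            # A read-only mapping of the same descriptor is fine.
            with mmap.mmap(fd, 0, flags=mmap.MAP_SHARED,
                           prot=mmap.PROT_READ) as ro:
                results["first_byte"] = ro[0:1]
                try:
                    ro[0:1] = b"X"
                    results["store_to_ro_map"] = "allowed"
                except TypeError:
                    # CPython checks the access mode before touching memory.
                    results["store_to_ro_map"] = "refused"
        finally:
            os.close(fd)
    return results


if __name__ == "__main__":
    print(protection_demo())
```

So the open flags decide what the descriptor may be used for, and the PROT_* flags decide what each page of the mapping allows.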
#3
1
- One reason may be that you have (legacy) code that is set up to write into a data buffer, and this buffer is then written to the file in one go at the end. In this case using mmap will save at least one copy of the data, as the OS can write the buffer directly to disk. As long as it is about file writing only, I cannot (yet) imagine any other reason why you'd want to use mmap.
- No, I'd say protection is not relevant here.
- It might save one or two copies of the data, e.g. from app buffer to libc buffer to OS buffer; see point 1. This might make a performance difference when writing large amounts of data.
- No. As far as I know, the OS is free to write the data at any time it likes, as long as the data has been written to disk after a call to msync or munmap on that memory region. (And for most files it will likely not write anything in between most of the time, for performance reasons: writing a whole block to disk because one byte changed is rather expensive, in particular if a lot more modifications to the block are to be expected in the near future.)
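The last point can be sketched as follows in Python, where the mapping's flush() method corresponds to msync and close() to munmap. The helper name write_through_mapping is made up. Reading the file back only shows the contents as the kernel sees them; it does not prove that the bytes have reached the physical disk.

```python
import mmap
import tempfile


def write_through_mapping(payload: bytes) -> bytes:
    """Store `payload` via a mapping, msync it, then read the file the ordinary way.

    Hypothetical helper for illustration; `payload` must be non-empty.
    """
    with tempfile.NamedTemporaryFile() as f:
        # A mapping cannot grow the file, so size it first.
        f.truncate(len(payload))
        with mmap.mmap(f.fileno(), len(payload)) as m:
            m[:] = payload  # plain memory store into the mapped pages
            m.flush()       # msync: ask the OS to write the dirty pages back
        # Leaving the with-block closes (unmaps) the mapping.
        with open(f.name, "rb") as check:
            return check.read()


if __name__ == "__main__":
    print(write_through_mapping(b"written through the mapping"))
```

Without the flush() call the data would still normally end up in the file, just at a time the OS chooses.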
#4
0
In most cases you should treat a memory mapped file as memory that you work with. You only need to care about special cases such as syncing with the disk. It's the same kind of storage as memory, but it can be initialized from a file and stored to a file whenever you need.
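One way to picture "memory that is initialized from a file and stored back to it" is a tiny persistent counter. This is only a sketch in Python. The name bump_counter, the 8-byte little-endian layout, and the file name are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile


def bump_counter(path: str) -> int:
    """Increment a 64-bit counter kept in the first 8 bytes of `path`.

    Hypothetical helper for illustration; returns the new value.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < 8:
            os.ftruncate(fd, 8)  # a new file starts out as eight zero bytes
        with mmap.mmap(fd, 8) as mem:
            # The mapping behaves like an ordinary writable buffer.
            (value,) = struct.unpack_from("<Q", mem, 0)
            struct.pack_into("<Q", mem, 0, value + 1)
            mem.flush()  # the "sync with disk" special case
            return value + 1
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        counter_file = os.path.join(d, "counter.bin")
        print(bump_counter(counter_file), bump_counter(counter_file))
```

The code never calls read or write on the file; the value survives between calls because the mapped memory is backed by the file.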