Caching in Network File Systems
OSR Staff | Published: 09-May-03| Modified: 09-May-03
Typically, it is quite easy for a file system filter driver to determine the caching policy of a local file system such as NTFS or FAT: it simply examines the I/O Request Packet (IRP). If the IRP_NOCACHE bit is set in the Flags field of the IRP, the file system (and, of course, the filter) knows that the I/O in question is not to be cached.
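As a rough illustration of the check just described, the sketch below tests the IRP_NOCACHE bit. It is a user-mode mock, not driver code: MOCK_IRP and IsNonCachedRequest are hypothetical names introduced here, and the flag values are assumed to match wdm.h. In a real filter the same test would be made against Irp->Flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values assumed to match wdm.h; verify against your own headers. */
#define IRP_NOCACHE   0x00000001UL
#define IRP_PAGING_IO 0x00000002UL

/* Hypothetical stand-in for the single IRP field this check needs. */
typedef struct MOCK_IRP {
    unsigned long Flags;
} MOCK_IRP;

/* Returns true when the request asks the file system to bypass the cache. */
static bool IsNonCachedRequest(const MOCK_IRP *irp)
{
    assert(irp != NULL);
    return (irp->Flags & IRP_NOCACHE) != 0;
}
```

Paging I/O normally carries IRP_NOCACHE as well, which is why a test on this one bit is usually enough for local file systems.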
Network file systems are a bit more complex. While they also honor the IRP_NOCACHE bit, they may need to disable caching as a result of their own internal policy, perhaps directed by the state of the remote file on the file server or by other clients on the network that might be using the file. rdbss.sys, which implements part of the "mini-redirector" model, allows a redirector (for example mrxsmb.sys, the driver that implements CIFS or LAN Manager functionality in Windows 2000 and later) to change the caching policy on a per-file basis. In this case, an ordinary IRP_MJ_READ IRP, which would normally be cached, may be treated as non-cached.
For a filter driver that modifies file data (for example, one that transparently encrypts and decrypts file contents), the usual technique is to look for and operate on non-cached I/O operations. This captures both paging I/O operations and user-level non-cached I/O operations. However, if the filter also wishes to filter any of the mini-redirectors (two ship with Windows XP, for example), it needs to look at fields of the File Control Block (FCB).
For most file systems, the format of this structure is largely under the control of the file system itself (except for the common header structure), but for mini-redirectors the format of the file control block is defined by the mini-redirector model. See mrxfcb.h in the IFS Kit for the full definition. The key data structure here (for a filter) is the MRX_FCB. Its FcbState field indicates whether the file is currently cached or non-cached: if the file allows caching, the FCB_STATE_READCACHING_ENABLED bit is set; otherwise, I/O to the file is treated as non-cached.
Note: In the Windows Server 2003 IFS Kit the spelling of this flag has been changed so that it is now FCB_STATE_READCACHING_ENABLED (earlier kits appear to have spelled it FCB_STATE_READCACHEING_ENABLED).
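The FcbState test can be combined with the IRP flag test along the following lines. This is only a user-mode sketch under stated assumptions: MOCK_MRX_FCB and ReadWillBeCached are hypothetical names, the real MRX_FCB has many more fields, and the numeric value given for FCB_STATE_READCACHING_ENABLED is an assumption that should be replaced by the definition in mrxfcb.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values assumed to match wdm.h. */
#define IRP_NOCACHE   0x00000001UL
#define IRP_PAGING_IO 0x00000002UL

/* Assumed value; take the real definition from mrxfcb.h. */
#define FCB_STATE_READCACHING_ENABLED 0x08000000UL

/* Hypothetical stand-in holding only the field a filter inspects. */
typedef struct MOCK_MRX_FCB {
    unsigned long FcbState;
} MOCK_MRX_FCB;

/* For a file owned by a mini-redirector: will this read go through the cache? */
static bool ReadWillBeCached(unsigned long irpFlags, const MOCK_MRX_FCB *fcb)
{
    assert(fcb != NULL);

    /* Non-cached and paging I/O are never cached, whatever the FCB says. */
    if (irpFlags & (IRP_NOCACHE | IRP_PAGING_IO)) {
        return false;
    }

    /* Otherwise the redirector's per-file policy decides. */
    return (fcb->FcbState & FCB_STATE_READCACHING_ENABLED) != 0;
}
```

The result is only a snapshot: nothing in this sketch stops the redirector from changing FcbState after the function returns.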
While this allows a filter to determine the current state of the file, there does not appear to be any simple way for a filter to ensure that the FcbState field does not change between the time the filter checks it and the time the call is actually processed by the file system. Thus, the file state might change to disallow caching after the check is made. Similarly, if the check is done after the I/O has been processed, the file state might have changed to allow caching once again. Sample code in the IFS Kit (see smbmrx/wnet/sys/openclos.c) demonstrates one potential implementation model.
To prevent the state from changing, the caller must acquire the FCB resource (the ERESOURCE in the FCB itself); to avoid deadlocks while calling the redirector, the resource must be owned exclusively. Again, doing this requires relying upon the implementation and published interface available in the IFS Kit.
Note: this synchronization is only needed for user level cached requests, since paging I/O or user level non-cached requests will already not be cached as a matter of course. This is important because this lock cannot be safely acquired when processing paging I/O - this would violate the existing lock hierarchy and introduce the possibility of deadlocks.
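The rule in this note can be written down as a small decision function. The sketch below is a user-mode illustration only: FCB_LOCK_MODE and FcbLockModeForRequest are hypothetical names, the flag values are assumed to match wdm.h, and the actual acquisition of the ERESOURCE in a driver is not shown.

```c
/* Flag values assumed to match wdm.h. */
#define IRP_NOCACHE   0x00000001UL
#define IRP_PAGING_IO 0x00000002UL

/* Hypothetical result type: what the filter should do with the FCB resource. */
typedef enum FCB_LOCK_MODE {
    FCB_LOCK_NONE,      /* do not touch the FCB resource */
    FCB_LOCK_EXCLUSIVE  /* acquire the FCB resource exclusively */
} FCB_LOCK_MODE;

static FCB_LOCK_MODE FcbLockModeForRequest(unsigned long irpFlags)
{
    /* Paging I/O: taking the lock here would violate the lock hierarchy. */
    if (irpFlags & IRP_PAGING_IO) {
        return FCB_LOCK_NONE;
    }

    /* User-level non-cached I/O is never cached, so the state is irrelevant. */
    if (irpFlags & IRP_NOCACHE) {
        return FCB_LOCK_NONE;
    }

    /* User-level cached request: hold the FCB so FcbState cannot change. */
    return FCB_LOCK_EXCLUSIVE;
}
```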
Eventually I figured out this was because network redirectors like to set an internal flag called SRVOPEN_FLAG_DONTUSE_WRITE_CACHEING when a file is opened write-only, which causes the redirector to send all writes across the network as soon as it gets them, bypassing the NT cache. This means any layered filter will see the ordinary write request, but never a corresponding paging I/O request. To get around this, my filter now has to forcibly turn every write-only network file open into a read/write open.
Below is how to convert a file opened write-only into one opened read/write:
The reason why I'm ranting in public is that it seems I can never know a priori whether I will see a read or write request as both paging and non-paging I/O, or as one or the other, for a given filesystem. Instead, I must special-case my code for each filesystem and pray that I've covered every scenario that can result in my not handling a read/write or handling it twice. The only alternative I can come up with is to force ALL reads/writes to a filtered file to be non-cached, with the corresponding performance penalties. Is there an elegant way out of this mess?
I think I sent a message about this before. The long and the short of it is that you CANNOT force the redirector to cache writes for files that are open write-only. You must instead tweak the file permissions to 'convert' a write-only open of a network file into a read/write open. Use code like the following to do so:
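    // In the IRP_MJ_CREATE path. pIrpSp is the current IRP stack location and
    // desiredAccess holds pIrpSp->Parameters.Create.SecurityContext->DesiredAccess.
    // A write-only open (no read or execute access requested) gets read access
    // added so that the redirector will allow the file to be cached.
    if ((0 == (desiredAccess & (FILE_EXECUTE | FILE_READ_DATA))) &&
        (0 != (desiredAccess & (FILE_WRITE_DATA | FILE_APPEND_DATA))))
    {
        pIrpSp->Parameters.Create.SecurityContext->DesiredAccess |= FILE_READ_DATA;
    }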
The following explains why a file opened write-only cannot be cached:
write only handle in redirector doesn't cache
--------------------------------------------------------------------------------
I got bitten by this a few months ago, so here is my take on the situation...
This is because the NT cache needs read access to the file in order to work. The cache works on page-sized chunks. If you open a file and write 1 byte at location 0, and caching is enabled, the NT cache will page-in the memory page representing the first 4096 bytes of the file, which requires it to issue a paging I/O read for the first 4096 bytes. Then it will paste in your new byte and mark the page dirty, which causes the page to be written out later by the lazy writer.
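The read-modify-write behaviour described here can be mimicked with a toy, single-page cache. The sketch below is a user-mode simulation only: TOY_CACHE and its functions are hypothetical and are not how the NT cache manager is implemented. It shows only that a one-byte cached write first forces a full-page read from the backing file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u

/* Hypothetical one-page cache in front of a "file" held in memory. */
typedef struct TOY_CACHE {
    unsigned char *file;                  /* backing store, at least one page */
    unsigned char page[PAGE_SIZE_BYTES];  /* cached copy of the first page */
    bool valid;                           /* page currently holds file data */
    bool dirty;                           /* page must be written back later */
    unsigned pagingReads;                 /* number of page-in reads issued */
} TOY_CACHE;

static void ToyCacheInit(TOY_CACHE *c, unsigned char *file)
{
    memset(c, 0, sizeof(*c));
    c->file = file;
}

/* Cached write of len bytes at offset within the first page of the file. */
static bool ToyCachedWrite(TOY_CACHE *c, size_t offset,
                           const void *data, size_t len)
{
    if (offset > PAGE_SIZE_BYTES || len > PAGE_SIZE_BYTES - offset) {
        return false;
    }
    if (!c->valid) {
        /* Page-in: the paging read that requires read access to the file. */
        memcpy(c->page, c->file, PAGE_SIZE_BYTES);
        c->valid = true;
        c->pagingReads++;
    }
    memcpy(c->page + offset, data, len);  /* paste in the new bytes */
    c->dirty = true;                      /* left for the lazy writer */
    return true;
}

/* Stand-in for the lazy writer: flush the page if it is dirty. */
static void ToyLazyWrite(TOY_CACHE *c)
{
    if (c->dirty) {
        memcpy(c->file, c->page, PAGE_SIZE_BYTES);
        c->dirty = false;
    }
}
```

If the page-in step is not permitted, as with a write-only handle on a remote file, the rest of the sequence cannot proceed, which is the situation the post describes.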
For local files, paging I/O is allowed to bypass all security checks (since only trusted kernel components can issue paging I/O requests), so that paging reads are allowed on all opens for all files. For network files, there is no reason the remote PC should 'trust' your PC and grant it read access if you only have write-access to the file. Therefore the NT cache cannot be used on write-only remote files.