I know splice() is designed for zero-copy and uses the Linux kernel pipe buffer to achieve that. For example, if I want to copy data from one file descriptor (fp1) to another file descriptor (fp2), it doesn't need to copy data "kernel space -> user space -> kernel space". Instead it only copies data within kernel space, and the flow looks like "fp1 -> pipe_read -> pipe_write -> fp2". My question is: does the kernel need to copy data between "fp1 -> pipe_read" and "pipe_write -> fp2"?
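The fp1 -> pipe -> fp2 flow can be driven from user space roughly as below. This is a minimal sketch assuming Linux with glibc (`_GNU_SOURCE` exposes `splice()` and the `SPLICE_F_*` flags in `<fcntl.h>`); the helper name `copy_via_pipe` and the 64 KiB chunk size are illustrative choices, not part of the question. The data never passes through a user-space buffer, but whether the kernel moves page references or copies bytes internally is up to the kernel and the file types involved; the splice(2) man page describes `SPLICE_F_MOVE` as only a hint.

```c
#define _GNU_SOURCE            /* splice() and SPLICE_F_* are Linux-specific */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd as
 * in_fd -> pipe -> out_fd using two splice() calls per chunk.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_via_pipe(int in_fd, int out_fd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t total = 0;
    for (;;) {
        /* "fp1 -> pipe": fill the pipe's write end from in_fd */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, 64 * 1024, SPLICE_F_MOVE);
        if (n == 0)
            break;                     /* EOF on in_fd */
        if (n < 0) {
            total = -1;
            break;
        }
        /* "pipe -> fp2": drain exactly what was just put into the pipe */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                break;
            }
            n -= m;
            total += m;
        }
        if (total < 0)
            break;
    }

    close(p[0]);
    close(p[1]);
    return total;
}
```

Typical use is `copy_via_pipe(src_fd, dst_fd)` with both descriptors positioned where copying should start; the file offsets advance as data is spliced because `NULL` offsets are passed.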
Wikipedia says:
Ideally, splice and vmsplice work by remapping pages and do not actually copy any data, which may improve I/O performance. As linear addresses do not necessarily correspond to contiguous physical addresses, this may not be possible in all cases and on all hardware combinations.
I have already traced the kernel source (3.12) for my question and found that the "fp1 -> write_pipe" path eventually calls kernel_readv() in fs/splice.c, which then calls do_readv_writev() and finally aio_write():
static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
                            unsigned long vlen, loff_t offset)
// *vec points to the struct pages that belong to the pipe
The "read_pipe -> fp2" path eventually calls __kernel_write(), which then calls fp2->f_op->write():
ssize_t __kernel_write(struct file *file, const char *buf, size_t count, loff_t *pos)
// *buf is the pipe buffer
I thought both aio_write() and file->f_op->write() perform a real data copy, so does splice() really perform zero-copy?
2 Answers
#1
As I understand splice(), it will read pages of fd1 and the MMU will map these pages. The reference created by the mapping will be put into the pipe and handed over to fd2. No real data should be copied in the process, as long as every participant has DMA available. If no DMA is available you need to copy data.
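The idea that a pipe carries references to pages rather than private copies of the data can be observed from user space with tee(2), which this answer does not mention: the tee(2) man page says it duplicates data between two pipes by grabbing a reference to the input instead of copying it, and it does not consume the input. A minimal sketch, assuming Linux with glibc; `tee_demo` is a hypothetical helper, and `len` should stay small (well below the pipe capacity) so the single write() cannot block.

```c
#define _GNU_SOURCE            /* tee() is Linux-specific */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write msg into pipe a, duplicate it into pipe b
 * with tee(), then read both pipes into out_a and out_b (len bytes each).
 * Returns 0 on success, -1 on error. */
static int tee_demo(const char *msg, size_t len, char *out_a, char *out_b)
{
    int a[2], b[2];
    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int rc = -1;
    if (write(a[1], msg, len) != (ssize_t)len)
        goto done;
    /* Duplicate pipe a's buffers into pipe b; data stays readable in a */
    if (tee(a[0], b[1], len, 0) != (ssize_t)len)
        goto done;
    /* Both pipes now yield the same bytes */
    if (read(a[0], out_a, len) != (ssize_t)len)
        goto done;
    if (read(b[0], out_b, len) != (ssize_t)len)
        goto done;
    rc = 0;

done:
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return rc;
}
```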
#2
splice most probably works zero-copy (there is no hard guarantee for that, but it almost certainly works that way on any reasonably recent hardware). Strictly following the docs, you would need to call it with SPLICE_F_MOVE so that no actual copies are made, but I don't see why it would need to make one either way as long as there's DMA support (which is a fairly safe assumption).
The same is not necessarily true when vmsplice is involved, since it (or a subsequent splice) only works zero-copy if the SPLICE_F_GIFT flag is provided (and in this case I can see why it would not work otherwise, since the "source descriptor" is main memory). However, this flag is broken in some Linux versions, unsupported in others, and badly documented on top of that.
For example, it is not clear what to do with the memory afterwards. The documentation used to say that you must never touch the gifted memory again; this was recently reworded slightly, but it is no less ambiguous. It remains unclear what becomes of the memory region. Following the documentation literally, you would have to leak the memory. There seems to be no notification mechanism that tells you when it is safe to free or reuse the memory.
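A sketch of the vmsplice() plus SPLICE_F_GIFT pattern discussed above, assuming Linux with glibc. The vmsplice(2) man page requires gifted data to be page-aligned in both address and length, so the buffer comes from mmap(). Because there is no reliable signal for when the kernel is done with a gifted page, the sketch never writes to or unmaps the page after the call, i.e. it leaks it on purpose. `gift_page_demo` is a hypothetical helper, and reading the data back with read() is only there to show it arrived (that read is itself a copy).

```c
#define _GNU_SOURCE            /* vmsplice() and SPLICE_F_GIFT are Linux-specific */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: fill one page with `fill`, gift it to a pipe with
 * vmsplice(SPLICE_F_GIFT), then read it back from the pipe into `out`.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t gift_page_demo(char fill, char *out, size_t out_len)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    if (out_len < ps)
        return -1;

    /* Page-aligned, page-sized buffer, as SPLICE_F_GIFT requires */
    char *page = mmap(NULL, ps, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memset(page, fill, ps);

    int p[2];
    if (pipe(p) < 0) {
        munmap(page, ps);
        return -1;
    }

    struct iovec iov = { .iov_base = page, .iov_len = ps };
    ssize_t r = -1;
    if (vmsplice(p[1], &iov, 1, SPLICE_F_GIFT) == (ssize_t)ps) {
        /* From here on `page` is treated as the kernel's: never written
         * to or unmapped again. Reading the pipe copies the data out. */
        r = read(p[0], out, ps);
    }

    close(p[0]);
    close(p[1]);
    return r;   /* `page` is intentionally leaked */
}
```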
aio_write is the userland (glibc) implementation of asynchronous I/O, which uses threads and the write syscall. This normally performs at least one copy from user space to kernel space.
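For reference, this is roughly what the POSIX aio_write(3) interface described above looks like in use; note that it is a different function from the kernel's `file_operations` method of the same name that appears in the question's trace. A minimal sketch: `aio_write_and_wait` is a hypothetical helper, waiting with aio_suspend() in a loop is just one option, and on glibc older than 2.34 the program must be linked with `-lrt`. The caller's buffer is still copied into the kernel when glibc's helper thread performs the actual write.

```c
#include <aio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: submit one asynchronous write of len bytes at
 * offset off and block until it completes.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t aio_write_and_wait(int fd, const void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (volatile void *)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_write(&cb) < 0)
        return -1;

    /* Wait until the request is no longer in progress */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    if (aio_error(&cb) != 0)
        return -1;
    return aio_return(&cb);
}
```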