How can data written to a file really be flushed/synced to the block device from Java?
I tried this code with NIO:
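FileOutputStream s = new FileOutputStream(filename);
FileChannel c = s.getChannel();   // java.nio.channels.FileChannel
while (xyz) {
    c.write(buffer);
}
c.force(true);       // force content and metadata to the storage device
s.getFD().sync();    // sync the underlying file descriptor as well
c.close();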
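Put together as a self-contained program, the snippet above might look like the following sketch. It assumes Java 7 or later (for try-with-resources); the class name `NioSyncWrite`, the helper `writeAndSync`, and the block count and size are hypothetical stand-ins for `filename`, `buffer` and the `xyz` loop. Because `FileChannel.write` may write only part of a buffer, each block is drained in an inner loop.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class NioSyncWrite {
    // Hypothetical helper: writes `blocks` blocks of `blockSize` bytes, then asks
    // the OS to push data and metadata to the storage device before closing.
    static long writeAndSync(File file, int blocks, int blockSize) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(blockSize); // zero-filled
        long total = 0;
        try (FileOutputStream s = new FileOutputStream(file);
             FileChannel c = s.getChannel()) {
            for (int i = 0; i < blocks; i++) {
                buffer.clear();               // reuse the same block each time
                while (buffer.hasRemaining()) {
                    total += c.write(buffer); // write() may be partial, so loop
                }
            }
            c.force(true);      // flush file content and metadata via the channel
            s.getFD().sync();   // and sync the underlying file descriptor too
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("nio-sync", ".bin");
        f.deleteOnExit();
        long written = writeAndSync(f, 16, 64 * 1024);
        System.out.println(written + " bytes written, file length " + f.length());
    }
}
```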
I assumed that c.force(true) together with s.getFD().sync() would be sufficient, because the documentation for force states:
Forces any updates to this channel's file to be written to the storage device that contains it. If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash.
The documentation for sync states:
Force all system buffers to synchronize with the underlying device. This method returns after all modified data and attributes of this FileDescriptor have been written to the relevant device(s). In particular, if this FileDescriptor refers to a physical storage medium, such as a file in a file system, sync will not return until all in-memory modified copies of buffers associated with this FileDesecriptor have been written to the physical medium. sync is meant to be used by code that requires physical storage (such as a file) to be in a known state.
These two calls should be sufficient. Are they? I suspect not.
Background: I am doing a small performance comparison (2 GB, sequential write) between C and Java. The Java version is twice as fast as the C version and probably faster than the hardware can go (120 MB/s on a single hard disk). I also tried running the command-line tool sync via Runtime.getRuntime().exec("sync"), but that didn't change the behavior.
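One possible reason the external sync made no visible difference (an assumption, not something the question confirms): `Runtime.getRuntime().exec("sync")` returns as soon as the process has been started and does not wait for it to finish, so a timer stopped right afterwards may not include the sync at all. A sketch that blocks until the command exits, assuming a Unix-like system with `sync` on the `PATH`; the class and method names are hypothetical.

```java
import java.io.IOException;

public class ExternalSync {
    // Starts the external "sync" command and blocks until it has exited.
    // Assumes a Unix-like OS where "sync" is on the PATH.
    static int runSync() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sync")
                .inheritIO()   // send any output/errors to this JVM's console
                .start();
        return p.waitFor();    // exit status of the sync command (0 on success)
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int status = runSync();
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sync exited with " + status + " after " + millis + " ms");
    }
}
```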
The C code, which achieves 70 MB/s, is as follows (using the low-level APIs open, write and close doesn't change much):
FILE* fp = fopen(filename, "w");
while(xyz) {
fwrite(buffer, 1, BLOCK_SIZE, fp);
}
fflush(fp);
fclose(fp);
sync();
Without the final call to sync, I got unrealistic values (over 1 GB/s, i.e. main-memory speed).
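The same effect applies on the Java side: for the numbers to be comparable, the timed interval has to end after the data has been forced to the device, not when the last write returns. A minimal timing sketch under that assumption; the class name, the default file name `bench.bin` and the 256 MB / 1 MiB sizes are hypothetical choices, not the asker's benchmark.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SyncedWriteBenchmark {
    // Writes totalBytes to path in blockSize chunks and returns MB/s, where the
    // measured time includes the final force(true), i.e. the flush to the device.
    static double measureMBps(Path path, long totalBytes, int blockSize) throws IOException {
        ByteBuffer block = ByteBuffer.allocateDirect(blockSize);
        long start = System.nanoTime();
        try (FileChannel c = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < totalBytes) {
                block.clear();
                block.limit((int) Math.min(blockSize, totalBytes - written));
                while (block.hasRemaining()) {
                    written += c.write(block);
                }
            }
            c.force(true); // without this, the result mostly measures the page cache
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return totalBytes / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "bench.bin");
        long total = 256L * 1024 * 1024; // hypothetical size; the question used 2 GB
        System.out.printf("%.1f MB/s%n", measureMBps(path, total, 1024 * 1024));
    }
}
```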
Why is there such a big difference between C and Java? There are two possibilities: either I don't sync the data correctly in Java, or the C code is suboptimal for some reason.
Update: I have done strace runs with "strace -cfT cmd". Here are the results:
C (Low-Level API): MB/s 67.389782
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.21    0.200012      200012         1           fdatasync
 11.05    0.025345           1     32772           write
  1.74    0.004000        4000         1           sync
C (High-Level API): MB/s 61.796458
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.19    0.144009      144009         1           sync
 26.81    0.052739           1     65539           write
Java (1.6 SUN JRE, java.io API): MB/s 128.6755466197537
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.07  105.387609        3215     32776           write
  2.58    3.390060        3201      1059           read
  0.62    0.815251      815251         1           fsync
Java (1.6 SUN JRE, java.nio API): MB/s 127.45830221558376
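% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  5.52    0.980061      490031         2           fsync
  1.60    0.284752           9     32774           write
  0.00    0.000000           0        80           close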
The time values seem to be system time only and are therefore pretty meaningless.
Update 2: I switched to another server, rebooted, and used a freshly formatted ext3 file system. Now I see only a 4% difference between Java and C. I simply don't know what went wrong; sometimes things are strange. I should have tried the measurement on another system before writing this question. Sorry.
Update 3: To summarize the answers:
- For Java NIO, use c.force(true) followed by s.getFD().sync(); for Java's stream API, use s.flush() followed by s.getFD().sync(). For the high-level API in C, don't forget to sync: fflush hands the data to the OS, but doesn't bring it to the block device.
- Use strace to analyze the system calls made by a command.
- Cross-check your results before posting a question.
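For the stream API, the point of the summary is that anything still sitting in a user-space buffer must be pushed to the OS with flush() before sync() can write it to the device. A sketch assuming a BufferedOutputStream layered over the FileOutputStream; the class name, method name and 64 KiB buffer size are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamSyncWrite {
    // Writes data through a Java-side buffer, then flushes and syncs it.
    static void writeDurably(File file, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             BufferedOutputStream out = new BufferedOutputStream(fos, 64 * 1024)) {
            out.write(data);
            out.flush();          // move bytes from the Java buffer into the OS
            fos.getFD().sync();   // ask the OS to write them to the device
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("stream-sync", ".bin");
        f.deleteOnExit();
        writeDurably(f, new byte[1_000_000]);
        System.out.println("length = " + f.length());
    }
}
```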
Update 4: Please note the following follow-up question.
5 Answers
#1 (score: 2)
You need to tell us more about the hardware and operating system, also the specific Java version. How are you measuring this throughput?
You're correct that force/sync should force the data out to the physical media.
Here's a raw version of copy that uses only system calls. It compiles cleanly with gcc 4.0 on an Intel Mac.
/* rawcopy -- pure C, system calls only, copy argv[1] to argv[2] */
/* This is a test program which simply copies from file to file using
 * only system calls (section 2 of the manual.)
 *
 * Compile:
 *
 *     gcc -Wall -DBUFSIZ=1024 -o rawcopy rawcopy.c
 *
 * If DIRTY is defined, then errors are interpreted with perror(3).
 * This is ifdef'd so that the CLEAN version is free of stdio. For
 * convenience I'm using BUFSIZ from stdio.h; to compile CLEAN just
 * use the value from your stdio.h in place of 1024 above.
 *
 * Compile DIRTY:
 *
 *     gcc -DDIRTY -Wall -o rawcopy rawcopy.c
 *
 */
#include <errno.h>   /* errno may be a macro; never declare it yourself */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(DIRTY)
# if defined(BUFSIZ)
#  error "Don't define your own BUFSIZ when DIRTY"
# endif
# include <stdio.h>
# define PERROR perror(argv[0])
#else
# define CLEAN
# define PERROR
# if ! defined(BUFSIZ)
#  error "You must define your own BUFSIZ with -DBUFSIZ=<number>"
# endif
#endif
char buffer[BUFSIZ];   /* a byte buffer; by definition stdio BUFSIZ should
                          be optimal size for read/write */
int main(int argc, char * argv[]) {
    int fdi, fdo;      /* Input/output file descriptors */
    ssize_t len;       /* length to read/write */
    if (argc != 3) {
        errno = EINVAL;  /* usage error: make sure the exit status is non-zero */
        PERROR;
        exit(errno);
    }
    /* Open the files, returning perror errno as the exit value if fails. */
    if ((fdi = open(argv[1], O_RDONLY)) == -1) {
        PERROR;
        exit(errno);
    }
    /* O_CREAT requires a permission mode; O_TRUNC discards old contents. */
    if ((fdo = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        PERROR;
        exit(errno);
    }
    /* copy BUFSIZ bytes (or total read on last block) fast as you
       can. read() returns 0 at end of file and -1 on error. */
    while ((len = read(fdi, buffer, BUFSIZ)) > 0) {
        if (write(fdo, buffer, len) == -1) {
            PERROR;
            exit(errno);
        }
    }
    if (len == -1) {
        PERROR;
        exit(errno);
    }
    /* fsync the output file, then close both files */
    if (fsync(fdo) == -1) {
        PERROR;
        exit(errno);
    }
    if (close(fdo) == -1) {
        PERROR;
        exit(errno);
    }
    if (close(fdi) == -1) {
        PERROR;
        exit(errno);
    }
    /* if it survived to here, all worked. */
    exit(0);
}
#2 (score: 8)
Actually, in C you want to just call fsync() on the one file descriptor, not sync() (or the "sync" command), which signals the kernel to flush all buffers to disk system-wide.
If you strace the JVM (getting Linux-specific here), you should be able to observe an fsync() or fdatasync() system call being made on your output file. That is what I'd expect the getFD().sync() call to do. I assume c.force(true) simply flags to NIO that fsync() should be called after each write. It might simply be that the JVM you're using doesn't actually implement the sync() call?
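For what it's worth, the FileChannel.force Javadoc describes it as syncing at the moment it is invoked rather than as a per-write flag, and the NIO strace summary in the question shows only 2 fsync calls for 32774 writes, which is consistent with that. If you really do want every write to be synchronous, Java 7 and later let you open the channel with StandardOpenOption.DSYNC (file content) or StandardOpenOption.SYNC (content and metadata). A sketch; the class and method names are hypothetical, and this is expected to be much slower than a single force at the end.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncOnEveryWrite {
    // Opens the file so that each write() returns only after the file content
    // has been written synchronously (DSYNC; SYNC would also cover metadata).
    static void writeBlocks(Path path, int blocks, int blockSize) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(blockSize);
        try (FileChannel c = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.DSYNC)) {
            for (int i = 0; i < blocks; i++) {
                block.clear();
                while (block.hasRemaining()) {
                    c.write(block);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("dsync", ".bin");
        writeBlocks(p, 8, 4096);
        System.out.println(Files.size(p) + " bytes");
        Files.delete(p);
    }
}
```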
I'm not sure why you weren't seeing any difference when calling "sync" as a command, but after the first sync invocation, subsequent ones are usually a lot faster. Again, I'd be inclined to break out strace (truss on Solaris) as a "what's actually happening here?" tool.
#3 (score: 3)
It is a good idea to use synchronized I/O data integrity completion. However, your C sample uses the wrong call: sync() syncs the whole OS.
If you want to write the blocks of that single file to disk, you need to use fsync(2) or fdatasync(2) in C. BTW: when you use buffered stdio in C (or a BufferedOutputStream or some Writer in Java), you need to flush that buffer first, before you sync.
The fdatasync() variant is a bit more efficient if the file has not changed name or size since your last sync, but it might not persist all the metadata. If you want to write your own transaction-safe database system, you need to take care of a few more things (such as fsyncing the parent directory).
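In Java terms, the fsync/fdatasync distinction roughly corresponds to FileChannel.force(true) versus force(false): with false, only updates to the file's content are required to be written (on Linux, OpenJDK is commonly implemented with fdatasync for this case, but treat that as an implementation detail). Syncing the parent directory can be sketched by opening the directory read-only as a channel and forcing it. This is an assumption that holds on Linux but is not portable (it typically fails on Windows), so the sketch treats the directory sync as best effort. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableCreate {
    // Creates a new file, forces it, then fsyncs the parent directory so the
    // new directory entry is persisted too (best effort, not portable).
    static void createDurably(Path file, String text) throws IOException {
        ByteBuffer data = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        try (FileChannel c = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            while (data.hasRemaining()) {
                c.write(data);
            }
            // force(false) would only be required to write the content;
            // force(true) also covers metadata such as the new file's size.
            c.force(true);
        }
        Path dir = file.toAbsolutePath().getParent();
        try (FileChannel d = FileChannel.open(dir, StandardOpenOption.READ)) {
            d.force(true); // fsync on the directory descriptor (works on Linux)
        } catch (IOException e) {
            // Opening a directory as a channel is not supported everywhere
            // (e.g. Windows); skip the directory sync there.
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durable");
        Path file = dir.resolve("data.txt");
        createDurably(file, "hello");
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```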
#4 (score: 0)
The C code could be suboptimal, because it uses stdio rather than the raw OS write(). Then again, Java could be faster because it allocates larger buffers?
Anyway, you can only trust the API documentation. The rest is beyond your control.
#5 (score: 0)
(I know this is a very late reply, but I ran into this thread doing a Google search, and that's probably how you ended up here too.)
You're calling sync() in Java on a single file descriptor, so only the buffers related to that one file get flushed to disk.
In C and on the command line, you're calling sync() for the entire operating system, so every file buffer gets flushed to disk, for everything your OS is doing.
To be comparable, the C call should be syncfs(fileno(fp)); note that syncfs() takes a file descriptor, not a FILE*, and is Linux-specific.
From the Linux man page:
sync() causes all buffered modifications to file metadata and data to
be written to the underlying file systems.
syncfs() is like sync(), but synchronizes just the file system containing
file referred to by the open file descriptor fd.