REdis Asynchronous AOF fsync is taking too long

时间:2023-03-10 07:05:44
REdis Asynchronous AOF fsync is taking too long

redis.conf中的no-appendfsync-on-rewrite
默认值为no,表示在重写AOF文件或RDB文件时阻塞fsync。

如果重写AOF或RDB文件时长过长,则在日志中可以看到如下信息:
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

严重时会导致该节点被判断为fail,从而触发主从切换,建议尽可能将配置项“appendfsync”的值设置为“no”。

相关源代码(以REdis-5.0.4为例):

void flushAppendOnlyFile(int force) { // aof.c:331
。。。。。。
/* Don't fsync if no-appendfsync-on-rewrite
* is set to yes and there are
* children doing I/O in the background. */
// no-appendfsync-on-rewrite值为yes,
// 并且存在AOF或RDB进程时,直接返回而不调用fsync。
if (server.aof_no_fsync_on_rewrite &&
(server.aof_child_pid != -1 ||
server.rdb_child_pid != -1))
return;
。。。。。。
} int rewriteAppendOnlyFileBackground() { // aof.c:1532
。。。。。。
childpid = fork();
。。。。。。
server.aof_child_pid = childpid;
。。。。。。
} int rdbSaveBackground( // rdb.c:1282
char *filename,
rdbSaveInfo *rsi) {
。。。。。。
childpid = fork();
。。。。。。
server.rdb_child_pid = childpid;
。。。。。。
}

  

22301:M 19 Apr 2019 20:49:39.391 * Starting automatic rewriting of AOF on 100% growth
22301:M 19 Apr 2019 20:49:39.520 * Background append only file rewriting started by pid 38549
22301:M 19 Apr 2019 20:49:59.080 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
22301:M 19 Apr 2019 20:50:32.008 * Background AOF buffer size: 80 MB
22301:M 19 Apr 2019 20:50:47.406 * AOF rewrite child asks to stop sending diffs.
38549:C 19 Apr 2019 20:50:47.406 * Parent agreed to stop sending diffs. Finalizing AOF...
38549:C 19 Apr 2019 20:50:47.406 * Concatenating 647.71 MB of AOF diff received from parent.
38549:C 19 Apr 2019 20:50:53.097 * SYNC append only file rewrite performed
38549:C 19 Apr 2019 20:50:53.248 * AOF rewrite: 3998 MB of memory used by copy-on-write
22301:M 19 Apr 2019 20:50:53.975 * Background AOF rewrite terminated with success
22301:M 19 Apr 2019 20:50:54.030 * Residual parent diff successfully flushed to the rewritten AOF (69.97 MB)
22301:M 19 Apr 2019 20:50:54.214 * Background AOF rewrite finished successfully
22301:M 19 Apr 2019 20:51:30.085 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
22301:M 19 Apr 2019 20:51:54.071 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
22301:M 19 Apr 2019 20:52:23.040 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
22301:M 19 Apr 2019 20:53:12.043 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
22301:M 19 Apr 2019 20:55:06.226 * Marking node 720a9ead7beb61042fd56a873deec6b2cb0daec5 as failing (quorum reached).
22301:M 19 Apr 2019 20:55:06.226 # Cluster state changed: fail
22301:M 19 Apr 2019 20:55:38.186 * Clear FAIL state for node 720a9ead7beb61042fd56a873deec6b2cb0daec5: is reachable again and nobody is serving its slots after some time.
22301:M 19 Apr 2019 20:55:38.186 # Cluster state changed: ok
22301:M 19 Apr 2019 20:55:44.322 * FAIL message received from 252b3bb2f902bc81926c8ae04b6eefa8c3133bd4 about 1d5315824f9ce14f3076ff04c7330590b942efb8
22301:M 19 Apr 2019 20:55:44.322 # Cluster state changed: fail