Bug analysis: a node cannot rejoin its original RabbitMQ cluster after RabbitMQ is reinstalled

Time: 2021-12-10 05:43:21

Background:

One controller node and one compute node, compute1.

The hosts files on both machines already map each other's hostnames.
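Because RabbitMQ node names such as rabbit@controller depend on these hostnames, it can help to verify the mapping explicitly. The sketch below is a hypothetical helper (hosts_has_names is not a standard tool) that only inspects a hosts-format file; on a live system, getent hosts controller compute1 performs a real lookup through the system resolver instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every NAME appears as a hostname
# (2nd field onward) in a hosts-format FILE. Text after '#' is ignored.
# Usage: hosts_has_names FILE NAME...
hosts_has_names() {
  local file="$1"; shift
  local name missing=0
  for name in "$@"; do
    if ! awk -v n="$name" '
        { sub(/#.*/, "") }                                  # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, with entries such as 192.168.1.10 controller and 192.168.1.11 compute1 (placeholder addresses), running hosts_has_names /etc/hosts controller compute1 on each machine should return 0.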

The two nodes had already joined the same RabbitMQ cluster.

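However, the controller node had to be restored to a bare-metal state for service reasons. After the rabbitmq-server package was reinstalled on it with yum, the compute1 node could no longer join the controller's RabbitMQ cluster.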

The related errors are shown below:

[root@compute1 ~]# rabbitmqctl join_cluster rabbit@controller
Clustering node rabbit@compute1 with rabbit@controller ...
Error: {cannot_start_mnesia,
{{shutdown,{failed_to_start_child,mnesia_kernel_sup,killed}},
{mnesia_sup,start,[normal,[]]}}}
[root@compute1 ~]# rabbitmqctl start_app
Starting node rabbit@compute1 ...
BOOT FAILED
===========
Error description:
{error,{inconsistent_cluster,"Node rabbit@compute1 thinks it's clustered with node rabbit@controller, but rabbit@controller disagrees"}}
Log files (may contain more information):
/var/log/rabbitmq/rabbit@compute1.log
/var/log/rabbitmq/rabbit@compute1-sasl.log
Stack trace:
[{rabbit_mnesia,check_cluster_consistency,,
[{file,"src/rabbit_mnesia.erl"},{line,}]},
{rabbit,'-start/0-fun-0-',,[{file,"src/rabbit.erl"},{line,}]},
{rabbit,start_it,,[{file,"src/rabbit.erl"},{line,}]},
{rpc,'-handle_call_call/6-fun-0-',,[{file,"rpc.erl"},{line,}]}]
Error: {error,{inconsistent_cluster,"Node rabbit@compute1 thinks it's clustered with node rabbit@controller, but rabbit@controller disagrees"}}

The error says that compute1 believes it is clustered with the controller node, but the controller does not agree.
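To see this disagreement directly, one can ask each node for its own view of the cluster. The sketch below wraps rabbitmqctl cluster_status with the -n option, which selects the target node; show_cluster_views and the DRY_RUN switch are hypothetical conveniences. Querying a remote node requires the same Erlang cookie on both sides, and while the rabbit application on compute1 is stopped its output may be incomplete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each node's own view of cluster membership.
# Node names follow this post (rabbit@controller, rabbit@compute1).
# DRY_RUN=1 prints the commands instead of executing them.
show_cluster_views() {
  local node
  for node in "$@"; do
    echo "== view from $node =="
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "rabbitmqctl -n $node cluster_status"
    else
      # -n targets a specific node; this needs a matching Erlang cookie.
      rabbitmqctl -n "$node" cluster_status
    fi
  done
}
```

Running show_cluster_views rabbit@controller rabbit@compute1 on compute1 would be expected to show compute1 still listing rabbit@controller as a member, while the freshly installed controller lists only itself.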

The join_cluster attempt at the top of that output fails with a separate error, cannot_start_mnesia: Mnesia, the Erlang database in which RabbitMQ keeps its cluster metadata, cannot start on compute1.
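The controller node was freshly installed and its Erlang cookie had already been copied to compute1, and compute1 had also run stop_app. So the most likely cause is stale cluster information left on compute1 from the old cluster, which causes the join to be rejected.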
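Since a cookie mismatch can also break clustering, it is worth ruling it out before touching the Mnesia data. This sketch assumes the default RPM location /var/lib/rabbitmq/.erlang.cookie and root ssh access to the controller; cookie_matches_stdin and cookie_perms_ok are hypothetical names. Erlang also refuses a cookie file that is accessible by group or others, so the file mode is checked too.

```shell
#!/usr/bin/env bash
# Assumed default path for RPM installs (HOME of the rabbitmq user).
COOKIE="${COOKIE:-/var/lib/rabbitmq/.erlang.cookie}"

# Hypothetical helper: compare a cookie read from stdin byte-for-byte
# with the local $COOKIE. Returns 0 only if they are identical.
cookie_matches_stdin() {
  cmp -s - "$COOKIE"
}

# Hypothetical helper: Erlang rejects cookies readable by group/others,
# so accept only owner-only modes 400 or 600 (uses GNU stat).
cookie_perms_ok() {
  local mode
  mode="$(stat -c '%a' "$COOKIE")" || return 1
  [ "$mode" = 400 ] || [ "$mode" = 600 ]
}
```

Typical use from compute1: ssh root@controller cat /var/lib/rabbitmq/.erlang.cookie | cookie_matches_stdin && echo match || echo differ.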

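Searching online confirmed this: leftover Mnesia data on compute1 still records the old cluster membership, while the reinstalled controller has no record of compute1, so the join fails.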

On compute1 this data is stored under /var/lib/rabbitmq/mnesia. Move it aside:
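Before moving the directory, it can be useful to look at what membership it still records. In RabbitMQ 3.x the per-node directory (for example /var/lib/rabbitmq/mnesia/rabbit@compute1) contains files named cluster_nodes.config and nodes_running_at_shutdown; those names come from 3.x internals and may differ in other versions, and show_stale_membership is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the membership files a RabbitMQ 3.x node keeps
# in its Mnesia directory. Returns 1 if none are found.
show_stale_membership() {
  local base="${1:-/var/lib/rabbitmq/mnesia}"
  local f status=1
  for f in "$base"/*/cluster_nodes.config "$base"/*/nodes_running_at_shutdown; do
    [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
    echo "== $f =="
    cat "$f"
    echo
    status=0
  done
  return "$status"
}
```

If the output on compute1 still mentions rabbit@controller, that is the stale state the mv command moves out of the way.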

mv /var/lib/rabbitmq/mnesia /tmp

Then scp the Erlang cookie file from the controller node to compute1 (by default /var/lib/rabbitmq/.erlang.cookie).

Run rabbitmqctl join_cluster rabbit@controller again.

This time compute1 joins the cluster successfully.
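The steps above can be collected into one script. This is a sketch under stated assumptions rather than the exact commands used here: it adds a service stop and start around the move, so the node restarts with an empty data directory and the copied cookie (Erlang reads the cookie only at node startup), and it uses the stop_app, join_cluster, start_app order that rabbitmqctl expects. The systemd unit name rabbitmq-server, the default paths, root scp access to the controller and the backup naming are all assumptions; running it with DRY_RUN=1 first prints the commands for review.

```shell
#!/usr/bin/env bash
# Sketch: rejoin compute1 to rabbit@controller after the controller was reinstalled.
# Assumptions: systemd unit "rabbitmq-server", default RPM data dir on both
# nodes, root scp access to the controller, stale local state may be discarded.
# DRY_RUN=1 prints the commands instead of executing them.
PEER="${PEER:-rabbit@controller}"
PEER_HOST="${PEER#*@}"                    # host part, e.g. "controller"
DATA_DIR="${DATA_DIR:-/var/lib/rabbitmq}"
COOKIE="$DATA_DIR/.erlang.cookie"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

rejoin_cluster() {
  local backup
  backup="$DATA_DIR/mnesia.bak.$(date +%Y%m%d%H%M%S)"
  run systemctl stop rabbitmq-server || return
  # Keep the stale cluster state as a backup instead of deleting it.
  run mv "$DATA_DIR/mnesia" "$backup" || return
  # Use the controller's cookie; Erlang reads it only when the node starts.
  run scp "root@$PEER_HOST:$COOKIE" "$COOKIE" || return
  run chown rabbitmq:rabbitmq "$COOKIE" || return
  run chmod 400 "$COOKIE" || return
  run systemctl start rabbitmq-server || return
  # join_cluster needs the node running but the rabbit app stopped.
  run rabbitmqctl stop_app || return
  run rabbitmqctl join_cluster "$PEER" || return
  run rabbitmqctl start_app || return
  run rabbitmqctl cluster_status
}
```

Each step stops the sequence on the first failure, so a failed scp, for example, does not lead to a join attempt with the wrong cookie.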

This is rare in day-to-day operations but easy to run into in lab environments, so I am recording it here for future reference.