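1.1.1. Manual Failover with DEBUG SEGFAULT
There are two ways to trigger a failover by hand in Redis Cluster:
Method 1: run the DEBUG SEGFAULT command on a master.
Method 2: run the CLUSTER FAILOVER command on a slave.
This section covers DEBUG SEGFAULT on a master node. The command can also be run on a slave node, but that has nothing to do with manual failover and is not covered here.
DEBUG SEGFAULT crashes the Redis process it is sent to. When the target is a master, the remaining nodes mark it as failed, which triggers the automatic promotion of one of its slaves: the promoted slave becomes a master and takes over the slots that the original master served.
The node state before running DEBUG SEGFAULT is shown below. Master node 7009 has two slave nodes: 7006 and 7007.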
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
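Reading master/slave relationships out of this output by eye is error-prone, so here is a minimal Python sketch that parses CLUSTER NODES text and lists the slaves of a given master. The function names `parse_cluster_nodes` and `replicas_of` are hypothetical helpers, not part of Redis or any client library, and the sketch assumes the space-separated field layout shown above (newer Redis versions append `@cport` to the address, which the code strips).

```python
from typing import Dict, List


def parse_cluster_nodes(text: str) -> List[Dict[str, object]]:
    """Parse CLUSTER NODES text into one dict per node (hypothetical helper)."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        nodes.append({
            "id": fields[0],
            "addr": fields[1].split("@")[0],   # drop "@cport" if present
            "flags": fields[2].split(","),
            "master_id": None if fields[3] == "-" else fields[3],
            "config_epoch": int(fields[6]),
            "link_state": fields[7],
            "slots": fields[8:],               # empty for slaves
        })
    return nodes


def replicas_of(nodes: List[Dict[str, object]], master_addr: str) -> List[str]:
    """Return the sorted addresses of all slaves of the node at master_addr."""
    master = next(n for n in nodes if n["addr"] == master_addr)
    return sorted(n["addr"] for n in nodes if n["master_id"] == master["id"])


# A few lines copied from the output above.
SAMPLE = """\
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
"""

print(replicas_of(parse_cluster_nodes(SAMPLE), "192.168.197.101:7009"))
# prints ['192.168.197.101:7006', '192.168.197.101:7007']
```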
Connect to node 7009 and run the DEBUG SEGFAULT command.
./redis-cli -c -h 192.168.197.101 -p 7009
192.168.197.101:7009> debug segfault
Could not connect to Redis at 192.168.197.101:7009: Connection refused
(1.37s)
not connected> exit
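Afterwards, node 7009 is in the FAIL state, and one of its slaves, 7006, has been promoted to become the new master. The remaining slave, 7007, now replicates from 7006.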
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500109072577 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500109073080 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500109073584 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500109072074 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
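To confirm which node was promoted without comparing the two listings line by line, the before and after snapshots can be diffed. The following is a rough Python sketch under the same field-layout assumption as the output above; `roles` and `role_changes` are hypothetical names introduced only for illustration.

```python
from typing import Dict, Tuple


def roles(nodes_text: str) -> Dict[str, str]:
    """Map node address -> "master"/"slave", with ",fail" appended if flagged FAIL."""
    result = {}
    for line in nodes_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        flags = fields[2].split(",")
        role = "master" if "master" in flags else "slave"
        if "fail" in flags:  # exact match, so the PFAIL flag "fail?" is ignored
            role += ",fail"
        result[fields[1].split("@")[0]] = role
    return result


def role_changes(before: str, after: str) -> Dict[str, Tuple[str, str]]:
    """Return {address: (old_role, new_role)} for nodes whose role changed."""
    old, new = roles(before), roles(after)
    return {a: (old[a], new[a]) for a in new if a in old and old[a] != new[a]}


# Lines for 7006, 7009 and 7007 copied from the two listings above.
BEFORE = """\
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
"""
AFTER = """\
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
"""

for addr, (old_role, new_role) in sorted(role_changes(BEFORE, AFTER).items()):
    print(addr, old_role, "->", new_role)
# prints:
# 192.168.197.101:7006 slave -> master
# 192.168.197.101:7009 master -> master,fail
```

Node 7007 does not appear in the diff because its role is unchanged; only the master it replicates from differs.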
1.1.2. Manual Failover with CLUSTER FAILOVER
Besides running DEBUG SEGFAULT on a master node, Redis Cluster offers another way to fail over by hand: run the CLUSTER FAILOVER command on a slave.
The current node state is as follows:
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500112868234 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500112868738 12 connected 0-5460
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500112867230 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500112869243 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500112868738 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500112867732 12 connected
Node 7007 is a slave, and node 7006 is its master.
Run CLUSTER FAILOVER on node 7007:
./redis-cli -c -h 192.168.197.101 -p 7007
192.168.197.101:7007> cluster failover
OK
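The OK reply only means the command was accepted and a failover will be attempted, so a script that automates this step would normally poll until the node actually reports itself as a master. Below is a minimal Python sketch of such a wait loop. The names `is_master` and `wait_until_master` are hypothetical, and `fetch_nodes` stands for any caller-supplied function that returns the current CLUSTER NODES text; it is faked here with two canned snapshots so the example runs without a cluster.

```python
import time
from typing import Callable


def is_master(nodes_text: str, addr: str) -> bool:
    """True if the node at addr carries the master flag in CLUSTER NODES text."""
    for line in nodes_text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].split("@")[0] == addr:
            return "master" in fields[2].split(",")
    return False  # address not found


def wait_until_master(fetch_nodes: Callable[[], str], addr: str,
                      timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll fetch_nodes() until addr is a master; False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_master(fetch_nodes(), addr):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Canned snapshots for node 7007: first still a slave, then promoted.
BEFORE = "f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500112867732 12 connected"
AFTER = "f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 myself,master - 0 0 15 connected 0-5460"

_snapshots = iter([BEFORE, AFTER])
print(wait_until_master(lambda: next(_snapshots), "192.168.197.101:7007",
                        timeout=5.0, interval=0.0))
# prints True
```

In a real script, `fetch_nodes` could for example wrap a `subprocess.run` call to `redis-cli ... cluster nodes` and return its standard output; that wiring is left out here because it depends on the environment.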
After the command succeeds, check the node state again:
192.168.197.101:7007> cluster nodes
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 master - 0 1500113387728 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 myself,master - 0 0 15 connected 0-5460
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 0 1500113387728 15 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500113389747 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500113388737 3 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048489 1500109045968 11 disconnected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500113388234 2 connected
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500113389242 2 connected 5461-10922
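As the output shows, CLUSTER FAILOVER promoted slave 7007 to master without putting master 7006 into the FAIL state, and turned the original master 7006 into a slave of 7007.
After the operation, both 7006 and 7007 are in a normal state.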
192.168.197.101:7007> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
cluster_stats_messages_sent:131678
cluster_stats_messages_received:85262
As shown, the state of the whole cluster is also OK.
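The same health check can be scripted. This is a small Python sketch that parses CLUSTER INFO text into a dictionary and tests the fields shown above; `parse_cluster_info` and `is_healthy` are hypothetical helper names, and treating "all 16384 slots OK, none PFAIL or FAIL" as the definition of healthy is an assumption made for this example.

```python
from typing import Dict


def parse_cluster_info(text: str) -> Dict[str, str]:
    """Parse "key:value" lines of CLUSTER INFO output into a dict of strings."""
    info = {}
    for line in text.strip().splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # ignore lines without a colon
            info[key] = value
    return info


def is_healthy(info: Dict[str, str], total_slots: int = 16384) -> bool:
    """True if the cluster state is ok and every slot is assigned and reachable."""
    return (
        info.get("cluster_state") == "ok"
        and int(info.get("cluster_slots_ok", 0)) == total_slots
        and int(info.get("cluster_slots_pfail", 0)) == 0
        and int(info.get("cluster_slots_fail", 0)) == 0
    )


# Fields copied from the cluster info output above.
SAMPLE_INFO = """\
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
"""

info = parse_cluster_info(SAMPLE_INFO)
print(is_healthy(info), info["cluster_current_epoch"])
# prints True 15
```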
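Summary:
The DEBUG SEGFAULT and CLUSTER FAILOVER commands have some similarities and some differences.
Similarities:
(a) Both are issued by an operator while the nodes are working normally, rather than in response to a real failure.
(b) Both result in a slave being promoted to master (for DEBUG SEGFAULT, only when it is run on a master node).
Differences:
(a) DEBUG SEGFAULT can be run on either a master or a slave node, whereas CLUSTER FAILOVER can only be run on a slave node and returns an error otherwise.
(b) DEBUG SEGFAULT leaves the original master in the FAIL state, whereas CLUSTER FAILOVER does not.
(c) After DEBUG SEGFAULT, the original master is still listed as a master (flagged master,fail), whereas after CLUSTER FAILOVER the original master becomes a slave.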