Redis Study Notes 17: Manual Failover in Redis Cluster

Date: 2022-02-01 12:39:54

 

1.1.1. Manual Failover with DEBUG SEGFAULT

There are two ways to perform a manual failover in Redis Cluster:

Method 1: run the DEBUG SEGFAULT command on a master.

Method 2: run the CLUSTER FAILOVER command on a slave.

 

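This section covers DEBUG SEGFAULT run on a master node. The command can also be run on a slave, but that has nothing to do with manual failover, so it is not covered here.

Running DEBUG SEGFAULT on a master deliberately crashes that process. The cluster treats the crash as a real failure and automatically promotes one of the master's slaves: the slots the old master was serving are taken over by the promoted slave, which becomes a master and replaces the old one.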

 

The node state before running DEBUG SEGFAULT is shown below. The master is node 7009, and its slaves are nodes 7006 and 7007.

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
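Reading the master/slave relationships out of this listing by eye is error-prone, so here is a minimal Python sketch that does it from the text. It assumes the standard CLUSTER NODES field order (id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, then slots); the names `Node`, `parse_cluster_nodes` and `slaves_of` are made up for this illustration, and the sample is a subset of the listing above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    addr: str
    flags: list
    master_id: str  # "-" when the node is itself a master
    slots: list


def parse_cluster_nodes(text):
    """Parse the text printed by CLUSTER NODES into Node records."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or incomplete lines
        nodes.append(Node(
            node_id=parts[0],
            addr=parts[1].split("@")[0],  # newer Redis versions append "@cport"
            flags=parts[2].split(","),
            master_id=parts[3],
            slots=parts[8:],  # empty for slaves
        ))
    return nodes


def slaves_of(nodes, master_addr):
    """Return the sorted addresses of the slaves that replicate master_addr."""
    master = next(n for n in nodes if n.addr == master_addr)
    return sorted(n.addr for n in nodes if n.master_id == master.node_id)


SAMPLE = """\
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
"""

print(slaves_of(parse_cluster_nodes(SAMPLE), "192.168.197.101:7009"))
# ['192.168.197.101:7006', '192.168.197.101:7007']
```

The match is done on the node ID in the fourth column rather than on the address, because that is the field a slave line uses to name its master.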

 

 

Connect to node 7009 and run the DEBUG SEGFAULT command.

./redis-cli -c -h 192.168.197.101 -p 7009

192.168.197.101:7009> debug segfault

Could not connect to Redis at 192.168.197.101:7009: Connection refused

(1.37s)

not connected> exit

Afterwards node 7009 is in the FAIL state, and one of its slaves, 7006, has been promoted to be the new master.

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500109072577 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500109073080 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500109073584 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500109072074 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
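The effect of the failover can also be confirmed mechanically by comparing which node owns each slot range before and after. The following is a rough sketch under the same assumption about the CLUSTER NODES field order; `slot_owners` and `moved_slots` are hypothetical helper names, and the two snapshots are trimmed copies of the listings in this section.

```python
def slot_owners(cluster_nodes_text):
    """Map each slot range string (e.g. "0-5460") to the address of its owner."""
    owners = {}
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 9:
            continue  # slaves and failed masters list no slots (8 fields only)
        addr = parts[1].split("@")[0]
        for slot_range in parts[8:]:
            if slot_range.startswith("["):
                continue  # ignore migrating/importing markers
            owners[slot_range] = addr
    return owners


def moved_slots(before_text, after_text):
    """Return {slot_range: (old_owner, new_owner)} for ranges that changed owner."""
    before, after = slot_owners(before_text), slot_owners(after_text)
    return {r: (before[r], after[r])
            for r in before if r in after and before[r] != after[r]}


BEFORE = """\
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
"""

AFTER = """\
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500109072577 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
"""

print(moved_slots(BEFORE, AFTER))
# {'0-5460': ('192.168.197.101:7009', '192.168.197.101:7006')}
```

The comparison is on whole range strings, which is enough here because the failover hands the range over unchanged; a cluster that is resharding at the same time would need a per-slot comparison instead.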

 

 

1.1.2. Manual Failover with CLUSTER FAILOVER

 

Besides running DEBUG SEGFAULT on a master, Redis Cluster offers a second way to perform a manual failover: running the CLUSTER FAILOVER command on a slave.

 

The current node state is as follows:

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500112868234 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500112868738 12 connected 0-5460

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500112867230 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500112869243 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500112868738 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500112867732 12 connected

 

Node 7007 is a slave, and node 7006 is its master.

Run CLUSTER FAILOVER on node 7007:

./redis-cli -c -h 192.168.197.101 -p 7007

192.168.197.101:7007> cluster failover

OK

Once the command succeeds, check the node state again:

192.168.197.101:7007> cluster nodes

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 master - 0 1500113387728 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 myself,master - 0 0 15 connected 0-5460

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 0 1500113387728 15 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500113389747 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500113388737 3 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048489 1500109045968 11 disconnected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500113388234 2 connected

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500113389242 2 connected 5461-10922

 

As the output shows, CLUSTER FAILOVER promoted slave 7007 to master without putting master 7006 into the FAIL state, and the former master 7006 became a slave.

After the operation, both 7006 and 7007 are in a normal state.

 

192.168.197.101:7007> cluster info

cluster_state:ok

cluster_slots_assigned:16384

cluster_slots_ok:16384

cluster_slots_pfail:0

cluster_slots_fail:0

cluster_known_nodes:8

cluster_size:3

cluster_current_epoch:15

cluster_my_epoch:15

cluster_stats_messages_sent:131678

cluster_stats_messages_received:85262

As the output shows, the state of the cluster as a whole is also OK.
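The same check can be scripted. Below is a small sketch that parses CLUSTER INFO text into a dictionary and applies one possible health rule; the rule itself (state is ok, every assigned slot is ok, no slots in PFAIL or FAIL) is an assumption chosen for this example, and `parse_cluster_info` and `is_healthy` are hypothetical names.

```python
def parse_cluster_info(text):
    """Turn "key:value" lines from CLUSTER INFO into a dict, with ints where possible."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines
        info[key] = int(value) if value.isdigit() else value
    return info


def is_healthy(info):
    """Assumed rule: state ok, all assigned slots ok, nothing in PFAIL or FAIL."""
    return (info.get("cluster_state") == "ok"
            and info.get("cluster_slots_ok") == info.get("cluster_slots_assigned")
            and info.get("cluster_slots_pfail") == 0
            and info.get("cluster_slots_fail") == 0)


SAMPLE_INFO = """\
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
"""

info = parse_cluster_info(SAMPLE_INFO)
print(is_healthy(info), info["cluster_current_epoch"])
# True 15
```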

 

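Summary:

DEBUG SEGFAULT and CLUSTER FAILOVER have some things in common and some differences.

Similarities:

(a) Both are issued while the nodes are working normally, and both deliberately force a failover instead of waiting for a real fault.

(b) Both cause a slave to be promoted to master (for DEBUG SEGFAULT, only when it is run on a master).

Differences:

(a) DEBUG SEGFAULT can be run on either a master or a slave, whereas CLUSTER FAILOVER can only be run on a slave; otherwise it returns an error.

(b) DEBUG SEGFAULT leaves the original master in the FAIL state, whereas CLUSTER FAILOVER does not.

(c) After DEBUG SEGFAULT, the original master is still listed as a master (flagged master,fail, with no slots), whereas after CLUSTER FAILOVER the original master becomes a slave.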
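The differences can be made concrete with a toy model. The sketch below does not talk to Redis at all: it represents the topology as a plain dictionary and applies the end result of each command as observed in these notes. The function names and dictionary layout are invented for illustration, and which slave wins the election after a crash is passed in as a parameter rather than modelled.

```python
def _copy(topology):
    return {addr: dict(node) for addr, node in topology.items()}


def debug_segfault(topology, master, promoted):
    """Toy model: `master` crashes and the slave `promoted` takes over its slots."""
    new = _copy(topology)
    new[promoted].update(role="master", master=None, slots=new[master]["slots"])
    new[master].update(state="fail", slots=None)  # still flagged master, but failed
    for node in new.values():
        if node["master"] == master:
            node["master"] = promoted  # remaining slaves follow the new master
    return new


def cluster_failover(topology, slave):
    """Toy model: `slave` swaps roles with its master; nobody enters FAIL."""
    if topology[slave]["role"] != "slave":
        raise ValueError("CLUSTER FAILOVER must be sent to a slave")
    new = _copy(topology)
    master = new[slave]["master"]
    new[slave].update(role="master", master=None, slots=new[master]["slots"])
    new[master].update(role="slave", master=slave, slots=None)
    for node in new.values():
        if node["master"] == master:
            node["master"] = slave  # any other slaves follow the new master
    return new


start = {
    "7009": {"role": "master", "master": None, "slots": "0-5460", "state": "ok"},
    "7006": {"role": "slave", "master": "7009", "slots": None, "state": "ok"},
    "7007": {"role": "slave", "master": "7009", "slots": None, "state": "ok"},
}

after_crash = debug_segfault(start, master="7009", promoted="7006")
after_swap = cluster_failover(after_crash, slave="7007")

for addr in ("7009", "7006", "7007"):
    node = after_swap[addr]
    print(addr, node["role"], node["state"], node["slots"])
# 7009 master fail None
# 7006 slave ok None
# 7007 master ok 0-5460
```

Applying the two steps in the same order as the notes reproduces the final listing: 7009 is a failed master without slots, 7006 is a healthy slave, and 7007 serves slots 0-5460.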