Some tests on ZooKeeper and ZKFC

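1. Stop the ZooKeeper cluster

****Process impact******

zkfc: logs an error that it cannot connect to ZooKeeper (ClientCnxn, java.net.ConnectException: Connection refused), but does not shut down

nn: no impact; no failover or shutdown occurred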
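The process observations above can be repeated on each host with `jps`, which lists running JVMs by main class. The following is a minimal sketch: `check_procs` is a hypothetical helper name, and it assumes `jps` is on the PATH and is run as the user that owns the daemons.

```shell
# Sketch: report whether the given Java main classes appear in `jps` output.
# check_procs is a hypothetical helper name, not part of Hadoop or the JDK.
check_procs() (
  # If jps is missing or fails, treat the listing as empty.
  listing=$(jps 2>/dev/null) || listing=""
  for name in "$@"; do
    # jps prints one "<pid> <main class>" pair per line; match the class exactly.
    if printf '%s\n' "$listing" | grep -q " ${name}\$"; then
      echo "$name: running"
    else
      echo "$name: not running"
    fi
  done
)
```

For example, `check_procs NameNode DFSZKFailoverController QuorumPeerMain` covers the nn, zkfc, and ZooKeeper server processes discussed here.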

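****Command impact******

hdfs haadmin -failover nn2 nn1

Fails. Error: operation failed because the connection to zkfc hit a socket timeout. Cause: the failover controller is a module of zkfc, and zkfc cannot work properly while it cannot connect to ZooKeeper.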

hdfs haadmin -transitionToActive --forceactive --forcemanual nn1

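Succeeds. Effect: nn1 becomes active, and nn2 shuts down because it no longer holds the latest epoch. Conclusion: the transitionToActive/transitionToStandby commands do not depend on zkfc.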
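To confirm the result of a forced transition, the state of each nn can be queried with `hdfs haadmin -getServiceState`. The following is a minimal sketch: `print_ha_states` is a hypothetical helper name, and the label "unreachable" is this sketch's own convention for a query that fails or returns nothing.

```shell
# Sketch: print the HA state of each NameNode service ID passed as an argument.
# print_ha_states is a hypothetical helper name, not part of Hadoop.
print_ha_states() (
  for id in "$@"; do
    # getServiceState prints the state (e.g. "active" or "standby") on stdout.
    # If the query fails, record "unreachable" instead.
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null) || state="unreachable"
    printf '%s=%s\n' "$id" "${state:-unreachable}"
  done
)
```

Calling `print_ha_states nn1 nn2` before and after the transition shows which nn changed state and whether the other one is still answering.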

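2. Stop all zkfc processes

****Process impact******

zookeeper: one candidate address refuses the connection, and the socket is closed

nn: no impact; no failover or shutdown occurred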

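****Command impact******

hdfs haadmin -failover nn2 nn1

Fails. Error: operation failed because zkfc cannot be reached (connection refused). Cause: the failover controller is a module of zkfc, and a zkfc that has shut down cannot accept requests.

hdfs haadmin -transitionToActive --forceactive --forcemanual nn1

Succeeds. Effect: nn1 becomes active, and nn2 shuts down because it no longer holds the latest epoch. Conclusion: the transitionToActive/transitionToStandby commands do not depend on zkfc. Note: after nn2 is restarted, nn1 sometimes shuts down because it no longer holds the latest epoch.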

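3. nn1 and nn2 are both in standby state

***Run an MR job****

MR client error: it connects to the two nn in turn, and both return "Operation category READ is not supported in state standby"

Error on both nn: "Operation category JOURNAL is not supported in state standby"

***Run a put operation****

Same as above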
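A both-standby pair like this one can be told apart from a healthy pair or a split brain by counting how many nn report "active". The following is a minimal sketch: `classify_ha_pair` and its three result labels are hypothetical names chosen for illustration.

```shell
# Sketch: classify an HA pair from the two state strings.
# classify_ha_pair and its result labels are hypothetical, not part of Hadoop.
classify_ha_pair() (
  active=0
  for state in "$1" "$2"; do
    if [ "$state" = "active" ]; then
      active=$((active + 1))
    fi
  done
  case "$active" in
    0) echo "no-active" ;;    # neither nn is active: client reads and writes are rejected
    1) echo "ok" ;;           # exactly one active nn
    *) echo "dual-active" ;;  # both nn claim active: split brain
  esac
)
```

The two arguments can come from `hdfs haadmin -getServiceState nn1` and `hdfs haadmin -getServiceState nn2`, for example `classify_ha_pair "$(hdfs haadmin -getServiceState nn1)" "$(hdfs haadmin -getServiceState nn2)"`.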

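4. nn1 and nn2 are both active (split brain)

A brief split-brain state appears only during a transition; the old nn then shuts down because it has lost the latest epoch. None of the methods tried so far could reproduce a long-lived active/active split brain.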
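The shutdown of the old nn can be pictured with a toy model of epoch-based fencing: the shared edit log remembers the highest epoch it has promised and rejects writes from any writer that holds a lower one. The sketch below is a simulation in plain shell, not Hadoop code. All function and variable names are hypothetical, and it assumes the fencing behaves the way the observations in these tests suggest.

```shell
# Toy model of epoch-based fencing; not Hadoop code.
# All names here are hypothetical.
promised_epoch=0   # highest epoch the simulated journal has promised so far

# A writer that becomes active must claim a strictly higher epoch.
journal_new_epoch() {
  if [ "$1" -gt "$promised_epoch" ]; then
    promised_epoch=$1
    return 0
  fi
  return 1
}

# A write is accepted only if the writer's epoch is not older than the promised one.
journal_write() {
  if [ "$1" -lt "$promised_epoch" ]; then
    echo "rejected: epoch $1 < promised epoch $promised_epoch"
    return 1
  fi
  echo "accepted: epoch $1"
}
```

For example, after `journal_new_epoch 1` (the old nn) and `journal_new_epoch 2` (the newly active nn), `journal_write 1` is rejected while `journal_write 2` is accepted. In this model a long-lived split brain cannot persist, because the writer with the older epoch loses the ability to write as soon as the newer epoch is promised.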

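5. Questions still to be tested:

After a brief split brain, once the old active nn restarts and comes up as standby, will clients send read and write requests to both nn, and will the standby nn still issue delete requests?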
