ZooKeeper Curator Leader Election

Date: 2022-03-27 20:00:45

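Leader election is a common scenario in distributed system design. It is the process by which one node is chosen as the leader to coordinate the other nodes or to assign tasks to them.

A leader election algorithm should have the following properties:

1) Every node has an equal chance of becoming the leader. Once a leader has been elected, the other nodes can find out which node is the leader and accept the work it assigns.

2) There is exactly one leader at any time.

3) If the leader fails, crashes, or loses its connection, the other nodes can detect this and run the election again.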
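
The three properties above can be illustrated with a small in-memory sketch. ZooKeeper-based elections are commonly built on ephemeral sequential nodes, where the participant holding the lowest sequence number is the leader and its node disappears when its session ends. The class below only imitates that idea with a sorted map; `ElectionSketch`, `join`, `leave`, and `leader` are hypothetical names chosen for illustration and are not part of ZooKeeper or Curator.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// In-memory stand-in for an election path; NOT ZooKeeper or Curator code.
class ElectionSketch {
    // sequence number -> participant id, sorted so the lowest sequence comes first
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSequence = 0;

    // A participant joins and receives the next sequence number (property 1: same rule for everyone).
    synchronized long join(String participantId) {
        long sequence = nextSequence++;
        candidates.put(sequence, participantId);
        return sequence;
    }

    // Imitates a crash or lost session: the participant's entry disappears.
    synchronized void leave(long sequence) {
        candidates.remove(sequence);
    }

    // Property 2: at most one leader, the participant with the lowest sequence number.
    synchronized Optional<String> leader() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }

    public static void main(String[] args) {
        ElectionSketch election = new ElectionSketch();
        long a = election.join("client#A");
        election.join("client#B");
        election.join("client#C");
        System.out.println(election.leader().orElse("none")); // prints client#A
        election.leave(a); // property 3: the leader fails, the next participant takes over
        System.out.println(election.leader().orElse("none")); // prints client#B
    }
}
```

In a real cluster the remaining participants learn about the change through watches rather than by polling a shared map, which is the part Curator handles for you.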

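ZooKeeper provides a safe and reliable basis for leader election.

Curator, a high-level API wrapper library for ZooKeeper, offers two main leader election recipes: Leader Latch and Leader Election.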

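1. Leader Latch

Once an instance is elected leader, the logic in isLeader is executed. isLeader runs again only after leadership has changed hands and this instance has been elected again.

Let's go straight to the code; the comments explain the details.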

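// Fragment of a larger class. Assumed fields (not shown in the original):
// CuratorFramework client, LeaderLatch leaderLatch, and a logger with {} placeholders (e.g. SLF4J).
// Imports used: java.io.EOFException, java.net.InetAddress, java.util.concurrent.TimeUnit,
// org.apache.curator.framework.recipes.leader.LeaderLatch and LeaderLatchListener.

/*
 * Leader Latch
 * isLeader() is called when this instance is elected leader;
 * notLeader() is called when this instance loses leadership.
 * If the leader fails, a new election takes place.
 */
public void setLeaderLatch(String path) {
    try {
        String id = "client#" + InetAddress.getLocalHost().getHostAddress();
        leaderLatch = new LeaderLatch(client, path, id);
        LeaderLatchListener leaderLatchListener = new LeaderLatchListener() {
            @Override
            public void isLeader() {
                logger.info("[LeaderLatch] I am the leader, id={}", leaderLatch.getId());
            }

            @Override
            public void notLeader() {
                logger.info("[LeaderLatch] I am not the leader, id={}", leaderLatch.getId());
            }
        };
        leaderLatch.addListener(leaderLatchListener);
        leaderLatch.start();
    } catch (Exception e) {
        // Pass the exception along so the cause is not lost
        logger.error("Failed to create LeaderLatch, path={}", path, e);
    }
}

/*
 * Check whether this instance is currently the leader
 */
public boolean hasLeadershipByLeaderLatch() {
    return leaderLatch.hasLeadership();
}

/*
 * Block until this instance acquires leadership
 */
public void awaitByLeaderLatch() {
    try {
        leaderLatch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        logger.warn("[LeaderLatch] interrupted while waiting for leadership", e);
    } catch (EOFException e) {
        // Thrown when the latch is closed while waiting
        logger.warn("[LeaderLatch] latch closed while waiting for leadership", e);
    }
}

/*
 * Try to acquire leadership, giving up after the timeout
 */
public boolean awaitByLeaderLatch(long timeout, TimeUnit unit) {
    boolean hasLeadership = false;
    try {
        hasLeadership = leaderLatch.await(timeout, unit);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        logger.warn("[LeaderLatch] interrupted while waiting for leadership", e);
    }
    return hasLeadership;
}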

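2. Leader Election

Once an instance is elected leader, its takeLeadership method is called to run the business logic. Leadership is released as soon as that method returns.

Calling autoRequeue() ensures that the instance rejoins the election after releasing leadership, so it can become leader again.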

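// Same assumed fields as above, plus LeaderSelector leaderSelector.
// Imports used: org.apache.curator.framework.CuratorFramework,
// org.apache.curator.framework.state.ConnectionState,
// org.apache.curator.framework.recipes.leader.LeaderSelector, LeaderSelectorListener
// and CancelLeadershipException.

/*
 * Leader Election recipe
 * takeLeadership runs once this instance is elected. Leadership is released as soon as
 * it returns and a new election starts, so we sleep for 10 seconds here to hold leadership.
 */
public void setLeaderSelector(String path) {
    try {
        final String id = "client#" + InetAddress.getLocalHost().getHostAddress();
        LeaderSelectorListener leaderSelectorListener = new LeaderSelectorListener() {
            @Override
            public void takeLeadership(CuratorFramework client) throws Exception {
                logger.info("[LeaderSelector] I am the leader, id={}", id);
                Thread.sleep(10000);
            }

            @Override
            public void stateChanged(CuratorFramework client, ConnectionState newState) {
                // The original left this empty; give up leadership on connection problems,
                // as recommended in the notes that follow this code.
                if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
                    throw new CancelLeadershipException();
                }
            }
        };
        leaderSelector = new LeaderSelector(client, path, leaderSelectorListener);
        leaderSelector.autoRequeue();
        leaderSelector.start();
    } catch (Exception e) {
        logger.error("Failed to create LeaderSelector, path={}", path, e);
    }
}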
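
To make the release-and-requeue behaviour more concrete, here is a minimal in-memory sketch. It is not Curator's LeaderSelector: `RequeueSketch` and `leadershipOrder` are hypothetical names, and the strict round-robin order is an assumption made to keep the example deterministic. In a real cluster the order depends on when each instance rejoins the election.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// In-memory imitation of release-and-requeue; NOT Curator's LeaderSelector.
class RequeueSketch {
    // Returns the order in which participants lead, for at most the given number of terms.
    static List<String> leadershipOrder(List<String> participants, boolean autoRequeue, int terms) {
        Deque<String> queue = new ArrayDeque<>(participants);
        List<String> order = new ArrayList<>();
        for (int term = 0; term < terms && !queue.isEmpty(); term++) {
            String leader = queue.pollFirst(); // head of the queue takes leadership
            order.add(leader);                 // its "takeLeadership" runs and returns here
            if (autoRequeue) {
                queue.addLast(leader);         // rejoin the election at the back of the queue
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B");
        System.out.println(leadershipOrder(nodes, true, 4));  // prints [A, B, A, B]
        System.out.println(leadershipOrder(nodes, false, 4)); // prints [A, B]
    }
}
```

Without requeueing, each participant leads once and then drops out, which is why the example above calls autoRequeue() before start().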

LeaderLatch instances add a ConnectionStateListener to watch for connection problems. If SUSPENDED or LOST is reported, the LeaderLatch that is the leader will report that it is no longer the leader (i.e. there will not be a leader until the connection is re-established). If a LOST connection is RECONNECTED, the LeaderLatch will delete its previous ZNode and create a new one.

Users of LeaderLatch must take account that connection issues can cause leadership to be lost. i.e. hasLeadership() returns true but some time later the connection is SUSPENDED or LOST. At that point hasLeadership() will return false. It is highly recommended that LeaderLatch users register a ConnectionStateListener.

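LeaderSelectorListener extends ConnectionStateListener. Once the LeaderSelector is started, it adds the listener to the Curator client. Users of LeaderSelector must pay close attention to connection changes. If a connection problem such as SUSPENDED occurs, the instance must assume that it may no longer be the leader until it receives RECONNECTED. If LOST is reported, the instance is no longer the leader and its takeLeadership() method should exit immediately.

The recommended approach is to throw CancelLeadershipException when a SUSPENDED or LOST connection problem occurs. The LeaderSelector instance will then try to interrupt and cancel the thread that is executing takeLeadership(). It is best to extend LeaderSelectorListenerAdapter, which already implements this recommended handling.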
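
The interrupt-and-cancel mechanism described above can be imitated with plain Java threads. The sketch below uses no Curator classes: `CancelLeadershipSketch`, its `State` enum, and `interruptedBy` are hypothetical stand-ins, and cancelling a `Future` plays the role that throwing CancelLeadershipException plays in Curator. The point it shows is that leader work must react to interruption and return promptly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-thread imitation of cancel-on-connection-loss; NOT Curator code.
class CancelLeadershipSketch {
    // Hypothetical stand-in for the connection states named in the text.
    enum State { SUSPENDED, RECONNECTED, LOST }

    // Runs simulated leader work, delivers one state change, and reports
    // whether the work was interrupted before it finished on its own.
    static boolean interruptedBy(State newState, long workMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            // Plays the role of takeLeadership(): leadership is held while this runs.
            Future<?> leaderWork = executor.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(workMillis); // stands in for the leader's business logic
                } catch (InterruptedException e) {
                    interrupted.set(true);    // return promptly, which gives up leadership
                }
            });
            started.await(); // make sure the leader work is running before the state change
            // Plays the role of stateChanged(): cancel on SUSPENDED or LOST.
            if (newState == State.SUSPENDED || newState == State.LOST) {
                leaderWork.cancel(true);      // interrupts the thread doing the leader work
            }
            executor.shutdown();
            executor.awaitTermination(workMillis + 5_000, TimeUnit.MILLISECONDS);
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo thread was interrupted", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptedBy(State.SUSPENDED, 10_000)); // prints true
        System.out.println(interruptedBy(State.LOST, 10_000));      // prints true
        System.out.println(interruptedBy(State.RECONNECTED, 200));  // prints false
    }
}
```

If the leader work swallowed the interrupt and kept running, the instance would continue acting as leader after the connection problem, which is exactly the situation the recommendation is meant to prevent.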