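Last week, after I restarted one node of the ZooKeeper cluster in our company's production environment, every client connected to that node started reporting the following error: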
[ctvpay] 2018-03-23 10:57:08.569 --- ERROR [main-EventThread] CuratorFrameworkImpl.java:529 - Watcher exception
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45) ~[zookeeper-3.4.6.jar!/:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1572) ~[zookeeper-3.4.6.jar!/:3.4.6-1569965]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:213) ~[curator-framework-1.3.3.jar!/:?]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:202) ~[curator-framework-1.3.3.jar!/:?]
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106) ~[curator-client-1.3.3.jar!/:?]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:198) ~[curator-framework-1.3.3.jar!/:?]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:190) ~[curator-framework-1.3.3.jar!/:?]
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:37) ~[curator-framework-1.3.3.jar!/:?]
at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient$CuratorWatcherImpl.process(CuratorZookeeperClient.java:115) ~[dubbo-2.5.3.jar!/:2.5.3]
at com.netflix.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:56) [curator-framework-1.3.3.jar!/:?]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) [zookeeper-3.4.6.jar!/:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) [zookeeper-3.4.6.jar!/:3.4.6-1569965]
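Our ZooKeeper cluster runs on three machines. Before the restart, I had noticed a similar error while checking one service's logs and assumed something was wrong with ZooKeeper. Because ZooKeeper was clustered, I expected that restarting a single node would not affect the system's services. However, after that node was restarted, every service connected to the ZooKeeper cluster reported the same error, even though the services themselves kept working normally. Since every service is deployed as two instances, I restarted all of the services that connect to ZooKeeper, and the problem went away.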
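After some analysis, I arrived at the following explanation. The clients stayed running while the server was restarted. Before the restarted server had finished initializing, all the services began reconnecting to it and could not obtain the "path" (in the trace, getChildren is called with a null path), and the clients kept retrying, so every service connected to this ZooKeeper node logged the error above. Because the services' connections to the other ZooKeeper nodes were still working, the system as a whole kept running normally.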
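The stack trace shows the failure chain: Dubbo's CuratorZookeeperClient$CuratorWatcherImpl.process receives a watcher event and calls Curator's getChildren().forPath(...), which ends in ZooKeeper's PathUtils.validatePath throwing "Path cannot be null". The plain-Java sketch below reproduces that chain without any ZooKeeper or Curator dependency. Every class and method name in it is a hypothetical stand-in, and it assumes the triggering event carries no path, which is how ZooKeeper delivers connection-state notifications (event type None, e.g. Disconnected or SyncConnected) when a server goes away and comes back. The guarded variant is shown only for contrast; the remedy this article actually applies is the change to the registry address.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, dependency-free sketch of the call chain in the stack trace.
class NullPathSketch {

    // Stand-in for org.apache.zookeeper.common.PathUtils.validatePath:
    // the real method rejects a null path with this same message.
    static void validatePath(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Path cannot be null");
        }
    }

    // Stand-in for getChildren(path): validate first; a real client would then
    // ask the server for the children. This stub just returns an empty list.
    static List<String> getChildren(String path) {
        validatePath(path);
        return Collections.emptyList();
    }

    // Shape of the failing watcher: the event path is forwarded unchecked,
    // so a path-less connection-state event fails inside validatePath.
    static List<String> naiveProcess(String eventPath) {
        return getChildren(eventPath);
    }

    // Contrast only: skip events that carry no path instead of querying.
    static List<String> guardedProcess(String eventPath) {
        if (eventPath == null || eventPath.isEmpty()) {
            return Collections.emptyList();
        }
        return getChildren(eventPath);
    }

    public static void main(String[] args) {
        try {
            naiveProcess(null); // simulates a path-less Disconnected/SyncConnected event
        } catch (IllegalArgumentException e) {
            System.out.println("naive watcher: " + e.getMessage());
        }
        System.out.println("guarded watcher: " + guardedProcess(null));
    }
}
```

Running main prints "naive watcher: Path cannot be null" followed by "guarded watcher: []".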
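The root cause, however, was how the services' ZooKeeper connection was configured. Every service connecting to the ZooKeeper cluster had the following configuration: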
dubbo.registry.address=192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181
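According to the official documentation for the ZooKeeper registry and a blog post (https://www.cnblogs.com/sinxsoft/p/4984321.html), the correct way to connect to a ZooKeeper cluster is to give one address and list the remaining nodes as backups: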
dubbo.registry.address=zookeeper://192.168.1.10:2181?backup=192.168.1.11:2181,192.168.1.12:2181
With this configuration, restarting a single ZooKeeper node no longer causes the error.
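To make the backup form concrete, here is a hypothetical sketch of how a registry address in that shape can be flattened into the comma-separated host:port connect string that ZooKeeper clients accept. Given such a string, a ZooKeeper client knows about every listed server and can reconnect to another one when the server it is using goes away. The sketch does not use Dubbo's own classes; the class and method names are invented for illustration, the 2181 fallback for a missing primary port is an assumption (ZooKeeper's conventional client port), and the example input assumes every node listens on 2181, as in the original configuration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Dubbo's API: expands
// "zookeeper://host:port?backup=h2:p2,h3:p3" into "host:port,h2:p2,h3:p3".
class BackupAddressSketch {

    static final int DEFAULT_ZK_PORT = 2181; // assumed fallback when the primary has no port

    static String toConnectString(String registryUrl) {
        URI uri = URI.create(registryUrl);
        int port = uri.getPort() == -1 ? DEFAULT_ZK_PORT : uri.getPort();

        List<String> hosts = new ArrayList<>();
        hosts.add(uri.getHost() + ":" + port); // primary address comes first

        String query = uri.getQuery(); // e.g. "backup=192.168.1.11:2181,192.168.1.12:2181"
        if (query != null) {
            for (String param : query.split("&")) {
                int eq = param.indexOf('=');
                if (eq > 0 && param.substring(0, eq).equals("backup")) {
                    // Backup entries are copied as written (no default port is added).
                    for (String backup : param.substring(eq + 1).split(",")) {
                        String b = backup.trim();
                        if (!b.isEmpty()) {
                            hosts.add(b);
                        }
                    }
                }
            }
        }
        return String.join(",", hosts);
    }

    public static void main(String[] args) {
        String url = "zookeeper://192.168.1.10:2181?backup=192.168.1.11:2181,192.168.1.12:2181";
        // prints 192.168.1.10:2181,192.168.1.11:2181,192.168.1.12:2181
        System.out.println(toConnectString(url));
    }
}
```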