After the cluster shut down abnormally, one ZooKeeper node would not start, and the logs in CM (Cloudera Manager) showed no obvious error.
Troubleshooting:
1. Start the node from the command line with zkServer.sh start and check zookeeper.out. It contains the following error:
Unexpected exception, exiting abnormally
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
    at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:175)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:250)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:377)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:118)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
This confirms that the abnormal shutdown caused the failure. FileTxnLog throws the EOFException in FileHeader.deserialize while it reads a transaction log. That means a transaction log file under the data directory was truncated and does not even contain a complete header.
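Before deleting anything, it can help to confirm which transaction log is broken. The following is a minimal sketch. It assumes the header layout of ZooKeeper's FileHeader: a 16-byte big-endian header consisting of an int magic "ZKLG", an int version and a long dbid. The function name find_broken_txn_logs is hypothetical and is not part of ZooKeeper or CM.

```python
import os

# Assumption: following ZooKeeper's FileHeader layout, a transaction log
# begins with a 16-byte big-endian header: int magic (b"ZKLG"),
# int version, long dbid.
HEADER_SIZE = 16
TXNLOG_MAGIC = b"ZKLG"


def find_broken_txn_logs(version_dir):
    """Return the names of log.* files whose header is truncated or has the wrong magic."""
    broken = []
    for name in sorted(os.listdir(version_dir)):
        if not name.startswith("log."):
            continue  # snapshot.* files use a different magic; skip them here
        with open(os.path.join(version_dir, name), "rb") as f:
            header = f.read(HEADER_SIZE)
        # A file shorter than the header is exactly what makes readInt hit EOF.
        if len(header) < HEADER_SIZE or header[:4] != TXNLOG_MAGIC:
            broken.append(name)
    return broken
```

For example, find_broken_txn_logs("/var/lib/zookeeper/version-2") lists the log files that would stop the server from replaying its history. An empty log file left behind by the crash is the typical hit.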
2. The fix found online:
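First, open the conf/zoo.cfg configuration file and find the directories set by dataDir and dataLogDir. On CDH 5.12 both are /var/lib/zookeeper. Delete the version-2 folder in that directory, then start ZooKeeper from the command line. Checking zookeeper.out shows that it now starts successfully.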
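Finding the version-2 directories from zoo.cfg can be sketched as below. The sketch treats zoo.cfg as simple key=value lines with # comments. That is an assumption: the file is really a Java properties file, which also allows other separators. It also assumes that dataLogDir falls back to dataDir when it is unset. The function names are hypothetical.

```python
import os


def read_zoo_cfg(path):
    """Parse a zoo.cfg-style file of key=value lines, ignoring blanks and # comments."""
    conf = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            conf[key.strip()] = value.strip()
    return conf


def version2_dirs(conf):
    """Return the version-2 directories under dataDir and dataLogDir, without duplicates."""
    dirs = []
    for key in ("dataDir", "dataLogDir"):
        base = conf.get(key)
        if base:
            path = os.path.join(base, "version-2")
            if path not in dirs:  # on CDH 5.12 both settings point to the same place
                dirs.append(path)
    return dirs
```

With dataDir and dataLogDir both set to /var/lib/zookeeper, version2_dirs returns the single path /var/lib/zookeeper/version-2.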
Encouraged, I went back to start the node through CM, but it still failed with the following error:
Unable to access datadir, exiting abnormally
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Missing data directory /var/lib/zookeeper/version-2, automatic data directory creation is disabled (zookeeper.datadir.autocreate is false). Please create this directory manually.
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:102)
    at org.apache.zookeeper.server.quorum.QuorumPeer.<init>(QuorumPeer.java:490)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:138)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
On inspection, the version-2 directory created by the command-line start belonged to root, whereas CM runs ZooKeeper as the zookeeper user.
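The ownership check can be sketched as follows. It compares the owning user and group names of a path with the expected ones, using the standard pwd and grp modules. The helper names are hypothetical. The default group equal to the user name reflects the zookeeper:zookeeper ownership this setup needs.

```python
import grp
import os
import pwd


def owner_of(path):
    """Return the (user, group) names that own path."""
    st = os.stat(path)
    return pwd.getpwuid(st.st_uid).pw_name, grp.getgrgid(st.st_gid).gr_name


def owned_by(path, user, group=None):
    """True if path belongs to user and to group (group defaults to the user's name)."""
    return owner_of(path) == (user, group or user)
```

In this case owned_by("/var/lib/zookeeper/version-2", "zookeeper") would return False after a command-line start as root. That mismatch is what keeps the CM-managed process from using the directory.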
So I deleted version-2 and started the node through CM again, expecting CM to create the directory. It still failed with the same error.
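CM is a bit of a trap here. As the error message says, automatic data directory creation is disabled (zookeeper.datadir.autocreate is false), so the folder is never created for you. I therefore started ZooKeeper once from the command line to generate version-2, changed its user and group to zookeeper, and after that the node started successfully through CM. In principle, creating the directory by hand should work just as well.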
3. Summary of the fix
- Delete the /var/lib/zookeeper/version-2 folder
- Create the version-2 folder and change its user and group to zookeeper
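The two steps above can be sketched in Python. It is roughly the equivalent of rm -r, mkdir and chown zookeeper:zookeeper on the directory. The function name reset_version2 is hypothetical. The sketch assumes it runs with enough privileges to chown, normally root, and that the data directory is the one read from dataDir. Only version-2 is removed, so files kept directly in dataDir, such as myid, stay untouched.

```python
import os
import shutil


def reset_version2(data_dir, user="zookeeper", group="zookeeper"):
    """Delete data_dir/version-2 and recreate it empty, owned by user:group.

    Needs permission to chown (normally root when the owner is zookeeper).
    """
    target = os.path.join(data_dir, "version-2")
    if os.path.isdir(target):
        # Discards this node's local snapshots and transaction logs.
        shutil.rmtree(target)
    os.makedirs(target)
    # Hand the fresh directory to the account CM starts ZooKeeper as.
    shutil.chown(target, user=user, group=group)
    return target
```

For this cluster that would be reset_version2("/var/lib/zookeeper"), run on the failed node before starting it again from CM.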