ZooKeeper Lesson 1: Installation and Configuration

Date: 2022-04-03 20:06:13

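Introduction:

ZooKeeper is an open-source implementation of Google's Chubby and serves as Hadoop's distributed coordination service. It offers a small set of primitives for building synchronization, configuration maintenance, group management, and naming services.

A ZooKeeper deployment is a cluster of servers: one leader and several followers. Every server holds the same data, reads and writes are distributed across the servers, and update requests are forwarded to the leader, which carries them out.

Updates are applied in order: update requests from the same client are executed in the order they were sent. Updates are atomic: each one either succeeds or fails. The data view is globally unique: whichever server a client connects to, it sees the same view of the data.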

 

After downloading the ZooKeeper package, extract it to a suitable directory.

Download location: http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.6/

 

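A ZooKeeper cluster is a standalone distributed coordination service. "Standalone" here means that any distributed application can use ZooKeeper to implement, and simplify, its own coordination and management. This is possible thanks to ZooKeeper's data model and hierarchical namespace; for details see http://zookeeper.apache.org/doc/trunk/zookeeperOver.html. When designing the coordination service for your distributed application, the first thing to consider is how to organize the hierarchical namespace. A ZooKeeper cluster has two key roles: Leader and Follower. All nodes in the cluster serve distributed applications as a single whole, and every node connects to every other node. So when configuring a ZooKeeper cluster, the host-to-IP mapping on each node must include entries for all the other nodes in the cluster.

ZooKeeper uses an algorithm known as leader election. While the cluster is running there is exactly one Leader, and all other nodes are Followers. If the Leader fails while the cluster is running, the system uses this algorithm to elect a new one. This is why the nodes must be able to reach one another, and why the mappings above must be configured.
When a ZooKeeper cluster starts, it first elects a Leader; during leader election, a node that satisfies the election algorithm's criteria becomes the Leader. For the overall cluster architecture, see http://zookeeper.apache.org/doc/trunk/zookeeperOver.html#sc_designGoals.

ZooKeeper can serve from a single machine, and it can also run as a cluster of several machines. It supports a third, pseudo-cluster mode as well, in which several ZooKeeper instances run on one physical machine. ZooKeeper achieves high availability through replication: as long as more than half of the machines in the ensemble are available, the service keeps running. In terms of fault tolerance, a 3-machine cluster can elect a leader and serve clients as long as 2 machines are up (a cluster of 2n+1 machines tolerates n machine failures).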
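
The majority rule above is plain arithmetic, and a short Python sketch may make it easier to check. The function names quorum_size and tolerated_failures are made up for this illustration; they are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is strictly more than half."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print the quorum size and tolerated failures for small ensembles.
    for n in range(1, 6):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule a 4-machine ensemble tolerates no more failures than a 3-machine one, which is one reason odd sizes (2n+1) are the usual choice.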

 

Standalone mode

1. Go to the conf subdirectory of the ZooKeeper directory and create zoo.cfg (you can also take the default zoo_sample.cfg and simply rename it):


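  Parameters:

  • tickTime: the basic time unit used by ZooKeeper, in milliseconds.
  • dataDir: the data directory. It can be any directory.
  • dataLogDir: the transaction log directory, which can also be any directory. If this parameter is not set, the dataDir setting is used.
  • clientPort: the port that listens for client connections.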
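
As a rough illustration of how these four parameters fit together, the Python sketch below renders the text of a minimal standalone zoo.cfg. The helper name render_standalone_cfg and the example path are assumptions made for this sketch; the defaults 2000 and 2181 are the values used in the other configurations in this article.

```python
from typing import Optional


def render_standalone_cfg(
    data_dir: str,
    client_port: int = 2181,
    tick_time: int = 2000,
    data_log_dir: Optional[str] = None,
) -> str:
    """Return the text of a minimal standalone zoo.cfg."""
    lines = [f"tickTime={tick_time}", f"dataDir={data_dir}"]
    # dataLogDir is optional; when omitted, ZooKeeper falls back to dataDir.
    if data_log_dir is not None:
        lines.append(f"dataLogDir={data_log_dir}")
    lines.append(f"clientPort={client_port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical directory, chosen only for the example.
    print(render_standalone_cfg("F:/ZOOKEEPER/standalone/data"), end="")
```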

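2. Start ZooKeeper: run zkServer.cmd (in the bin directory).

3. Start the client: double-click zkCli.cmd when the client runs on the same machine as ZooKeeper, or run zkCli.cmd -server localhost:2181, replacing localhost with the server's host name or IP address when the client runs on a different machine.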


 

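Pseudo-cluster mode

A pseudo-cluster means starting several ZooKeeper processes on a single machine and having them form a cluster. The example below starts 3 ZooKeeper processes.

Make two more copies of the ZooKeeper directory, so that there are three: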

  |--zookeeper0
  |--zookeeper1
  |--zookeeper2

 Change zookeeper0/conf/zoo.cfg to:

  tickTime=2000
  initLimit=5
  syncLimit=2
  dataDir=F:/ZOOKEEPER/zookeeper0/data
  dataLogDir=F:/ZOOKEEPER/zookeeper0/logs
  clientPort=4180
  server.0=127.0.0.1:8880:7770
  server.1=127.0.0.1:8881:7771
  server.2=127.0.0.1:8882:7772

 

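Several new parameters appear here. Their meanings are:

  • initLimit: a ZooKeeper cluster contains several servers, one of which is the leader while the rest are followers. initLimit sets the maximum time a follower may take to connect to and sync with the leader during initialization. Here it is set to 5, so the limit is 5 times tickTime, i.e. 5*2000=10000ms=10s.
  • syncLimit: sets the maximum time allowed for a request and its reply when the leader and a follower exchange messages. Here it is set to 2, so the limit is 2 times tickTime, i.e. 4000ms.
  • server.X=A:B:C: X is a number identifying the server. A is the IP address of that server. B is the port the server uses to exchange messages with the cluster's leader. C is the port used for leader election. Because this is a pseudo-cluster on one machine, every server must use different B and C values.
  • Using zookeeper0/conf/zoo.cfg as a template, configure zookeeper1/conf/zoo.cfg and zookeeper2/conf/zoo.cfg. Only dataDir, dataLogDir, and clientPort need to change.
  • In each dataDir configured above, create a file named myid containing a single number that says which server this is. The number must match the X of that server's server.X line in zoo.cfg: write 0 to F:/ZOOKEEPER/zookeeper0/data/myid, 1 to F:/ZOOKEEPER/zookeeper1/data/myid, and 2 to F:/ZOOKEEPER/zookeeper2/data/myid.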
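
The per-instance steps above are repetitive, so they can be scripted. The Python sketch below generates conf/zoo.cfg and data/myid for the three instances under a temporary directory instead of F:/ZOOKEEPER. The helper names are made up for this sketch, and the client ports 4181 and 4182 are assumptions: the article only gives 4180 for zookeeper0.

```python
import tempfile
from pathlib import Path

# Quorum and election ports copied from the server.X lines above.
SERVERS = {0: (8880, 7770), 1: (8881, 7771), 2: (8882, 7772)}
# Only 4180 appears in the article; 4181 and 4182 are assumed for illustration.
CLIENT_PORTS = {0: 4180, 1: 4181, 2: 4182}


def build_cfg(server_id: int, base: Path) -> str:
    """Return the zoo.cfg text for one instance of the pseudo-cluster."""
    inst = base / f"zookeeper{server_id}"
    lines = [
        "tickTime=2000",
        "initLimit=5",
        "syncLimit=2",
        f"dataDir={(inst / 'data').as_posix()}",
        f"dataLogDir={(inst / 'logs').as_posix()}",
        f"clientPort={CLIENT_PORTS[server_id]}",
    ]
    # The server.X list is identical in every instance's file.
    lines += [
        f"server.{sid}=127.0.0.1:{quorum}:{election}"
        for sid, (quorum, election) in sorted(SERVERS.items())
    ]
    return "\n".join(lines) + "\n"


def write_instance(server_id: int, base: Path) -> None:
    """Create conf/zoo.cfg and data/myid for one instance."""
    inst = base / f"zookeeper{server_id}"
    for sub in ("conf", "data", "logs"):
        (inst / sub).mkdir(parents=True, exist_ok=True)
    (inst / "conf" / "zoo.cfg").write_text(build_cfg(server_id, base))
    # myid holds the X of this instance's server.X line.
    (inst / "data" / "myid").write_text(f"{server_id}\n")


if __name__ == "__main__":
    base_dir = Path(tempfile.mkdtemp())
    for sid in SERVERS:
        write_instance(sid, base_dir)
    print(base_dir)
```

Keeping the ports in one table makes it harder for two instances to end up with the same clientPort or the same B and C values.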

 

Start the three servers.
Pick any one of the server directories and start the client: bin/zkCli.cmd -server localhost:4180

 

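Cluster mode

Cluster mode is configured almost exactly like pseudo-cluster mode.
Because each server runs on a different machine in cluster mode, the conf/zoo.cfg files of all servers can be identical.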

 

Deployment on Linux:

1. Edit the ZooKeeper configuration file conf/zoo.cfg:
  tickTime=2000
  dataDir=/home/xuhui/hadoop-2.2.0/tmp/zookeeper
  clientPort=2181
  initLimit=5
  syncLimit=2
  server.1=cloud001:2888:3888
  server.2=cloud002:2888:3888

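2. Copy the installation to the other nodes
  ZooKeeper has now been configured on one machine (cloud001). The configured installation can be copied to the matching directory on each of the other nodes in the cluster: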
  cd /home/xuhui/hadoop-2.2.0/
  scp -r zookeeper-3.4.6/ xuhui@cloud002:/home/xuhui/hadoop-2.2.0/
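3. Set myid
  In the directory given by dataDir, create a file named myid containing a single number that identifies the current host. It must match the X of that host's server.X entry in conf/zoo.cfg. For example: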
  xuhui@cloud001:~/hadoop-2.2.0/tmp$ mkdir zookeeper
  xuhui@cloud001:~/hadoop-2.2.0$ echo "1" > /home/xuhui/hadoop-2.2.0/tmp/zookeeper/myid
  xuhui@cloud002:~/hadoop-2.2.0/tmp$ mkdir zookeeper
  xuhui@cloud002:~/hadoop-2.2.0$ echo "2" > /home/xuhui/hadoop-2.2.0/tmp/zookeeper/myid
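
A mismatch between myid and the server.X lines is easy to introduce when copying an installation between hosts. As a minimal sketch, the Python below parses the server.X lines of a zoo.cfg and checks a myid value against them. The functions parse_servers and check_myid are hypothetical helpers written for this article, not ZooKeeper tools.

```python
import re

# Matches lines such as "server.1=cloud001:2888:3888".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_servers(cfg_text: str) -> dict:
    """Map each server id to (host, quorum_port, election_port)."""
    servers = {}
    for raw in cfg_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def check_myid(cfg_text: str, myid_text: str, hostname: str) -> str:
    """Return "ok" or a short description of the mismatch."""
    myid = int(myid_text.strip())
    servers = parse_servers(cfg_text)
    if myid not in servers:
        return f"myid {myid} has no server.{myid} entry"
    host = servers[myid][0]
    if host != hostname:
        return f"server.{myid} points at {host}, not {hostname}"
    return "ok"


if __name__ == "__main__":
    cfg = "server.1=cloud001:2888:3888\nserver.2=cloud002:2888:3888\n"
    print(check_myid(cfg, "1\n", "cloud001"))
    print(check_myid(cfg, "2\n", "cloud001"))
```
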
4. Start the ZooKeeper cluster
On every node of the ZooKeeper cluster, run the script that starts the ZooKeeper service, as follows:
  xuhui@cloud001:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh start
  xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh start
Taking node cloud001 as an example, the log looks like this:
xuhui@cloud001:~/hadoop-2.2.0/zookeeper-3.4.6$ tail -500f zookeeper.out 
2014-05-21 11:26:42,603 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf/zoo.cfg
2014-05-21 11:26:42,611 [myid:] - WARN [main:QuorumPeerConfig@293] - No server failure will be tolerated. You need at least 3 servers.
2014-05-21 11:26:42,612 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2014-05-21 11:26:42,626 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2014-05-21 11:26:42,627 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2014-05-21 11:26:42,627 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2014-05-21 11:26:42,646 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2014-05-21 11:26:42,695 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-05-21 11:26:42,744 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 2000
2014-05-21 11:26:42,744 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2014-05-21 11:26:42,744 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2014-05-21 11:26:42,744 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 5
2014-05-21 11:26:42,768 [myid:1] - INFO [main:QuorumPeer@473] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2014-05-21 11:26:42,940 [myid:1] - INFO [main:QuorumPeer@488] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2014-05-21 11:26:43,035 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: cloud001/172.24.241.56:3888
2014-05-21 11:26:43,050 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2014-05-21 11:26:43,054 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x0
2014-05-21 11:26:43,057 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:26:43,085 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:744)
2014-05-21 11:26:43,263 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:43,265 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2014-05-21 11:26:43,667 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:43,669 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2014-05-21 11:26:44,471 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:44,473 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2014-05-21 11:26:46,075 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:46,076 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200
2014-05-21 11:26:49,278 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:49,280 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 6400
2014-05-21 11:26:55,682 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:26:55,684 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 12800
2014-05-21 11:27:08,539 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:27:08,541 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
2014-05-21 11:27:34,143 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:27:34,145 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 51200
2014-05-21 11:28:25,347 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address cloud002/172.18.19.37:3888
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2014-05-21 11:28:25,349 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2014-05-21 11:28:30,573 [myid:1] - INFO [cloud001/172.24.241.56:3888:QuorumCnxManager$Listener@511] - Received connection request /172.18.19.37:39108
2014-05-21 11:28:30,593 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:28:30,594 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2014-05-21 11:28:30,796 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@784] - FOLLOWING
2014-05-21 11:28:30,819 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@86] - TCP NoDelay set to: true
2014-05-21 11:28:30,830 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-05-21 11:28:30,830 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:host.name=cloud001
2014-05-21 11:28:30,830 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.version=1.7.0_45
2014-05-21 11:28:30,830 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.vendor=Oracle Corporation
2014-05-21 11:28:30,831 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.home=/usr/lib/jvm/jdk1.7.0_45/jre
2014-05-21 11:28:30,831 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.class.path=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/classes:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf:.:/usr/lib/jvm/jdk1.7.0_45/lib:/home/xuhui/hadoop-2.2.0/mahout-distribution-0.9/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:.:/usr/lib/jvm/jdk1.7.0_45/lib:/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:
2014-05-21 11:28:30,831 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
2014-05-21 11:28:30,831 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.io.tmpdir=/tmp
2014-05-21 11:28:30,831 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:java.compiler=<NA>
2014-05-21 11:28:30,836 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.name=Linux
2014-05-21 11:28:30,836 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.arch=i386
2014-05-21 11:28:30,836 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:os.version=3.8.0-29-generic
2014-05-21 11:28:30,837 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.name=xuhui
2014-05-21 11:28:30,837 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.home=/home/xuhui
2014-05-21 11:28:30,837 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server environment:user.dir=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6
2014-05-21 11:28:30,839 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2 snapdir /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2
2014-05-21 11:28:30,840 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 107786
2014-05-21 11:28:31,367 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@323] - Getting a diff from the leader 0x0
2014-05-21 11:28:31,371 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting: 0x0 to /home/xuhui/hadoop-2.2.0/tmp/zookeeper/version-2/snapshot.0

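The nodes were started in the order cloud001, then cloud002. When a ZooKeeper cluster starts, every node tries to connect to the other nodes in the cluster, and a node started earlier cannot reach one that has not started yet. The exceptions in the first part of the log above (java.net.ConnectException: 拒绝连接, i.e. connection refused) can therefore be ignored. The later part of the log shows that the cluster elected a Leader and then settled down.
Other nodes may show similar messages; this is normal.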
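
The "Notification time out" lines in the log above follow a doubling pattern that stops growing at 60000. The Python sketch below reproduces only that observed sequence, assuming a starting value of 400 and a cap of 60000 as they appear in the log. It is not ZooKeeper's own code, and the function name is invented for the illustration.

```python
def notification_timeouts(first: int = 400, cap: int = 60000, count: int = 9) -> list:
    """Return `count` timeouts that double each round, limited to `cap`."""
    timeouts = []
    current = first
    for _ in range(count):
        timeouts.append(current)
        # Double for the next round, but never exceed the cap.
        current = min(current * 2, cap)
    return timeouts


if __name__ == "__main__":
    print(notification_timeouts())
```

This also explains why cloud001 took so long to notice cloud002: by the time the second node came up, the first was waiting tens of seconds between retries.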
5. Verify the installation
ZooKeeper's script can report the startup status, including each node's role in the cluster (Leader or Follower). Below is the result of running it on cloud002:
xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
The status output shows that cloud002 is the cluster's Leader; the other node, cloud001, is a Follower, as its log above confirms.
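
The same role check can be done without the script. ZooKeeper servers answer short "four-letter" commands on the client port, and the reply to stat contains a Mode: line. The Python sketch below sends stat and extracts that line. It assumes the command is enabled on the server (newer releases may restrict these commands), and query_mode and parse_mode are names invented for this example.

```python
import socket
from typing import Optional


def parse_mode(reply: str) -> Optional[str]:
    """Extract the value of the "Mode:" line from a stat reply, if present."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int = 2181, timeout: float = 5.0) -> Optional[str]:
    """Send the four-letter command "stat" and return the reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


if __name__ == "__main__":
    # Host names taken from the cluster in this article.
    for node in ("cloud001", "cloud002"):
        try:
            print(node, query_mode(node))
        except OSError as exc:
            print(node, "unreachable:", exc)
```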
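In addition, you can connect to the ZooKeeper cluster with the client script. To a client, ZooKeeper is a single whole (an ensemble), and a client connected to the cluster effectively has the service of the whole cluster to itself. You can therefore open a connection to the service from any node, for example: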

xuhui@cloud002:~/hadoop-2.2.0/zookeeper-3.4.6$ bin/zkCli.sh -server cloud002:2181
Connecting to cloud002:2181
2014-05-21 11:38:55,520 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2014-05-21 11:38:55,523 [myid:] - INFO [main:Environment@100] - Client environment:host.name=cloud002
2014-05-21 11:38:55,524 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_45
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/jdk1.7.0_45/jre
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/classes:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../build/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6/bin/../conf:.:/usr/lib/jvm/jdk1.7.0_45/lib:/usr/lib/jvm/jdk1.7.0_45/jre/lib:
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2014-05-21 11:38:55,526 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=i386
2014-05-21 11:38:55,527 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.8.0-29-generic
2014-05-21 11:38:55,527 [myid:] - INFO [main:Environment@100] - Client environment:user.name=xuhui
2014-05-21 11:38:55,527 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/xuhui
2014-05-21 11:38:55,527 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/xuhui/hadoop-2.2.0/zookeeper-3.4.6
2014-05-21 11:38:55,528 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=cloud002:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@a61d64
Welcome to ZooKeeper!
2014-05-21 11:38:55,552 [myid:] - INFO [main-SendThread(cloud002:2181):ClientCnxn$SendThread@975] - Opening socket connection to server cloud002/172.18.19.37:2181. Will not attempt to authenticate using SASL (unknown error)
2014-05-21 11:38:55,575 [myid:] - INFO [main-SendThread(cloud002:2181):ClientCnxn$SendThread@852] - Socket connection established to cloud002/172.18.19.37:2181, initiating session
JLine support is enabled
[zk: cloud002:2181(CONNECTING) 0] 2014-05-21 11:38:55,744 [myid:] - INFO [main-SendThread(cloud002:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server cloud002/172.18.19.37:2181, sessionid = 0x2461cd2455b0000, negotiated timeout = 30000


WATCHER::


WatchedEvent state:SyncConnected type:None path:null


[zk: cloud002:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: cloud002:2181(CONNECTED) 2]

The root path / currently contains a single child znode, zookeeper.

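Summary notes
Host name to IP address mapping problems
When starting a ZooKeeper cluster, a node's log may show errors like the following (this sample comes from a cluster whose nodes are named slave-01, slave-02, and slave-03):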

java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2012-01-08 06:37:46,026 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@697] - Notification time out: 6400
2012-01-08 06:37:57,431 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@384] - Cannot open channel to 2 at election address slave-02/202.106.199.35:3888
java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:109)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2012-01-08 06:38:02,442 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@384] - Cannot open channel to 3 at election address slave-03/202.106.199.35:3888
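Clearly, when slave-01 started and tried to connect to the other nodes in the cluster (slave-02 and slave-03), the IP address that their host names resolved to did not match the one actually configured. The nodes could not establish links to each other, and the whole ZooKeeper cluster failed to start.
In the error log above, slave-02/202.106.199.35:3888 should have been slave-02/192.168.0.178:3888, but name resolution returned the wrong mapping. To fix this, edit /etc/hosts on every node and add the host name to IP address mappings for all nodes in the ZooKeeper cluster.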
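
A quick way to catch this class of problem before starting the cluster is to compare what each host name resolves to with the address you expect. The Python sketch below does that with the standard library resolver. The function find_mismatches is a hypothetical helper, and the expected addresses are the ones shown for cloud001 and cloud002 in the startup log earlier in this article.

```python
import socket


def find_mismatches(expected: dict, resolve=socket.gethostbyname) -> dict:
    """Return {host: resolved_ip_or_None} for hosts that do not resolve as expected."""
    mismatches = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError:
            # The name could not be resolved at all.
            got = None
        if got != want:
            mismatches[host] = got
    return mismatches


if __name__ == "__main__":
    # Addresses as they appear in the startup log of this article's cluster.
    expected_hosts = {"cloud001": "172.24.241.56", "cloud002": "172.18.19.37"}
    print(find_mismatches(expected_hosts) or "all host names resolve as expected")
```

Run it on every node, since each machine has its own /etc/hosts.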


Reference: http://blog.csdn.net/shirdrn/article/details/7183503#