Data Model
ZooKeeper's data model looks very much like a file system; the one difference is that every node (ZNode), parent or child alike, can hold data.
Transaction ID
This is the ZXID mentioned earlier: ZooKeeper assigns a ZXID to every transaction request, which establishes a global order over all operations.
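These ZXIDs are visible from the client side: every znode's Stat carries the ZXID of the transaction that created it (czxid) and of the one that last modified it (mzxid). A minimal sketch with the standard Java client; the connect string is an assumption:
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZxidDemo {
    public static void main(String[] args) throws Exception {
        // Assumed connect string; replace with your own ensemble.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
        // "/zookeeper" always exists, so exists() returns a non-null Stat.
        Stat stat = zk.exists("/zookeeper", false);
        // czxid: ZXID of the create transaction; mzxid: ZXID of the last update.
        System.out.printf("czxid=0x%x mzxid=0x%x%n", stat.getCzxid(), stat.getMzxid());
        zk.close();
    }
}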
Node Types
- Persistent node: exists from creation until it is explicitly deleted
- Ephemeral node: removed automatically when the session that created it ends or times out
- Sequential node: the server appends a monotonically increasing numeric suffix to the given node name; the suffix is capped at the maximum value of an int (all three modes appear in the sketch below)
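A minimal sketch of the three types with the standard Java client; the paths, data, and connect string are illustrative assumptions:
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeTypesDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, e -> {});
        // Persistent: survives until explicitly deleted.
        zk.create("/app", "cfg".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        // Ephemeral: removed automatically when this session ends or times out.
        zk.create("/app/alive", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Sequential: the server appends the counter, e.g. "/app/task-0000000001";
        // the returned string is the actual path including the suffix.
        String actual = zk.create("/app/task-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("created " + actual);
        zk.close();
    }
}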
Node Status
A node's status information is defined by the Stat class; its basic fields are:
- czxid / mzxid / pzxid: ZXID of the transaction that created the node, last modified its data, and last changed its child list, respectively
- ctime / mtime: creation time and last-modification time
- version / cversion / aversion: version numbers of the node's data, of its child list, and of its ACL
- ephemeralOwner: session ID of the owner if the node is ephemeral, otherwise 0
- dataLength: length of the node's data in bytes
- numChildren: number of child nodes
Versions-----guaranteeing atomic operations on distributed data
The version, cversion, and aversion fields in the node status above are what ZooKeeper uses to implement optimistic locking and thereby guarantee atomic updates.
When the server-side PrepRequestProcessor handles a data-update request (setDataRequest), it does the following:
zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
SetDataRequest setDataRequest = (SetDataRequest) record;
if (deserialize)
    ByteBufferInputStream.byteBuffer2Record(request.request, setDataRequest);
path = setDataRequest.getPath();
validatePath(path, request.sessionId);
nodeRecord = getRecordForPath(path);
checkACL(zks, nodeRecord.acl, ZooDefs.Perms.WRITE, request.authInfo);
// Optimistic lock: compare the version sent by the client with the current one.
version = setDataRequest.getVersion();
int currentVersion = nodeRecord.stat.getVersion();
if (version != -1 && version != currentVersion) {
    // The node changed since the client read it: reject the update.
    throw new KeeperException.BadVersionException(path);
}
version = currentVersion + 1;
request.txn = new SetDataTxn(path, setDataRequest.getData(), version);
nodeRecord = nodeRecord.duplicate(request.hdr.getZxid());
nodeRecord.stat.setVersion(version);
addChangeRecord(nodeRecord);
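This same optimistic lock is what a client leans on when it passes an explicit version to setData: if another writer got in between the read and the write, the server throws BadVersionException and the client retries. A minimal compare-and-swap sketch; the path and the counter encoding are assumptions:
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CasIncrement {
    // Atomically increment a numeric value stored in a znode.
    static void increment(ZooKeeper zk, String path) throws Exception {
        while (true) {
            Stat stat = new Stat();
            byte[] data = zk.getData(path, false, stat);
            long next = Long.parseLong(new String(data)) + 1;
            try {
                // Succeeds only if nobody updated the node since our read;
                // passing -1 instead would skip the version check entirely.
                zk.setData(path, Long.toString(next).getBytes(), stat.getVersion());
                return;
            } catch (KeeperException.BadVersionException e) {
                // Lost the race: re-read and retry.
            }
        }
    }
}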
ACL-----guaranteeing data security
Permission schemes (Scheme):
- IP: "ip:192.168.0.12" restricts access to that single address; "ip:192.168.0.1/24" covers the whole 192.168.0.* segment
- Digest: identified by "username:password", which ZooKeeper encodes in two steps-----an SHA-1 hash followed by BASE64
- World: open to all users
- Super: a super administrator who can operate on any data; configured at startup with -Dzookeeper.DigestAuthenticationProvider.superDigest=super:password, where the password must already be encoded (the same SHA-1 + BASE64 encoding as the digest scheme)
Authorization objects (ID): the ID identifies whom the permissions are granted to, and its form depends on the scheme: an address or segment for ip, a "username:BASE64(SHA-1(username:password))" digest for digest, and the literal anyone for world.
Permissions (Permission):
- CREATE: create child nodes
- DELETE: delete child nodes
- READ: read the node's data and list its children
- WRITE: update the node's data
- ADMIN: set the node's ACL (a digest-scheme sketch follows this list)
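A minimal sketch of the digest scheme; the username, password, and path are assumptions. DigestAuthenticationProvider.generateDigest produces the encoded "username:BASE64(SHA-1(username:password))" form stored in the ACL, while addAuthInfo attaches the plaintext credentials to the session so it can pass the check:
import java.util.Collections;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

public class DigestAclDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, e -> {});
        // Authenticate this session with the plaintext credentials.
        zk.addAuthInfo("digest", "alice:secret".getBytes());
        // Grant alice READ and WRITE only (no CREATE/DELETE/ADMIN).
        Id alice = new Id("digest", DigestAuthenticationProvider.generateDigest("alice:secret"));
        ACL acl = new ACL(ZooDefs.Perms.READ | ZooDefs.Perms.WRITE, alice);
        zk.create("/secure", "data".getBytes(), Collections.singletonList(acl), CreateMode.PERSISTENT);
        zk.close();
    }
}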
The watcher mechanism
At a high level: the client registers a watcher, the server processes it, and the client runs the callback.
1. The client registers a watcher
Take getData as an example:
1. Mark the request and wrap the watcher in a WatchRegistration
public byte[] getData(String path, Watcher watcher, Stat stat) throws KeeperException, InterruptedException {
    ....
    ZooKeeper.WatchRegistration wcb = null;
    if (watcher != null) {
        wcb = new ZooKeeper.DataWatchRegistration(watcher, path);
    }
    ....
    request.setWatch(watcher != null);
    GetDataResponse response = new GetDataResponse();
    ReplyHeader r = this.cnxn.submitRequest(h, request, response, wcb);
    ....
}
2. Wrap the request in a Packet (the smallest unit of communication), put it on the send queue, and wait for the server's response
public ReplyHeader submitRequest(RequestHeader h, Record request, Record response, WatchRegistration watchRegistration, WatchDeregistration watchDeregistration) throws InterruptedException {
    ReplyHeader r = new ReplyHeader();
    ClientCnxn.Packet packet = this.queuePacket(h, r, request, response, (AsyncCallback) null, (String) null, (String) null, (Object) null, watchRegistration, watchDeregistration);
    synchronized (packet) {
        // Block until the SendThread marks the packet finished.
        while (!packet.finished) {
            packet.wait();
        }
        return r;
    }
}
3. The client's SendThread receives the response in readResponse(); the finishPacket method then registers the watcher with ZKWatchManager
private void finishPacket(ClientCnxn.Packet p) {
    int err = p.replyHeader.getErr();
    if (p.watchRegistration != null) {
        // Register the watcher locally only after the server has replied.
        p.watchRegistration.register(err);
    }
    ......
}
2. The server processes the watcher
Server-side processing has two parts: storing the ServerCnxn (the server's handle on the client connection) and triggering the watcher.
2.1 Storing the ServerCnxn
1. FinalRequestProcessor.processRequest decides whether a watcher needs to be registered
case OpCode.getData: {
    lastOp = "GETD";
    GetDataRequest getDataRequest = new GetDataRequest();
    ByteBufferInputStream.byteBuffer2Record(request.request, getDataRequest);
    DataNode n = zks.getZKDatabase().getNode(getDataRequest.getPath());
    if (n == null) {
        throw new KeeperException.NoNodeException();
    }
    PrepRequestProcessor.checkACL(zks, zks.getZKDatabase().aclForNode(n),
            ZooDefs.Perms.READ, request.authInfo);
    Stat stat = new Stat();
    // If the request asked for a watch, pass the ServerCnxn in as the Watcher.
    byte b[] = zks.getZKDatabase().getData(getDataRequest.getPath(), stat,
            getDataRequest.getWatch() ? cnxn : null);
    rsp = new GetDataResponse(b, stat);
    break;
}
2. When getDataRequest.getWatch() is true, the ServerCnxn is stored in the WatchManager
WatchManager manages all watchers on the ZooKeeper server and maintains them along two dimensions:
- watchTable indexes watchers by data node (path → set of watchers)
- watch2Paths indexes paths by watcher (watcher → set of paths); both fields are sketched below
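In the 3.4 code base these two dimensions are simply two in-memory hash maps inside WatchManager, essentially as declared in the source:
// Path -> watchers registered on that node.
private final HashMap<String, HashSet<Watcher>> watchTable = new HashMap<String, HashSet<Watcher>>();
// Watcher -> paths it watches; the Watcher here is the ServerCnxn itself.
private final HashMap<Watcher, HashSet<String>> watch2Paths = new HashMap<Watcher, HashSet<String>>();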
2.2 Triggering the watcher
When the node's data changes, the WatchManager's triggerWatch method is invoked to send notifications to the clients:
public Set<Watcher> triggerWatch(String path, EventType type, Set<Watcher> supress) {
    // 1. Build the WatchedEvent.
    WatchedEvent e = new WatchedEvent(type, KeeperState.SyncConnected, path);
    HashSet<Watcher> watchers;
    // 2. Look up the watchers; remove() is what makes them one-shot.
    synchronized (this) {
        watchers = watchTable.remove(path);
        if (watchers == null || watchers.isEmpty()) {
            if (LOG.isTraceEnabled()) {
                ZooTrace.logTraceMessage(LOG,
                        ZooTrace.EVENT_DELIVERY_TRACE_MASK,
                        "No watchers for " + path);
            }
            return null;
        }
        for (Watcher w : watchers) {
            HashSet<String> paths = watch2Paths.get(w);
            if (paths != null) {
                paths.remove(path);
            }
        }
    }
    for (Watcher w : watchers) {
        if (supress != null && supress.contains(w)) {
            continue;
        }
        // 3. w is the ServerCnxn; process() sends the notification to the client.
        w.process(e);
    }
    return watchers;
}
3. The client calls back the watcher
1. The SendThread receives the notification
else if (replyHdr.getXid() == -1) { // xid == -1 marks a notification
    if (ClientCnxn.LOG.isDebugEnabled()) {
        ClientCnxn.LOG.debug("Got notification sessionid:0x" + Long.toHexString(ClientCnxn.this.sessionId));
    }
    // 1. Deserialize the event.
    WatcherEvent event = new WatcherEvent();
    event.deserialize(bbia, "response");
    // 2. Convert the server path back to a chroot-relative path.
    if (ClientCnxn.this.chrootPath != null) {
        String serverPath = event.getPath();
        if (serverPath.compareTo(ClientCnxn.this.chrootPath) == 0) {
            event.setPath("/");
        } else if (serverPath.length() > ClientCnxn.this.chrootPath.length()) {
            event.setPath(serverPath.substring(ClientCnxn.this.chrootPath.length()));
        } else {
            ClientCnxn.LOG.warn("Got server path " + event.getPath() + " which is too short for chroot path " + ClientCnxn.this.chrootPath);
        }
    }
    // 3. Restore the WatchedEvent.
    WatchedEvent we = new WatchedEvent(event);
    if (ClientCnxn.LOG.isDebugEnabled()) {
        ClientCnxn.LOG.debug("Got " + we + " for sessionid 0x" + Long.toHexString(ClientCnxn.this.sessionId));
    }
    // 4. Hand off to the EventThread, which runs the callback.
    ClientCnxn.this.eventThread.queueEvent(we);
}
2. EventThread.queueEvent pulls the matching watchers out of ZKWatchManager and enqueues them
private void queueEvent(WatchedEvent event, Set<Watcher> materializedWatchers) {
    if (event.getType() != EventType.None || this.sessionState != event.getState()) {
        this.sessionState = event.getState();
        Object watchers;
        if (materializedWatchers == null) {
            // Fetch (and remove) the matching watchers from ZKWatchManager.
            watchers = ClientCnxn.this.watcher.materialize(event.getState(), event.getType(), event.getPath());
        } else {
            watchers = new HashSet();
            ((Set) watchers).addAll(materializedWatchers);
        }
        ClientCnxn.WatcherSetEventPair pair = new ClientCnxn.WatcherSetEventPair((Set) watchers, event);
        // Enqueue; the run() loop processes it.
        this.waitingEvents.add(pair);
    }
}
3. EventThread.run serially invokes the process method of each watcher attached to the queued events
public void run() {
    try {
        this.isRunning = true;
        while (true) {
            Object event = this.waitingEvents.take();
            if (event == ClientCnxn.this.eventOfDeath) {
                this.wasKilled = true;
            } else {
                this.processEvent(event);
            }
            if (this.wasKilled) {
                synchronized (this.waitingEvents) {
                    // Drain any remaining events before shutting down.
                    if (this.waitingEvents.isEmpty()) {
                        this.isRunning = false;
                        break;
                    }
                }
            }
        }
    } catch (InterruptedException var5) {
        ClientCnxn.LOG.error("Event thread exiting due to interruption", var5);
    }
    ClientCnxn.LOG.info("EventThread shut down for session: 0x{}", Long.toHexString(ClientCnxn.this.getSessionId()));
}
private void processEvent(Object event) {
    try {
        if (event instanceof ClientCnxn.WatcherSetEventPair) {
            ClientCnxn.WatcherSetEventPair pair = (ClientCnxn.WatcherSetEventPair) event;
            // Call each watcher in turn; a failure in one watcher is
            // logged and does not stop the others.
            for (Watcher watcher : pair.watchers) {
                try {
                    watcher.process(pair.event);
                } catch (Throwable var11) {
                    ClientCnxn.LOG.error("Error while calling watcher ", var11);
                }
            }
        }
        ......
}
4. Watcher characteristics
- One-shot: once a watcher fires, both the client and the server remove it
- Serial execution on the client: all callbacks run one after another on the single EventThread
- Lightweight: the notification only says which event happened, not what the data changed to (hence the re-register-and-read pattern sketched below)
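Because watchers are one-shot, a client that wants continuous notification must re-register inside the callback, and because the notification carries no payload it re-reads the data at the same time. A minimal sketch; the path and the println are assumptions about the application:
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConfigWatcher implements Watcher {
    private final ZooKeeper zk;
    private final String path;

    public ConfigWatcher(ZooKeeper zk, String path) throws Exception {
        this.zk = zk;
        this.path = path;
        zk.getData(path, this, null); // initial registration
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                // The old watcher is already consumed: re-register and
                // fetch the new data, which the notification did not carry.
                byte[] data = zk.getData(path, this, null);
                System.out.println("new config: " + new String(data));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}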
References
从 Paxos 到 Zookeeper——分布式一致性原理和实践 (From Paxos to ZooKeeper: Principles and Practice of Distributed Consistency)