HBase源码学习客户端scan过程

申明：以下代码均来自HBase-1.0.1.1

HTable tb = new HTable(conf,"test");
Scan scan = new Scan();
scan.addColumn(("colfam").getBytes(),("col").getBytes());

ResultScanner rs = tb.getScanner(scan);
Result r = null;

while((r=rs.next())!=null){
byte[] val = r.getValue(("colfam").getBytes(),("col").getBytes());
    System.out.println("Get value: "+(new String(val)));
}

上面是一段非常简单的扫描test表的代码。首先它构造了一个scan，这是扫描的描述。然后调用HTable.getScanner获取ResultScanner。最后调用ResultScanner.next获取数据结果。整个扫描过程简单的来说就是这样，下面我们具体来看下这个过程是如何工作的。

首先是getScanner，它实际上返回的是ClientScanner或其子类。根据scan设置的不同会返回4种不同的Scanner。

  public ResultScanner getScanner(final Scan scan) throws IOException {
if (scan.getBatch() > 0 && scan.isSmall()) {
throw new IllegalArgumentException("Small scan should not be used with batching");
    }
if (scan.getCaching() <= 0) {
      scan.setCaching(getScannerCaching());
    }

if (scan.isReversed()) {
if (scan.isSmall()) {
return new ClientSmallReversedScanner(getConfiguration(), scan, getName(),
this.connection, this.rpcCallerFactory, this.rpcControllerFactory,
            pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan());//1
      } else {
return new ReversedClientScanner(getConfiguration(), scan, getName(),
this.connection, this.rpcCallerFactory, this.rpcControllerFactory,
            pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan());//2
      }
    }

if (scan.isSmall()) {
return new ClientSmallScanner(getConfiguration(), scan, getName(),
this.connection, this.rpcCallerFactory, this.rpcControllerFactory,
          pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan());//3
    } else {
return new ClientScanner(getConfiguration(), scan, getName(), this.connection,
this.rpcCallerFactory, this.rpcControllerFactory,
          pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan());//4
    }
  }

ClientScanner实现了ResultScanner接口，我们来看看ClientScanner的next是如何工作的。

    public Result next() throws IOException {
// If the scanner is closed and there's nothing left in the cache, next is a no-op.
if (cache.size() == 0 && this.closed) {
return null;
      }
if (cache.size() == 0) {
        loadCache();
      }

if (cache.size() > 0) {
return cache.poll();
      }

// if we exhausted this scanner before calling close, write out the scan metrics
      writeScanMetrics();
return null;
    }

这段代码的过程是这样：首先判断cache的size是否为0，如果为0则加载一些数据进cache，然后如果cache的size大于0，说明有数据，那么poll第一个数据并返回。从这里我们可以看到，扫描并不是每次next都去server拿数据，而是预先加载一些进来，不够了再取。

然后我们看看如果cache里没有数据，loadCache是如何工作的。loadCache的代码很长，去掉了一些因为注释，把自己的理解注释在里面。

   protected void loadCache() throws IOException {
    Result[] values = null; 
long remainingResultSize = maxScannerResultSize; //剩余cache容量
int countdown = this.caching; //剩余cache能保存的Result数量

//设置cache大小，server将会一次返回这么多条数据回来，如果有的话
    callable.setCaching(this.caching); 

boolean skipFirst = false; //是否需要跳过一行
//发生OutOfOrderException时候，是否需要重试
boolean retryAfterOutOfOrderException = true; 

boolean serverHasMoreResults = false; //region是否还有更多的数据
    do {
try {
//跳过一行数据，为什么？
//因为当发生异常的时候，我们可能需要重试
//重试需要重新定位start row，而之前已经成功读取了一些记录
//我们只能通过lastResult记录最后一次成功读取的记录，然后从下一条开始
//所以这里其实就是skip lastRsult
if (skipFirst) {

          callable.setCaching(1); //跳过一条记录，所以设为1
          values = call(scan, callable, caller, scannerTimeout);

//当前向RS (region server)读取失败，切换到了新的replicaRS，我们需要重做一次skip
//switchedToADifferentReplica用来判断是否做了切换
if (values == null && callable.switchedToADifferentReplica()) {
if (this.lastResult != null) { 
              skipFirst = true;
            }
//设置新的region信息
this.currentRegion = callable.getHRegionInfo();
continue;
          }
          callable.setCaching(this.caching);
          skipFirst = false;
        }
// Server returns a null values if scanning is to stop. Else,
// returns an empty array if scanning is to go on and we've just
// exhausted current region.
//读取数据
//这里最终调用的是callable的call
//callable是ScannerCallableWithReplicas的实例
        values = call(scan,  callable, caller, scannerTimeout);
//我认为这个if不会发生，要么skip成功skipFirst置为false，要么失败执行continue
//这里不重要可以忽略
if (skipFirst && values != null && values.length == 1) {
          skipFirst = false; // Already skipped, unset it before scanning again
          values = call(scan, callable, caller, scannerTimeout);
        }

//这里和上面一样，在失败的时候做一些处理
//多说明一些东西：
//value何时为null？server停止scan，或者第一次调用call
//因为第一次执行openScanner并不返回数据
if (values == null && callable.switchedToADifferentReplica()) { 
if (this.lastResult != null) { 
            skipFirst = true;
          }
this.currentRegion = callable.getHRegionInfo();
continue;
        }
//重置为true
        retryAfterOutOfOrderException = true;
      } catch (DoNotRetryIOException e) {
//这里是一些异常处理

if (e instanceof UnknownScannerException) {
long timeout = lastNext + scannerTimeout;

if (timeout < System.currentTimeMillis()) {
long elapsed = System.currentTimeMillis() - lastNext;
            ScannerTimeoutException ex =
new ScannerTimeoutException(elapsed + "ms passed since the last invocation, "
                    + "timeout is currently set to " + scannerTimeout);
            ex.initCause(e);
throw ex;
          }
        } else {

          Throwable cause = e.getCause();
if ((cause != null && cause instanceof NotServingRegionException) ||
              (cause != null && cause instanceof RegionServerStoppedException) ||
              e instanceof OutOfOrderScannerNextException) {
// Pass
// It is easier writing the if loop test as list of what is allowed rather than
// as a list of what is not allowed... so if in here, it means we do not throw.
          } else {
throw e;
          }
        }
// Else, its signal from depths of ScannerCallable that we need to reset the scanner.
//这里就是上面提到的为什么要skip
//发生异常，我们需要重新开始，所以必须定位start row
if (this.lastResult != null) {

//设置start row为最后一次成功读取的结果，但是我们会跳过它
//真实的start row在下一行
this.scan.setStartRow(this.lastResult.getRow());

// Skip first row returned. We already let it out on previous
// invocation.
          skipFirst = true;
        }
//这个异常在第一出现的时候
//因为retryAfterOutOfOrderException=ture，所以会重试一次
//但是如果连续出现第二次
//因为retryAfterOutOfOrderException还没被重置为true
//所以会直接抛出异常，不再重试
if (e instanceof OutOfOrderScannerNextException) {
if (retryAfterOutOfOrderException) {
            retryAfterOutOfOrderException = false;
          } else {
// TODO: Why wrap this in a DNRIOE when it already is a DNRIOE?
throw new DoNotRetryIOException("Failed after retry of " +
"OutOfOrderScannerNextException: was there a rpc timeout?", e);
          }
        }
//发生异常我们重置region信息和callable
// Clear region.
this.currentRegion = null;
// Set this to zero so we don't try and do an rpc and close on remote server when
// the exception we got was UnknownScanner or the Server is going down.
        callable = null;
// This continue will take us to while at end of loop where we will set up new scanner.
continue;
      }
long currentTime = System.currentTimeMillis();
if (this.scanMetrics != null) {
this.scanMetrics.sumOfMillisSecBetweenNexts.addAndGet(currentTime - lastNext);
      }
      lastNext = currentTime;
//把结果加入cache
if (values != null && values.length > 0) {
for (Result rs : values) {
          cache.add(rs);
// We don't make Iterator here
for (Cell cell : rs.rawCells()) {
            remainingResultSize -= CellUtil.estimatedHeapSizeOf(cell);
          }
          countdown--;
this.lastResult = rs;
        }
      }

//设置serverHasMoreResults，true表示这个region还有数据，否则
if (null != values && values.length > 0 && callable.hasMoreResultsContext()) {
// Only adhere to more server results when we don't have any partialResults
// as it keeps the outer loop logic the same.
        serverHasMoreResults = callable.getServerHasMoreResults();
      }

//这个while的逻辑是这样：
//假如serverHasMoreResults=true，还有数据
//!serverHasMoreResults=false，整个逻辑false，循环终止
//因server一次返回剩余容量的数据，cache已经load满了，当然要终止
//反之，!serverHasMoreResults=true，那么将会执行possiblyNextScanner
//possiblyNextScanner用来寻找下一个region，并构造新的callable
//如果成功找到，那么返回true，这样循环继续进行
    } while (remainingResultSize > 0 && countdown > 0 && !serverHasMoreResults
        && possiblyNextScanner(countdown, values == null));
  }

整个过程的主线是：调用call获取Result，如果发生异常进行重试。重试重新确定start row通过跳过lastResult完成。

注意：
1. call如果失败会自动切换replicaRS，也就是换到备份的region server上去读取，这个只有在scan的Consistency设为Consistency.TIMELINE情况下才会发生。
2. OutOfOrderScannerNextException异常的原因是客户端和服务端维护的一个序列值不一致。每次读取收到结果，客户端的nextCallSeq++。假如某次读取请求服务端正确收到，返回结果给客户端的时候，客户端等待超时了，那么服务端的seq就会比客户端多1。
3.第一次调用call实则是在服务端上打开scan服务，并获取scannerId为之后做准备。如果你继续跟踪possiblyNextScanner，你会发现定位了新的region并构造了对应的ScannerCallableWithReplicas后主动调了一次call，就是这个原因。

更详细的过程可以继续跟踪call，其实是调用ScannerCallableWithReplicas.call,而后者维护了线程池执行任务，最终调用的是ScannerCallable.call。如果任务失败，那么ScannerCallableWithReplicas会为每一个replicaRS构造各自的ScannerCallable，然后各自起任务运行。只有第一个成功返回的replicaRS会被选中，之后的读取会被切换到这个replicaRS上。

秒客网

HBase源码学习客户端scan过程

相关文章

HBase源码学习 客户端scan过程

相关文章

HBase源码学习客户端scan过程