HBase 1.1.2 split 策略

时间:2022-04-22 08:30:49

背景:

今天让同事用ycsb做HBase的性能测试,他跟我反馈reigon总是在配置的大小前split(配置的是10G),于是我就给他说起了hbase的spilt策略:从0.94增加了新的策略,还是在会每次flush的时候会去判断需不需要split,但是判断的策略有了改变,会比较现有文件的大小与改表region个数的平方*memstore大小的关系,如果前者较大也会去做split,巴拉巴拉。但他跟我说region个数的平方*memstore这个不对吧,日志里是memstore二倍的大小,还给我截图为证,无奈,就翻看了1.1.2的代码,发下算法真的变了,对应变成了region个数的三次方*2*memstore。

日志如下:

regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because 0 size=296442912, sizeToCheck=268435456, regionsWithCommonTable=1
从日志中可以看到,每次flush时候都会调用 IncreasingToUpperBoundRegionSplitPolicy类中的shouldSplit方法,方法内容如下:

 @Override
  protected boolean shouldSplit() {
    if (region.shouldForceSplit()) return true;
    boolean foundABigStore = false;
    // Get count of regions that have the same common table as this.region
    int tableRegionsCount = getCountOfCommonTableRegions();
    // Get size to check
    long sizeToCheck = getSizeToCheck(tableRegionsCount);

    for (Store store : region.getStores()) {
      // If any of the stores is unable to split (eg they contain reference files)
      // then don't split
      if ((!store.canSplit())) {
        return false;
      }

      // Mark if any store is big enough
      long size = store.getSize();
      if (size > sizeToCheck) {
        LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
          " size=" + size + ", sizeToCheck=" + sizeToCheck +
          ", regionsWithCommonTable=" + tableRegionsCount);
        foundABigStore = true;
      }
    }

    return foundABigStore;
  }

代码说明:

首先,通过getCountOfCommonTableRegions()方法获取目前的region个数tableRegionCount,然后通过getSizeTOCheck(tableRegionCount)方法运算得出一个阈值sizeToCheck,接着在for循环中遍历该reigon下所有的sotre,如果有store不能做split(调用HStore类的canSplit方法,该方法判断store下的hfile是否有被reference的,即region刚拆分,但hfile还处于reference状态,未完成拆分),直接返回false。如果该store可以做split,则比较store下hfile的大小与sizeToCheck的值,如果大于则标识foundABigStore置为true。

接着看下getSizeTOCheck(tableRegionCount)方法:

  protected long getSizeToCheck(final int tableRegionsCount) {
    // safety check for 100 to avoid numerical overflow in extreme cases
    return tableRegionsCount == 0 || tableRegionsCount > 100 ? getDesiredMaxFileSize():
      Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }

代码说明:

如果tableRegionCount的值是0或者大于100,则通过getDesiredMaxFileSize()方法读取配置文件中的hbase.hregion.max.filesize值(即前文说的10G),否则进行Math.min判断,后面tableRegionCount三次方很容易理解,看看initialSize怎么来的,相关方法内容如下:

@Override
  protected void configureForRegion(HRegion region) {
    super.configureForRegion(region);
    Configuration conf = getConf();
    this.initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1);
    if (this.initialSize > 0) {
      return;
    }
    HTableDescriptor desc = region.getTableDesc();
    if (desc != null) {
      this.initialSize = 2*desc.getMemStoreFlushSize();
    }
    if (this.initialSize <= 0) {
      this.initialSize = 2*conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
        HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
    }
  }

代码说明:

代码逻辑也是非常简单,这里不再赘述。

补充,昨天以为getCountOfCommonTableRegions()的逻辑是获取这张表所有的region,今天测试发现不是这样,回头再看代码,代码内容如下:

  private int getCountOfCommonTableRegions() {
    RegionServerServices rss = this.region.getRegionServerServices();
    // Can be null in tests
    if (rss == null) return 0;
    TableName tablename = this.region.getTableDesc().getTableName();
    int tableRegionsCount = 0;
    try {
      List<Region> hri = rss.getOnlineRegions(tablename);
      tableRegionsCount = hri == null || hri.isEmpty()? 0: hri.size();
    } catch (IOException e) {
      LOG.debug("Failed getOnlineRegions " + tablename, e);
    }
    return tableRegionsCount;
  }
}


代码说明:

首先获取该region所在的regionserver,然后获取该regionserver上的所有region,而不是该表在整个集群中的region数量。