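Background:
Today I asked a colleague to run an HBase performance test with YCSB. He reported that regions kept splitting before they reached the configured size (10 GB), so I explained HBase's split policy to him: a new policy was added in 0.94. It still checks on every flush whether a split is needed, but the criterion changed. It compares the current file size against (number of regions of the table)^2 * memstore size and splits if the file size is larger, and so on. He replied that "region count squared * memstore" could not be right, because the log showed twice the memstore size, and he sent me a screenshot as proof. So I went through the 1.1.2 source and found that the algorithm really had changed: the threshold is now (region count)^3 * 2 * memstore size.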
The log looks like this:
regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because 0 size=296442912, sizeToCheck=268435456, regionsWithCommonTable=1
The log shows that every flush calls the shouldSplit method of the IncreasingToUpperBoundRegionSplitPolicy class. The method looks like this:
@Override
protected boolean shouldSplit() {
  if (region.shouldForceSplit()) return true;
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  int tableRegionsCount = getCountOfCommonTableRegions();
  // Get size to check
  long sizeToCheck = getSizeToCheck(tableRegionsCount);

  for (Store store : region.getStores()) {
    // If any of the stores is unable to split (eg they contain reference files)
    // then don't split
    if ((!store.canSplit())) {
      return false;
    }

    // Mark if any store is big enough
    long size = store.getSize();
    if (size > sizeToCheck) {
      LOG.debug("ShouldSplit because " + store.getColumnFamilyName() +
        " size=" + size + ", sizeToCheck=" + sizeToCheck +
        ", regionsWithCommonTable=" + tableRegionsCount);
      foundABigStore = true;
    }
  }

  return foundABigStore;
}
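Code walkthrough:
First, getCountOfCommonTableRegions() returns the current region count, tableRegionsCount, and getSizeToCheck(tableRegionsCount) computes a threshold, sizeToCheck, from it. The for loop then walks every store of the region. If any store cannot be split, the method returns false immediately. This check goes through the canSplit method of HStore, which tests whether the store's HFiles are still reference files, meaning the region was split recently and the references have not been rewritten yet. If the store can be split, the size of its HFiles is compared with sizeToCheck, and the flag foundABigStore is set to true when the size is larger.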
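To make the loop easier to follow, here is a minimal standalone sketch of the same decision. StoreInfo is a hypothetical stand-in for HBase's Store (it is not an HBase class), the force-split check is left out, and the sizes are the ones from the log line above.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the shouldSplit decision loop.
// StoreInfo is a hypothetical stand-in for HBase's Store, not a real HBase class.
class ShouldSplitSketch {
    static final class StoreInfo {
        final String family;
        final long sizeBytes;
        final boolean canSplit;

        StoreInfo(String family, long sizeBytes, boolean canSplit) {
            this.family = family;
            this.sizeBytes = sizeBytes;
            this.canSplit = canSplit;
        }
    }

    static boolean shouldSplit(List<StoreInfo> stores, long sizeToCheck) {
        boolean foundABigStore = false;
        for (StoreInfo store : stores) {
            // One unsplittable store (e.g. it still holds reference files) vetoes the split.
            if (!store.canSplit) {
                return false;
            }
            // Remember that at least one store exceeded the threshold.
            if (store.sizeBytes > sizeToCheck) {
                foundABigStore = true;
            }
        }
        return foundABigStore;
    }

    public static void main(String[] args) {
        long sizeToCheck = 268435456L; // sizeToCheck from the log line

        // Single store larger than the threshold: split.
        List<StoreInfo> big = Arrays.asList(new StoreInfo("0", 296442912L, true));
        System.out.println(shouldSplit(big, sizeToCheck)); // true

        // Same big store, but another store cannot split: no split.
        List<StoreInfo> blocked = Arrays.asList(
            new StoreInfo("0", 296442912L, true),
            new StoreInfo("1", 1024L, false));
        System.out.println(shouldSplit(blocked, sizeToCheck)); // false
    }
}
```

Note that a single store over the threshold is enough to trigger the split, while a single store that cannot split is enough to block it.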
Next, look at the getSizeToCheck(tableRegionsCount) method:
protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  return tableRegionsCount == 0 || tableRegionsCount > 100
    ? getDesiredMaxFileSize()
    : Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}
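Code walkthrough:
If tableRegionsCount is 0 or greater than 100, the method returns getDesiredMaxFileSize(), which reads hbase.hregion.max.filesize from the configuration (the 10 GB mentioned earlier). Otherwise it returns the smaller of that value and initialSize * tableRegionsCount^3. The cube of tableRegionsCount is easy to understand, so let's see where initialSize comes from. The relevant method is: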
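To see how quickly the threshold grows, here is a small sketch that reimplements the formula outside HBase. The 10 GB maximum is the configured value from this post. The 256 MB initialSize is taken from the sizeToCheck in the log line, which would correspond to 2 * a 128 MB memstore flush size (an assumption about my colleague's setup).

```java
// Standalone sketch of the getSizeToCheck formula, with plain parameters
// instead of HBase configuration lookups.
class SizeToCheckSketch {
    static long sizeToCheck(int tableRegionsCount, long initialSize, long maxFileSize) {
        // Same guard as the original: 0 or more than 100 regions uses the configured maximum.
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return maxFileSize;
        }
        long n = tableRegionsCount;
        return Math.min(maxFileSize, initialSize * n * n * n);
    }

    public static void main(String[] args) {
        long initialSize = 2L * 128 * 1024 * 1024;    // assumed: 2 * 128 MB flush size
        long maxFileSize = 10L * 1024 * 1024 * 1024;  // 10 GB, as configured in this post
        for (int n = 1; n <= 5; n++) {
            long mb = sizeToCheck(n, initialSize, maxFileSize) / (1024 * 1024);
            System.out.println(n + " region(s): " + mb + " MB");
        }
        // 1 region(s): 256 MB
        // 2 region(s): 2048 MB
        // 3 region(s): 6912 MB
        // 4 region(s): 10240 MB
        // 5 region(s): 10240 MB
    }
}
```

With these numbers the configured 10 GB only takes effect from the fourth region onward, which matches what my colleague saw: the first splits happen far below the configured size.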
@Override
protected void configureForRegion(HRegion region) {
  super.configureForRegion(region);
  Configuration conf = getConf();
  this.initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1);
  if (this.initialSize > 0) {
    return;
  }
  HTableDescriptor desc = region.getTableDesc();
  if (desc != null) {
    this.initialSize = 2*desc.getMemStoreFlushSize();
  }
  if (this.initialSize <= 0) {
    this.initialSize = 2*conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE,
      HTableDescriptor.DEFAULT_MEMSTORE_FLUSH_SIZE);
  }
}
Code walkthrough:
The logic here is straightforward, so I won't go through it in detail.
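For reference, the order in which initialSize is resolved can be sketched as a plain function. The parameter names are mine, and I am assuming that a non-positive value stands for "not set" at each level, as the -1 default in the code suggests.

```java
// Standalone sketch of the initialSize fallback chain in configureForRegion.
// Parameter names are hypothetical; a non-positive value models "not set".
class InitialSizeSketch {
    static long resolveInitialSize(long configuredInitialSize,
                                   long tableFlushSize,
                                   long defaultFlushSize) {
        // 1. An explicit hbase.increasing.policy.initial.size wins and is used as-is.
        if (configuredInitialSize > 0) {
            return configuredInitialSize;
        }
        // 2. Otherwise use twice the table-level memstore flush size.
        long initialSize = 2 * tableFlushSize;
        // 3. If that is not positive, fall back to twice the configured/default flush size.
        if (initialSize <= 0) {
            initialSize = 2 * defaultFlushSize;
        }
        return initialSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(resolveInitialSize(-1, -1, 128 * mb));            // 268435456
        System.out.println(resolveInitialSize(-1, 64 * mb, 128 * mb));       // 134217728
        System.out.println(resolveInitialSize(512 * mb, 64 * mb, 128 * mb)); // 536870912
    }
}
```

The first case reproduces the sizeToCheck=268435456 from the log, assuming nothing was overridden and the flush size is 128 MB.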
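Addendum: yesterday I assumed that getCountOfCommonTableRegions() returns the number of all regions of the table. Today's testing showed that this is not the case, so I went back to the code, which looks like this: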
private int getCountOfCommonTableRegions() {
  RegionServerServices rss = this.region.getRegionServerServices();
  // Can be null in tests
  if (rss == null) return 0;
  TableName tablename = this.region.getTableDesc().getTableName();
  int tableRegionsCount = 0;
  try {
    List<Region> hri = rss.getOnlineRegions(tablename);
    tableRegionsCount = hri == null || hri.isEmpty()? 0: hri.size();
  } catch (IOException e) {
    LOG.debug("Failed getOnlineRegions " + tablename, e);
  }
  return tableRegionsCount;
}
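The method first obtains the RegionServerServices of the regionserver that hosts this region, then asks it for the online regions of this table on that regionserver. The count is therefore per regionserver, not the number of regions the table has across the whole cluster.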
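A consequence is that regions of the same table can see different thresholds depending on which regionserver they live on. The sketch below illustrates this with a hypothetical table of 6 regions spread over three servers. The server names and region distribution are made up, and the 256 MB initialSize and 10 GB maximum are the values used earlier in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch: the split threshold depends on the per-server region count
// of the table, not on its cluster-wide region count.
class PerServerThresholdSketch {
    static long sizeToCheck(int tableRegionsCount, long initialSize, long maxFileSize) {
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return maxFileSize;
        }
        long n = tableRegionsCount;
        return Math.min(maxFileSize, initialSize * n * n * n);
    }

    public static void main(String[] args) {
        long initialSize = 256L * 1024 * 1024;        // assumed: 2 * 128 MB flush size
        long maxFileSize = 10L * 1024 * 1024 * 1024;  // 10 GB

        // Hypothetical distribution of one table's 6 regions over three regionservers.
        Map<String, Integer> onlineRegionsPerServer = new LinkedHashMap<>();
        onlineRegionsPerServer.put("rs1", 1);
        onlineRegionsPerServer.put("rs2", 2);
        onlineRegionsPerServer.put("rs3", 3);

        for (Map.Entry<String, Integer> e : onlineRegionsPerServer.entrySet()) {
            long mb = sizeToCheck(e.getValue(), initialSize, maxFileSize) / (1024 * 1024);
            System.out.println(e.getKey() + ": " + e.getValue() + " region(s), threshold " + mb + " MB");
        }
        // rs1: 1 region(s), threshold 256 MB
        // rs2: 2 region(s), threshold 2048 MB
        // rs3: 3 region(s), threshold 6912 MB

        // If the cluster-wide count (6) were used instead, the 10 GB cap would already apply.
        System.out.println("cluster-wide: " + sizeToCheck(6, initialSize, maxFileSize) / (1024 * 1024) + " MB");
        // cluster-wide: 10240 MB
    }
}
```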