【Discovering the RIT problem】
A region of one HBase table was stuck in Regions In Transition (RIT).
The region in RIT is 6f6f4f8204ccaa0b71df08591de155ba. The RIT details shown on the Master web UI were:
Region: 6f6f4f8204ccaa0b71df08591de155ba
State: ks_namespace:active_users,9c1be2,1512918712031.6f6f4f8204ccaa0b71df08591de155ba. state=OFFLINE, ts=Thu Jan 25 09:15:31 CST 2018 (1657s ago), server=kmr-5b9c18fc-gn-7b3518df-core-1-006.ksc.com,16020,1513713156134
RIT time (ms): 1657561
# The region had been in transition for 1657561 ms, i.e. more than 27 minutes!
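The State field packs the region name, state, timestamp and hosting server into one string. Below is a minimal Python sketch that splits the server part apart and turns the RIT time into minutes. It assumes the server follows HBase's host,port,startcode ServerName layout and that the start code is in milliseconds. The function names are hypothetical, for illustration only.

```python
from datetime import datetime, timezone


def parse_rit_server(server: str):
    """Split a ServerName 'host,port,startcode' into (host, port, start_time).

    Assumption: the start code is a millisecond Unix timestamp, as in the
    Master UI output above; the start time is returned as a UTC datetime.
    """
    host, port, startcode = server.rsplit(",", 2)
    start = datetime.fromtimestamp(int(startcode) / 1000, tz=timezone.utc)
    return host, int(port), start


def rit_minutes(rit_ms: int) -> float:
    """Convert the 'RIT time (ms)' column into minutes."""
    return rit_ms / 60_000


if __name__ == "__main__":
    host, port, start = parse_rit_server(
        "kmr-5b9c18fc-gn-7b3518df-core-1-006.ksc.com,16020,1513713156134"
    )
    print(host, port, start.isoformat())
    print(f"RIT for {rit_minutes(1657561):.1f} minutes")
```

With the values above, rit_minutes(1657561) comes to about 27.6, which agrees with the "(1657s ago)" in the state string.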
【Troubleshooting】
(1) Check on HDFS whether this region has data. It does (if it had none, further handling would be needed).
[hdfs@hbase-master1 yarn]$ hdfs dfs -ls -R /apps/hbase/data/data/ks_namespace/ | grep --color 6f6f4f8204ccaa0b71df08591de155ba
drwxr-xr-x - hbase hdfs 0 2017-12-20 03:23 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba
-rw-r--r-- 3 hbase hdfs 105 2017-12-10 23:11 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba/.regioninfo
drwxr-xr-x - hbase hdfs 0 2017-12-19 05:26 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba/cf
-rw-r--r-- 3 hbase hdfs 386450467 2017-12-19 05:26 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba/cf/cfaa112cc5f449e0890411a35966825d
drwxr-xr-x - hbase hdfs 0 2017-12-22 21:11 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba/recovered.edits
-rw-r--r-- 3 hbase hdfs 0 2017-12-22 21:11 /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba/recovered.edits/206932.seqid
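Step (1) amounts to checking whether the region directory still holds store files outside .regioninfo and recovered.edits. Below is a minimal Python sketch of that check over saved hdfs dfs -ls -R output. It assumes the default eight-column listing and paths without spaces. store_files is a hypothetical helper.

```python
def store_files(ls_output: str, region: str):
    """Return (path, size) for each store file of `region` in `hdfs dfs -ls -R` output.

    Assumes the default listing layout (permissions, replication, owner,
    group, size, date, time, path) and paths without spaces. Directories,
    .regioninfo and anything under recovered.edits are skipped, so what is
    left are the HFiles inside column-family directories such as cf/.
    """
    found = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip malformed lines and directory entries (permissions start with 'd').
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        size, path = int(fields[4]), fields[7]
        parts = path.split("/")
        if region not in parts:
            continue
        # Path components below the region directory, e.g. ['cf', '<hfile>'].
        rel = parts[parts.index(region) + 1:]
        if not rel or rel[0] in (".regioninfo", "recovered.edits"):
            continue
        found.append((path, size))
    return found
```

A non-empty result means the region still has data on HDFS, as was the case here.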
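(2) Run hadoop fsck to check for corrupt blocks. The result is healthy (if it were not, further handling would be needed). As the warning in the output notes, this form is deprecated; hdfs fsck is the current equivalent.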
[hdfs@hbase-master1 yarn]$ hadoop fsck /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://hbase-master1.ksc.com:50070/fsck?ugi=hdfs&path=%2Fapps%2Fhbase%2Fdata%2Fdata%2Fks_namespace%2Factive_users%2F6f6f4f8204ccaa0b71df08591de155ba
FSCK started by hdfs (auth:SIMPLE) from /172.31.0.10 for path /apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba at Thu Jan 25 09:33:56 CST 2018
...Status: HEALTHY
Total size: 386450572 B
Total dirs: 3
Total files: 3
Total symlinks: 0
Total blocks (validated): 4 (avg. block size 96612643 B)
Minimally replicated blocks: 4 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 8
Number of racks: 1
FSCK ended at Thu Jan 25 09:33:56 CST 2018 in 2 milliseconds
The filesystem under path '/apps/hbase/data/data/ks_namespace/active_users/6f6f4f8204ccaa0b71df08591de155ba' is HEALTHY
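The fsck verdict can also be read off the summary automatically. Below is a minimal Python sketch that extracts only a few fields. It assumes the summary labels appear as in the output above (for example "Corrupt blocks: 0"). fsck_summary is a hypothetical helper.

```python
import re


def fsck_summary(output: str) -> dict:
    """Extract a few health indicators from `hdfs fsck <path>` output.

    The regexes assume 'Label: <number>' summary lines as shown above;
    a field that is not found is reported as None.
    """
    def count(label: str):
        m = re.search(rf"^\s*{re.escape(label)}:\s+(\d+)", output, re.M)
        return int(m.group(1)) if m else None

    return {
        "healthy": re.search(r"Status: HEALTHY", output) is not None,
        "corrupt_blocks": count("Corrupt blocks"),
        "missing_replicas": count("Missing replicas"),
        "under_replicated": count("Under-replicated blocks"),
    }
```

Treating the region files as intact only when healthy is True and corrupt_blocks is 0 mirrors the manual reading of the output above.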
【Fix】Try deleting this region's node in ZooKeeper
# In zkCli, delete the region's entry under the RIT node region-in-transition
[zk: localhost:2181(CONNECTED) 3] ls /hbase-unsecure/region-in-transition
[6f6f4f8204ccaa0b71df08591de155ba]
[zk: localhost:2181(CONNECTED) 4] rmr /hbase-unsecure/region-in-transition/6f6f4f8204ccaa0b71df08591de155ba
[zk: localhost:2181(CONNECTED) 5] ls /hbase-unsecure/region-in-transition
[]
[zk: localhost:2181(CONNECTED) 6] ls /hbase-unsecure/region-in-transition
[]
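The znode removed here is named after the encoded region name. This is the 32-hex-digit suffix of the full region name shown in the Master UI, placed under the cluster's zookeeper.znode.parent (here /hbase-unsecure) plus region-in-transition. Below is a minimal Python sketch of deriving that path. It assumes the new-format region name and the znode layout seen in the zkCli session above. The helper names are hypothetical.

```python
import string


def encoded_region_name(region_name: str) -> str:
    """Return the encoded name from a full region name.

    Assumes the new format '<table>,<startkey>,<regionId>.<encodedName>.',
    where the encoded name is the 32-hex-digit suffix between the last two dots.
    """
    if not region_name.endswith("."):
        raise ValueError("expected a trailing '.' (new-format region name)")
    encoded = region_name[:-1].rsplit(".", 1)[-1]
    if len(encoded) != 32 or not all(c in string.hexdigits for c in encoded):
        raise ValueError(f"unexpected encoded name: {encoded!r}")
    return encoded


def rit_znode(parent: str, region_name: str) -> str:
    """Build the RIT znode path, assuming the layout seen in zkCli above."""
    return f"{parent.rstrip('/')}/region-in-transition/{encoded_region_name(region_name)}"
```

For the region above this gives /hbase-unsecure/region-in-transition/6f6f4f8204ccaa0b71df08591de155ba, the path that was removed with rmr.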
# After restarting the master (it rebuilds the RIT and region information), verify in the hbase shell: the scan now returns data normally
hbase(main):007:0> scan 'ks_namespace:active_users', LIMIT => 2
ROW COLUMN+CELL
(null) column=cf:confidence, timestamp=10000, value=1.0
(null) column=cf:label, timestamp=10000, value=\xE4\xB8\xAD\xE5\x9B\xBD,\xE4\xB8\x8A\xE6\xB5\xB7,\xE4\xB8\x8A\xE6\xB5\xB7,,\xE7\xA7\xBB\xE5\x8A\xA8
---- column=cf:confidence, timestamp=10000, value=1.0
---- column=cf:label, timestamp=10000, value=\xE4\xB8\xAD\xE5\x9B\xBD,\xE8\xBE\xBD\xE5\xAE\x81,\xE8\x90\xA5\xE5\x8F\xA3,,\xE8\x81\x94\xE9\x80\x9A
2 row(s) in 0.2300 seconds
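The hbase shell prints non-printable bytes of a cell value as \xNN escapes, so the cf:label values above are raw bytes. Below is a minimal Python sketch that reassembles and decodes them. It assumes the application stored UTF-8 text. decode_hbase_escaped is a hypothetical helper.

```python
import re


def decode_hbase_escaped(s: str, encoding: str = "utf-8") -> str:
    r"""Rebuild the raw bytes behind hbase shell output and decode them.

    Every \xNN escape becomes one byte; all other characters are taken as
    plain ASCII. The text encoding of the stored value is an assumption.
    """
    raw = bytearray()
    pos = 0
    for m in re.finditer(r"\\x([0-9A-Fa-f]{2})", s):
        raw += s[pos:m.start()].encode("ascii")  # literal ASCII before the escape
        raw.append(int(m.group(1), 16))          # the escaped byte itself
        pos = m.end()
    raw += s[pos:].encode("ascii")
    return raw.decode(encoding)
```

Applied to the first cf:label value, it yields 中国,上海,上海,,移动. The second value decodes to 中国,辽宁,营口,,联通.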
【Further reading】
For a detailed explanation of how RIT arises, see this blog post: http://blog.csdn.net/LW_GHY/article/details/60780065