Apache HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system: a NoSQL database and an open-source implementation of the ideas behind Google Bigtable. It lets you build large-scale structured storage clusters on inexpensive PC servers, using Hadoop HDFS as its file storage, Hadoop MapReduce to process HBase's massive data sets, and ZooKeeper to coordinate the server cluster. The Apache HBase website has detailed documentation.
Deploying a fully distributed Apache HBase cluster is not complicated. The detailed procedure follows.
1. Plan the HBase cluster nodes
This experiment uses four nodes, hosting the HBase Master, a backup master (Master-backup), and RegionServers. The nodes run CentOS 6.9; the process layout per node is as follows:
Host | IP | Processes
---|---|---
hd1 | 172.17.0.1 | Master, ZooKeeper
hd2 | 172.17.0.2 | Master-backup, RegionServer, ZooKeeper
hd3 | 172.17.0.3 | RegionServer, ZooKeeper
hd4 | 172.17.0.4 | RegionServer
2. Install the JDK, ZooKeeper, and Hadoop
On every node, disable the firewall and set SELinux to disabled.
Install the JDK, ZooKeeper, and an Apache Hadoop distributed cluster (for the details, see my other blog post: a step-by-step guide to building an Apache Hadoop 2.8 distributed cluster).
After installation, set the following environment variables; they are needed when installing and configuring HBase.
export JAVA_HOME=/usr/java/jdk1.8.0_131
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/home/ahadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export ZOOKEEPER_HOME=/home/ahadoop/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
3. Install NTP to keep the nodes' clocks in sync
If the clocks on the nodes drift apart, HBase can misbehave; the HBase website stresses this point. Here the first node, hd1, acts as the NTP server, synchronizing its time from the National Time Service Center, while the other nodes (hd2, hd3, hd4) act as clients that synchronize from hd1.
(1) Install NTP
# Install the NTP service
yum -y install ntp
# Enable the service at boot
chkconfig --add ntpd
chkconfig ntpd on
Start the NTP service:
service ntpd start
(2) Configure the NTP server
On node hd1, edit /etc/ntp.conf to configure the NTP service; the changed items are marked with comments below.
vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Allow requests from the cluster subnet
restrict 172.17.0.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
# Upstream servers to synchronize against
server 210.72.145.44 prefer  # National Time Service Center of China
server 202.112.10.36         # 1.cn.pool.ntp.org
server 59.124.196.83         # 0.asia.pool.ntp.org

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Allow the upstream time servers to adjust the local clock
restrict 210.72.145.44 nomodify notrap noquery
restrict 202.112.10.36 nomodify notrap noquery
restrict 59.124.196.83 nomodify notrap noquery

# Fall back to the local clock if the external servers are unreachable
server 127.0.0.1 # local clock
fudge 127.0.0.1 stratum 10

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
Restart the NTP service:
service ntpd restart
Then check the NTP service status:
[root@31d48048cb1e ahadoop]# service ntpd status
ntpd dead but pid file exists
An error, it turns out, caused by a limitation of ntpd: it only synchronizes the clock when the offset from the NTP server is within 1000 s. The node's clock was off from the real time by more than 1000 s, so the operating system time must first be brought within 1000 s of the NTP server manually; only then can the synchronization service work.
# If the operating system's time zone is wrong, fix it first (Asia/Shanghai)
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# Set the date and time manually
date -s 20170703
date -s 15:32:00
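Optionally, write the corrected system time back to the hardware clock so it survives a reboot (this assumes the node has direct access to the RTC, which a Docker container does not):
hwclock -w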
There is also a little trick: right after installing NTP, fetch the correct time once from a public time server, which saves the manual adjustment:
ntpdate -u pool.ntp.org
Note: if you run this time synchronization inside Docker, the system reports an error:
9 Jan 05:13:57 ntpdate[7299]: step-systime: Operation not permitted
This error means the system does not allow you to set the time yourself. A Docker container shares the host's kernel, and setting the system time is a kernel-level operation, so the time cannot be changed from inside Docker.
(3) Configure the NTP clients
On nodes hd2, hd3, and hd4, edit /etc/ntp.conf to configure the NTP client; the changed items are marked with comments below.
vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery

# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
# Synchronize from the NTP server node (hd1)
server 172.17.0.1
restrict 172.17.0.1 nomodify notrap noquery
# Fall back to the local clock if synchronization fails
server 127.0.0.1
fudge 127.0.0.1 stratum 10

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
Restart the NTP service:
service ntpd restart
After the restart, check the synchronization status:
$ ntpq -p
$ ntpstat
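To double-check that hd1 is reachable from a client node without touching the clock, ntpdate can also run in query-only mode (an optional sanity check):
ntpdate -q 172.17.0.1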
4. Raise the ulimit
The Apache HBase documentation recommends raising the ulimit when running HBase, to increase the number of files that can be open at once; the suggested nofile value is at least 10,000 but preferably 10,240 ("It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, because the value is usually expressed in multiples of 1024.").
Edit /etc/security/limits.conf and append the nofile (open file count) and nproc (process count) entries at the end, as follows:
vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
After the change, reboot the server for it to take effect:
reboot
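After the reboot, log in again and verify that the new limits are in effect:
# maximum number of open file descriptors
ulimit -n
# maximum number of user processes
ulimit -u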
5. Install and configure Apache HBase
The Apache HBase website documents the default configuration and provides reference configuration examples; it is worth reading them before you configure anything.
This experiment uses a standalone ZooKeeper ensemble, shared with Hadoop; see my other blog post for the ZooKeeper setup. HBase can also manage a built-in ZooKeeper for you, but in a production environment a separately deployed ensemble is recommended, as it is easier to operate day to day.
(1) Download Apache HBase
Download the latest binary release from the official site: hbase-1.2.6-bin.tar.gz
Then extract it:
tar -zxvf hbase-1.2.6-bin.tar.gz
Configure the environment variables:
vi ~/.bash_profile
export HBASE_HOME=/home/ahadoop/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin

# Make the environment variables take effect
source ~/.bash_profile
(2) Copy the hdfs-site.xml configuration file
Copy $HADOOP_HOME/etc/hadoop/hdfs-site.xml into $HBASE_HOME/conf so that HDFS and HBase see the same configuration; this is the approach recommended on the official site. The documentation gives an example: if HDFS is configured with a replication factor of 5 while the default is 3, and the current hdfs-site.xml has not been copied into $HBASE_HOME/conf, HBase will write only 3 replicas. The two sides then disagree, which leads to anomalies.
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/
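For instance, if your cluster raises the replication factor, the copied hdfs-site.xml would carry a property like the following (the value 5 is illustrative, echoing the documentation's example):
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>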
(3) Configure hbase-site.xml
Using the built-in ZooKeeper (a generic example with hostnames master, slave1, slave2):
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hbase/zookeeper/data</value>
</property>
</configuration>
Using a standalone ZooKeeper, as this walkthrough does:
Edit $HBASE_HOME/conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hd1,hd2,hd3</value>
    <description>Comma separated list of servers in the ZooKeeper quorum.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/ahadoop/zookeeper-data</value>
    <description>Property from ZooKeeper config zoo.cfg. The directory
      where the snapshot is stored. Note: this ZooKeeper data directory is
      shared with the Hadoop HA setup, so it must match the dataDir
      configured in zoo.cfg.
    </description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hd1:9000/hbase</value>
    <description>The directory shared by RegionServers.
      The official documentation stresses repeatedly that this directory
      must not be created in advance; HBase creates it itself, and a
      pre-existing directory triggers a migration and causes errors.
      As for the port (8020 on some clusters, 9000 on others), check
      $HADOOP_HOME/etc/hadoop/hdfs-site.xml; in this experiment it is set by
      dfs.namenode.rpc-address.hdcluster.nn1 and
      dfs.namenode.rpc-address.hdcluster.nn2.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Set this to true for a
      distributed cluster, false for a single node. Possible values are
      false: standalone and pseudo-distributed setups with managed ZooKeeper
      true: fully-distributed with unmanaged ZooKeeper Quorum (see
      hbase-env.sh)
    </description>
  </property>
</configuration>
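If you are unsure which RPC port your NameNode listens on, you can look it up before setting hbase.rootdir (a quick check; the property names match the HA setup mentioned in the description above):
grep -A1 'rpc-address' $HADOOP_HOME/etc/hadoop/hdfs-site.xml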
(4) Configure the regionservers file
Edit $HBASE_HOME/conf/regionservers and list the hostnames that should run a RegionServer, one per line:
hd2
hd3
hd4
(5) Configure the backup-masters file (standby master nodes)
HBase supports running multiple master processes, so the master is not a single point of failure; only one master is active at a time, and the rest are backup masters. Edit $HBASE_HOME/conf/backup-masters and list the hostname(s) of the backup master(s):
hd2
(6) Configure the hbase-env.sh file
Edit $HBASE_HOME/conf/hbase-env.sh to set the environment. Because this experiment uses a separately deployed ZooKeeper, set HBASE_MANAGES_ZK to false:
export HBASE_MANAGES_ZK=false
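hbase-env.sh is also where HBase picks up its JDK; if JAVA_HOME is not already exported in the environment of the user running HBase, set it here too (the path matches the JDK installed earlier):
export JAVA_HOME=/usr/java/jdk1.8.0_131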
At this point, the HBase configuration is complete.
6. Start Apache HBase
You can start the whole cluster with $HBASE_HOME/bin/start-hbase.sh, but to use that command the cluster nodes must first be set up for passwordless ssh login, so that the script can reach each node and start its services; a minimal sketch of that setup follows.
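A minimal sketch of the passwordless-ssh setup, run as the HBase user on hd1 (hostnames as in the plan above; ssh-copy-id appends the public key to each node's authorized_keys):
# generate a key pair (accept the defaults and an empty passphrase)
ssh-keygen -t rsa
# distribute the public key to every node, including hd1 itself
ssh-copy-id hd1
ssh-copy-id hd2
ssh-copy-id hd3
ssh-copy-id hd4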
To get a better feel for the HBase startup process, this experiment starts the processes on each node one by one. Looking inside the start-hbase.sh script, the startup order is as follows:
if [ "$distMode" == 'false' ] then "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@ else "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup fi
In other words, it uses hbase-daemon.sh (and hbase-daemons.sh across hosts) to start zookeeper, master, regionserver, and master-backup in that order.
We therefore follow the same order when starting the processes on each node.
Hadoop must be running before HBase starts, so that HBase can initialize and read the data it stores on HDFS.
(1) Start ZooKeeper (nodes hd1, hd2, hd3)
zkServer.sh start &
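Once ZooKeeper is up on all three nodes, you can confirm each node's role in the ensemble (one leader, the others followers):
zkServer.sh status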
(2) Start the Hadoop distributed cluster (see my other blog post for the detailed configuration and node layout)
# Start the journalnodes (hd1, hd2, hd3)
hdfs journalnode &
# Start the active namenode (hd1)
hdfs namenode &
# Start the standby namenode (hd2)
hdfs namenode &
# Start the ZookeeperFailoverController (hd1, hd2)
hdfs zkfc &
# Start the datanodes (hd2, hd3, hd4)
hdfs datanode &
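Before moving on, it is worth confirming that HDFS is healthy and that one NameNode has become active. A quick check, assuming the NameNode IDs nn1 and nn2 from the HA configuration referenced above:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# overall capacity and datanode report
hdfs dfsadmin -report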
(3) Start the HBase master (hd1)
hbase-daemon.sh start master &
(4) Start the HBase RegionServers (hd2, hd3, hd4)
hbase-daemon.sh start regionserver &
(5) Start the HBase backup master (hd2)
hbase-daemon.sh start master --backup &
Curiously, $HBASE_HOME/bin/start-hbase.sh starts the backup masters with the following command:
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \ --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
But running that command directly fails, complaining that the class master-backup cannot be loaded:
[ahadoop@1620d6ed305d ~]$ hbase-daemon.sh start master-backup &
[5] 1113
[ahadoop@1620d6ed305d ~]$ starting master-backup, logging to /home/ahadoop/hbase-1.2.6/logs/hbase-ahadoop-master-backup-1620d6ed305d.out
Error: Could not find or load main class master-backup
After some digging, the working command to start a backup master turned out to be:
hbase-daemon.sh start master --backup &
With the steps above, the HBase cluster is up. You can run the jps command on each node to check the HBase processes.
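For example, on hd2 (backup master, RegionServer, and ZooKeeper, plus the Hadoop daemons) a jps listing would be expected to show process names along these lines (PIDs will differ):
jps
# expected entries on hd2: HMaster, HRegionServer, QuorumPeerMain,
# NameNode, DataNode, JournalNode, DFSZKFailoverController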
After startup, checking the /hbase directories in both HDFS and ZooKeeper shows that both have been initialized and populated with the corresponding files:
[ahadoop@ee8319514df6 ~]$ hadoop fs -ls /hbase
17/07/02 13:14:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/.tmp
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/MasterProcWALs
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 13:03 /hbase/WALs
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/data
-rw-r--r--   3 ahadoop supergroup         42 2017-07-02 12:55 /hbase/hbase.id
-rw-r--r--   3 ahadoop supergroup          7 2017-07-02 12:55 /hbase/hbase.version
drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/oldWALs
[ahadoop@31d48048cb1e ~]$ zkCli.sh -server hd1:2181
Connecting to hd1:2181
2017-07-05 11:31:44,663 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2017-07-05 11:31:44,667 [myid:] - INFO [main:Environment@100] - Client environment:host.name=31d48048cb1e
2017-07-05 11:31:44,668 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_131
2017-07-05 11:31:44,672 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2017-07-05 11:31:44,673 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_131/jre
2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/ahadoop/zookeeper-3.4.10/bin/../build/classes:/home/ahadoop/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/ahadoop/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/ahadoop/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../conf:.:/usr/java/jdk1.8.0_131/lib:/usr/java/jdk1.8.0_131/lib/dt.jar:/usr/java/jdk1.8.0_131/lib/tools.jar:/home/ahadoop/apache-ant-1.10.1/lib
2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2017-07-05 11:31:44,678 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.105-1.el6.elrepo.x86_64
2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.name=ahadoop
2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/ahadoop
2017-07-05 11:31:44,681 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/ahadoop
2017-07-05 11:31:44,686 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=hd1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
Welcome to ZooKeeper!
2017-07-05 11:31:44,724 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 31d48048cb1e/172.17.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2017-07-05 11:31:44,884 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@876] - Socket connection established to 31d48048cb1e/172.17.0.1:2181, initiating session
2017-07-05 11:31:44,912 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 31d48048cb1e/172.17.0.1:2181, sessionid = 0x15d10c18fc70002, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: hd1:2181(CONNECTED) 1] ls /hbase
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, running, recovering-regions, draining, hbaseid, table]
7. Testing HBase
Run hbase shell to enter HBase's interactive command-line interface, where you can test things out:
hbase shell
(1) Check the cluster status and node count
hbase(main):001:0> status
1 active master, 1 backup masters, 4 servers, 0 dead, 0.5000 average load
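The status command also takes a format argument if you want more than the one-line summary; 'simple' and 'detailed' show per-server and per-region information respectively:
status 'simple'
status 'detailed'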
(2) Create a table
hbase(main):002:0> create 'testtable','c1','c2'
0 row(s) in 1.4850 seconds

=> Hbase::Table - testtable
The create command takes the table name followed by one or more column family names: create 'table', 'cf1', 'cf2', ... (here c1 and c2 are column families, not individual columns).
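A column family can also be declared with per-family options by passing a dictionary instead of a bare name. For example, to keep up to three versions of each cell (a sketch; testtable2 is just an illustrative name):
create 'testtable2', {NAME => 'c1', VERSIONS => 3}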
(3) List the table
hbase(main):003:0> list 'testtable'
TABLE
testtable
1 row(s) in 0.0400 seconds

=> ["testtable"]
(4) Insert data
hbase(main):004:0> put 'testtable','row1','c1','row1_c1_value'
0 row(s) in 0.2230 seconds

hbase(main):005:0> put 'testtable','row2','c2:s1','row1_c2_s1_value'
0 row(s) in 0.0310 seconds

hbase(main):006:0> put 'testtable','row2','c2:s2','row1_c2_s2_value'
0 row(s) in 0.0170 seconds
The put command takes the table name, the row key, the column (written as family:qualifier; the part after the colon names a sub-column within the family), and the cell value.
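Two more shell commands are handy for quick sanity checks here: count scans the table and reports the number of rows, and describe prints the table's schema and column-family settings:
count 'testtable'
describe 'testtable'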
(5) Scan the whole table
hbase(main):007:0> scan 'testtable'
ROW                  COLUMN+CELL
 row1                column=c1:, timestamp=1499225862922, value=row1_c1_value
 row2                column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
 row2                column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0820 seconds
(6) Query data by row key
hbase(main):008:0> get 'testtable','row1'
COLUMN               CELL
 c1:                 timestamp=1499225862922, value=row1_c1_value
1 row(s) in 0.0560 seconds

hbase(main):009:0> get 'testtable','row2'
COLUMN               CELL
 c2:s1               timestamp=1499225869471, value=row1_c2_s1_value
 c2:s2               timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0350 seconds
(7) Disable a table
The disable command takes a table offline; a disabled table cannot be used, and operations such as a full scan fail with an error:
hbase(main):010:0> disable 'testtable'
0 row(s) in 2.3090 seconds

hbase(main):011:0> scan 'testtable'
ROW                  COLUMN+CELL

ERROR: testtable is disabled.

Here is some help for this command:
(the shell then prints the full usage help for the scan command; omitted here)
(8) Re-enable a table
The enable command brings a table back online; once enabled, the table can be used again, for example for a full scan:
hbase(main):012:0> enable 'testtable'
0 row(s) in 1.2800 seconds

hbase(main):013:0> scan 'testtable'
ROW                  COLUMN+CELL
 row1                column=c1:, timestamp=1499225862922, value=row1_c1_value
 row2                column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
 row2                column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
2 row(s) in 0.0590 seconds
(9) Drop a table
The drop command deletes a table, but only a disabled table can be dropped; otherwise the command fails:
hbase(main):014:0> drop 'testtable'

ERROR: Table testtable is enabled. Disable it first.

Here is some help for this command:
Drop the named table. Table must first be disabled:

  hbase> drop 't1'
  hbase> drop 'ns1:t1'
Disable the table first and then drop it; the deletion then succeeds:
hbase(main):008:0> disable 'testtable'
0 row(s) in 2.3170 seconds

hbase(main):012:0> drop 'testtable'
0 row(s) in 1.2740 seconds
(10) Exit the HBase shell
quit
That wraps up the simple testing and usage with the hbase shell.
8. HBase web UI
HBase also ships a web management UI that makes it easy to view the cluster state.
Open http://172.17.0.1:16010 in a browser (the master UI listens on port 16010 by default) to reach the management page, as shown below.
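Each RegionServer also serves its own status page, on port 16030 by default in HBase 1.x; for example, for hd2:
http://172.17.0.2:16030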
To see the tables in HBase, click Table Details in the top menu bar, which lists all tables, as shown below.
The Tables section on the home page also lists the table names; click one to view that table's details, as shown below.
Under Tables, click System Tables to view the system tables, which mainly hold metadata and namespaces, as shown below.
This has been the full walkthrough of configuring an Apache HBase cluster and putting it through its paces; corrections and comments are welcome.