Big Data Learning (16): HBase Environment Setup and Basic Operations

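Deployment Plan

HBase is short for Hadoop Database, and its data is stored on HDFS. Our lab environment still builds on the configuration from the previous Hive topic; see Big Data Learning (11): Setting Up Hive in Metastore Service Mode.

On top of that, we add the HBase deployment plan below. I suspect my 8 GB of memory is about to give out.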

Host	RegionServer	Master
server01	•
server02	•
server03	•	•

Installing HBase

Extract HBase into the /usr directory. The version is 2.2.6.

[root@server01 home]# tar -xvf hbase-2.2.6-bin.tar.gz -C /usr/

Change the ownership of the extracted directory to the hadoop user and group.

[root@server01 usr]# chown -R hadoop:hadoop hbase-2.2.6/

[root@server01 usr]# ll

总用量 92

drwxr-xr-x. 10 hadoop hadoop   184 9月  24 08:04 apache-hive-3.1.2

drwxr-xr-x.  7 hadoop hadoop   146 9月  24 12:57 apache-zookeeper-3.5.8

dr-xr-xr-x.  2 root   root   24576 10月 23 13:11 bin

drwxr-xr-x.  2 root   root       6 4月  11 2018 etc

drwxr-xr-x.  2 root   root       6 4月  11 2018 games

drwxr-xr-x. 11 hadoop hadoop   227 9月  24 12:58 hadoop-3.3.0

drwxr-xr-x.  6 hadoop hadoop   170 12月  5 14:58 hbase-2.2.6

drwxr-xr-x.  3 root   root      23 9月  22 16:44 include

drwxr-xr-x.  4 root   root      69 10月 23 13:06 java

dr-xr-xr-x. 27 root   root    4096 9月  22 16:46 lib

dr-xr-xr-x. 35 root   root   20480 9月  22 16:46 lib64

drwxr-xr-x. 21 root   root    4096 9月  22 16:46 libexec

drwxr-xr-x. 12 root   root     131 9月  22 16:44 local

dr-xr-xr-x.  2 root   root   12288 9月  29 18:17 sbin

drwxr-xr-x. 77 root   root    4096 9月  23 18:21 share

drwxr-xr-x.  4 root   root      34 9月  22 16:44 src

lrwxrwxrwx.  1 root   root      10 9月  22 16:44 tmp -> ../var/tmp

Edit the system environment variables to add the HBase path settings.

JAVA_HOME=/usr/java/jdk1.8.0

ZOOKEEPER_HOME=/usr/apache-zookeeper-3.5.8

HADOOP_HOME=/usr/hadoop-3.3.0

HIVE_HOME=/usr/apache-hive-3.1.2

HBASE_HOME=/usr/hbase-2.2.6

PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
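
After editing the variables, it can help to confirm that each one points at a directory that actually exists. The following is a minimal sketch and not part of the original setup: check_home is a hypothetical helper name, and the fallback paths are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): report whether an
# install directory referenced by an environment variable exists.
check_home() {
  # $1 = variable name, $2 = directory it should point to
  if [ -d "$2" ]; then
    echo "$1 ok: $2"
  else
    echo "$1 missing: $2"
  fi
}

# Fall back to the paths used in this post when a variable is not set yet.
check_home JAVA_HOME      "${JAVA_HOME:-/usr/java/jdk1.8.0}"
check_home ZOOKEEPER_HOME "${ZOOKEEPER_HOME:-/usr/apache-zookeeper-3.5.8}"
check_home HADOOP_HOME    "${HADOOP_HOME:-/usr/hadoop-3.3.0}"
check_home HIVE_HOME      "${HIVE_HOME:-/usr/apache-hive-3.1.2}"
check_home HBASE_HOME     "${HBASE_HOME:-/usr/hbase-2.2.6}"
```

Any line reporting "missing" points to a typo in the variable or an install path that differs from the one assumed here.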

Switch to the hadoop user, edit the configuration file hbase-env.sh, and create the /opt/hadoop/pids directory.

# The java implementation to use.  Java 1.8+ required.

export JAVA_HOME=/usr/java/jdk1.8.0/

# Tell HBase whether it should manage it's own instance of ZooKeeper or not.

export HBASE_MANAGES_ZK=false

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/opt/hadoop/pids
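
Creating the pid directory could look roughly like the sketch below. It assumes the hadoop user is allowed to create directories under /opt/hadoop; ensure_pid_dir is a hypothetical helper name, and the default path is the HBASE_PID_DIR value shown above.

```shell
#!/bin/sh
# Hypothetical helper: create the pid directory and confirm that the
# current user can write to it.
ensure_pid_dir() {
  dir="$1"
  if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
    echo "pid dir ready: $dir"
  else
    echo "cannot create or write: $dir"
    return 1
  fi
}

# Path taken from HBASE_PID_DIR in hbase-env.sh; run this as the hadoop user.
ensure_pid_dir "${HBASE_PID_DIR:-/opt/hadoop/pids}" || true
```

If the helper reports that it cannot create the directory, the parent directory probably needs to be created or chowned by root first.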

Edit the configuration file hbase-site.xml.

<configuration>

  <property>

    <name>hbase.rootdir</name>

    <value>hdfs://mycluster/hbase</value>

  </property>

  <property>

    <name>hbase.cluster.distributed</name>

    <value>true</value>

  </property>

  <property>

    <name>hbase.zookeeper.quorum</name>

    <value>server01,server02,server03</value>

  </property>

  <property>

    <name>hbase.tmp.dir</name>

    <value>./tmp</value>

  </property>

  <property>

    <name>hbase.unsafe.stream.capability.enforce</name>

    <value>false</value>

  </property>

</configuration>

Edit the regionservers file to list the RegionServer hosts.

[hadoop@server01 conf]$ cat regionservers

server01

server02

server03

HDFS Client Configuration

From the official documentation:

Of note, if you have made HDFS client configuration changes on your Hadoop cluster, such as configuration directives for HDFS clients, as opposed to server-side configurations, you must use one of the following methods to enable HBase to see and use these configuration changes:

Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.
Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or
if only a small set of HDFS client configurations, add them to hbase-site.xml.

I use the second method here and create a symlink.

[hadoop@server01 conf]$ ln -s /usr/hadoop-3.3.0/etc/hadoop/hdfs-site.xml hdfs-site.xml

[hadoop@server01 conf]$ ll

总用量 44

-rw-r--r--. 1 hadoop hadoop 1811 1月  22 2020 hadoop-metrics2-hbase.properties

-rw-r--r--. 1 hadoop hadoop 4284 1月  22 2020 hbase-env.cmd

-rw-r--r--. 1 hadoop hadoop 7533 12月  5 15:43 hbase-env.sh

-rw-r--r--. 1 hadoop hadoop 2257 1月  22 2020 hbase-policy.xml

-rw-r--r--. 1 hadoop hadoop 2322 12月  5 16:40 hbase-site.xml

lrwxrwxrwx. 1 hadoop hadoop   42 12月  5 17:08 hdfs-site.xml -> /usr/hadoop-3.3.0/etc/hadoop/hdfs-site.xml

-rw-r--r--. 1 hadoop hadoop 1169 1月  22 2020 log4j-hbtop.properties

-rw-r--r--. 1 hadoop hadoop 4977 1月  22 2020 log4j.properties

-rw-r--r--. 1 hadoop hadoop   27 12月  5 16:43 regionservers

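Configuration of the first machine is now complete. Copy /usr/hbase-2.2.6 to the same directory on the second and third machines with scp, and update the system environment variables there as well. With that, all installation and configuration is done.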
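
The copy step can be sketched as a small loop. This is an illustration under assumptions, not the exact commands I ran: the host names and path come from this post, while DRY_RUN and copy_cmd are hypothetical names added so the commands can be reviewed before anything is transferred.

```shell
#!/bin/sh
# Hosts and path come from this post; DRY_RUN is a hypothetical switch
# for this sketch (1 = only print the commands, 0 = really copy).
HBASE_DIR="${HBASE_DIR:-/usr/hbase-2.2.6}"
TARGETS="${TARGETS:-server02 server03}"
DRY_RUN="${DRY_RUN:-1}"

copy_cmd() {
  # Print the scp command that copies HBASE_DIR into the same parent
  # directory on host $1.
  echo "scp -r $HBASE_DIR $1:$(dirname "$HBASE_DIR")/"
}

for host in $TARGETS; do
  if [ "$DRY_RUN" = "1" ]; then
    copy_cmd "$host"
  else
    scp -r "$HBASE_DIR" "$host:$(dirname "$HBASE_DIR")/"
  fi
done
```

Two caveats: scp -r follows symbolic links, so hdfs-site.xml should arrive on the other hosts as a regular copy rather than a link, and depending on which user runs scp, the target directory may need the same chown as on server01.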

Starting HBase

Run start-hbase.sh on server03 to start HBase.

[hadoop@server03 conf]$ start-hbase.sh

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

running master, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-master-server03.out

server03: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server03.out

server02: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server02.out

server01: running regionserver, logging to /usr/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-server01.out
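
To confirm that the daemons really came up, the jps tool from the JDK lists running Java processes by class name; the HBase ones show up as HMaster and HRegionServer. The sketch below filters that output. hbase_daemons is a hypothetical helper name, and the script has to be run on each node separately.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output ("<pid> <name>" per line) on
# stdin and print the names of the HBase daemons found in it.
hbase_daemons() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" { print $2 }' | sort
}

# Per the deployment plan, server03 should show both daemons and the
# other two hosts only HRegionServer.
if command -v jps >/dev/null 2>&1; then
  jps | hbase_daemons
else
  echo "jps not found; is the JDK bin directory on PATH?"
fi
```

If a daemon is missing from the list, the .out and .log files under the logs directory named in the startup messages are the first place to look.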

Run hbase shell on server02 to open the command line.

[hadoop@server02 opt]$ hbase shell

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

HBase Shell

Use "help" to get list of supported commands.

Use "exit" to quit this interactive shell.

For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell

Version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020

Took 0.0020 seconds

Once the shell is up, you can try some HBase commands, for example listing the existing tables.

hbase(main):001:0> list

TABLE

0 row(s)

Took 9.3559 seconds

=> []

Use help to see all commands.

hbase(main):002:0> help

HBase Shell, version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020

Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.

Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:

  Group name: general

  Commands: processlist, status, table_help, version, whoami

  Group name: ddl

  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace

  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml

  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools

  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, compaction_switch, decommission_regionservers, flush, hbck_chore_run, is_in_maintenance_mode, list_deadservers, list_decommissioned_regionservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, recommission_regionserver, regioninfo, rit, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication

  Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots

  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration

  Commands: update_all_config, update_config

  Group name: quotas

  Commands: disable_exceed_throttle_quota, disable_rpc_throttle, enable_exceed_throttle_quota, enable_rpc_throttle, list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security

  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures

  Commands: list_locks, list_procedures

  Group name: visibility labels

  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup

  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup, rename_rsgroup

SHELL USAGE:

Quote all names in HBase Shell such as table and column names.  Commas delimit

command parameters.  Type <RETURN> after entering a command to run it.

Dictionaries of configuration used in the creation and alteration of tables are

Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces.  Key/values are delimited by the

'=>' character combination.  Usually keys are predefined constants such as

NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type

'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use

double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"

  hbase> get 't1', "key\003\023\011"

  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.

For more on the HBase Shell, see http://hbase.apache.org/book.html

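Common Commands

HBase's syntax is completely different from SQL; it is NoSQL, after all. If you don't know how to use these commands, just type help and try them one by one based on the output. When you get one wrong, the shell prints a usage hint.

# The official site has plenty of examples, so I'll borrow them directly.

# Create the test table with one column family, cf. My machine is so slow that creating a table takes about 50 seconds.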

hbase(main):008:0> create 'test', 'cf'

Created table test

Took 48.5794 seconds

=> Hbase::Table - test

# Check whether the test table exists.

hbase(main):009:0> list 'test'

TABLE

test

1 row(s)

Took 0.4441 seconds

=> ["test"]

# Show the table details.

hbase(main):016:0> describe 'test'

Table test is ENABLED

test

COLUMN FAMILIES DESCRIPTION

{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DEL

ETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN

_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMOR

Y => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',

BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                                

1 row(s)

QUOTAS

0 row(s)

Took 0.3061 seconds       

# Insert two records into the test table.

hbase(main):017:0> put 'test','rowkey1','cf:level','P8'

Took 1.7265 seconds

hbase(main):018:0> put 'test','rowkey2','cf:salary','200w'

Took 0.0235 seconds

# Scan the whole table.

hbase(main):019:0> scan 'test'

ROW                         COLUMN+CELL

 rowkey1                    column=cf:level, timestamp=1607316881281, value=P8

 rowkey2                    column=cf:salary, timestamp=1607317009943, value=200w

1 row(s)

Took 0.8274 seconds

# Query the value for a single rowkey.

hbase(main):029:0> get 'test','rowkey2'

COLUMN                      CELL

 cf:salary                  timestamp=1607317246868, value=200w

1 row(s)

Took 0.2384 seconds 

# Disable the test table.

hbase(main):030:0> disable 'test'

Took 9.4715 seconds

# Drop the test table. A table must be disabled before it can be dropped; dropping a table that is still enabled produces an error.

hbase(main):031:0> drop 'test'

Took 3.6645 seconds
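
The same sequence can also be run without typing it interactively, by piping commands into the shell's non-interactive mode (hbase shell -n). The following is a sketch under assumptions: demo_commands is a hypothetical helper, and the table name test_demo is made up so that the script does not touch the real test table.

```shell
#!/bin/sh
# Hypothetical helper: print the command sequence used above for a
# given table name, one HBase shell command per line.
demo_commands() {
  t="$1"
  cat <<EOF
create '$t', 'cf'
put '$t', 'rowkey1', 'cf:level', 'P8'
put '$t', 'rowkey2', 'cf:salary', '200w'
scan '$t'
get '$t', 'rowkey2'
disable '$t'
drop '$t'
EOF
}

# Assumption: test_demo is a throwaway table name chosen for this sketch.
if command -v hbase >/dev/null 2>&1; then
  demo_commands test_demo | hbase shell -n
else
  demo_commands test_demo    # hbase is not on PATH: only print the commands
fi
```

Keeping disable directly before drop in the generated list mirrors the rule above that a table has to be disabled before it can be dropped.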

Connecting to HBase from IDEA

Create a Maven project in IDEA with the following pom configuration.

<dependencies>

    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->

    <dependency>

      <groupId>org.apache.hbase</groupId>

      <artifactId>hbase-client</artifactId>

      <version>2.0.0</version>

    </dependency>

    <dependency>

      <groupId>junit</groupId>

      <artifactId>junit</artifactId>

      <version>4.13</version>

      <scope>compile</scope>

    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-protocol -->

    <dependency>

      <groupId>org.apache.hbase</groupId>

      <artifactId>hbase-protocol</artifactId>

      <version>2.0.0</version>

    </dependency>

  </dependencies>

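My HBase version is 2.2.6, but importing the 2.2.6 artifacts in the pom caused the error below at runtime. After a long search online, I switched to the lower-version dependencies and it worked. I don't know why.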

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/protobuf/generated/MasterProtos$MasterService$BlockingInterface

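On the server side I use the same test table as before (recreated after the drop above, judging by the new timestamps), which already holds two records.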

hbase(main):005:0> scan 'test'

ROW                         COLUMN+CELL

 rowkey1                    column=cf:level, timestamp=1607328808361, value=P8

 rowkey2                    column=cf:salary, timestamp=1607328820620, value=200w

2 row(s)

Took 0.1988 seconds

Write an HBaseTest class as follows.

package gov.hbczt;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.TableName;

import org.apache.hadoop.hbase.client.Connection;

import org.apache.hadoop.hbase.client.ConnectionFactory;

import org.apache.hadoop.hbase.client.Put;

import org.apache.hadoop.hbase.client.Table;

import org.apache.hadoop.hbase.util.Bytes;

import org.junit.After;

import org.junit.Before;

import org.junit.Test;

import java.io.IOException;

public class HBaseTest {

    Configuration conf = null;

    Connection connection = null;

    TableName tname = TableName.valueOf("test");

    Table table = null;

    @Before

    public void init() throws IOException {

        // Loads hbase-default.xml and, if present on the classpath, hbase-site.xml.

        conf = HBaseConfiguration.create();

        connection = ConnectionFactory.createConnection(conf);

        table = connection.getTable(tname);

    }

    @Test

    public void addData() throws IOException {

        Put put = new Put(Bytes.toBytes("rowkey3"));

        put.addColumn(Bytes.toBytes("cf"),Bytes.toBytes("corp"), Bytes.toBytes("Alibaba"));

        table.put(put);

    }

    @After

    public void destroy() throws IOException {

        if(table != null)

            table.close();

        if(connection != null)

            connection.close();

    }

}

Run the addData method. Once it succeeds, check from the shell that a new record was added.

hbase(main):010:0> scan 'test'

ROW                         COLUMN+CELL

 rowkey1                    column=cf:level, timestamp=1607328808361, value=P8

 rowkey2                    column=cf:salary, timestamp=1607328820620, value=200w

 rowkey3                    column=cf:corp, timestamp=1607330730061, value=Alibaba

3 row(s)

Took 0.0930 seconds

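For detailed API documentation, see Apache HBase 2.2.3 API. I don't like this kind of online API documentation; I prefer the chm format, which is searchable and convenient.

There are plenty of CRUD examples online, so I won't list them one by one here.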