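[HBase] HBase Getting Started Guide

Date: 2022-05-14 10:42:14

Getting Started

1. Introduction

Quickstart will get you up and running on a single-node, standalone instance of HBase.

2. Quick Start - Standalone HBase

This section describes the setup of a single-node standalone HBase. A standalone instance runs all HBase daemons (the Master, RegionServers, and ZooKeeper) in a single JVM that persists to the local filesystem. It is the most basic deployment profile. You will see how to create a table with the hbase shell CLI, insert rows, run put and scan operations, enable and disable the table, and start and stop HBase.

Apart from downloading HBase, this procedure should take less than 10 minutes.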

 

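Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, which causes problems. See "Why does HBase care about /etc/hosts?" for details.

The following /etc/hosts file works correctly for HBase 0.94.x and earlier on Ubuntu. Use it as a template if you run into trouble.

127.0.0.1 localhost

127.0.0.1 ubuntu.ubuntu-domain ubuntu

This issue has been fixed in hbase-0.96.0 and later.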

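2.1. JDK Version Requirements

HBase requires that a JDK be installed. See Java for information about the JDK versions supported by each HBase release.

2.2. Get Started with HBase

Procedure: Download, Configure, and Start HBase in Standalone Mode

  1. Choose a download site from the list of Apache Download Mirrors and click the suggested link. This takes you to a mirror of HBase releases. Click the folder named stable, then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz.

  2. Extract the downloaded file and change to the newly created directory.

    $ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz
    $ cd hbase-2.0.0-SNAPSHOT/

  3. Set the JAVA_HOME environment variable before starting HBase. You can set it through your operating system's usual mechanism, but HBase also provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the Java installation path on your system. JAVA_HOME should point to a directory that contains the executable bin/java. Most modern Linux distributions provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between Java versions. In that case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.

    JAVA_HOME=/usr

  4. Edit conf/hbase-site.xml, the main HBase configuration file. For now, you only need to specify the directories on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp on reboot, so store the data elsewhere. The following configuration stores HBase's data in the hbase directory in the home directory of the user testuser. In a new HBase install the <configuration> tags are empty; paste the <property> tags between them.

    Example 1. Example hbase-site.xml for Standalone HBase

    <configuration>
    <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
    </property>
    <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
    </property>
    </configuration>

    You do not need to create the HBase data directory; HBase does this for you. If you create the directory yourself, HBase will attempt a migration, which is not what you want.

    The hbase.rootdir in the example above points to a directory in the local filesystem. The 'file:/' prefix denotes the local filesystem. To home HBase on an existing HDFS instance, set hbase.rootdir to point at a directory on that instance, for example hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section on Standalone HBase over HDFS.

  5. The bin/start-hbase.sh script is a convenient way to start HBase. Issue the command; if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that one process called HMaster is running. In standalone mode HBase runs all daemons within this single JVM: the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.

    Java needs to be installed and available. If you get an error indicating that Java is not installed although it is on your system, perhaps in a non-standard location, edit conf/hbase-env.sh and set JAVA_HOME to the directory that contains bin/java.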

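Procedure: Use HBase for the First Time

  1) Connect to HBase

  Connect to your running HBase instance with the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information printed when HBase Shell starts has been omitted. The HBase Shell prompt ends with a > character.

  $ ./bin/hbase shell

  hbase(main):001:0>

  2) Display the HBase Shell help text

  Type help and press Enter to display basic usage information for HBase Shell, as well as several example commands. Note that table names, rows, and columns must all be enclosed in quote characters.

  • Create a table

  Use the create command to create a new table. You must specify the table name and the column family name.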

  hbase(main):001:0> create 'test', 'cf'

  0 row(s) in 0.4170 seconds

  => Hbase::Table - test

  • List information about your table

Use the list command to list information about your table.

hbase(main):002:0> list 'test'

TABLE

test

1 row(s) in 0.0180 seconds

=> ["test"]

  • Put data into your table

Use the put command to insert data.

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'

0 row(s) in 0.0850 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'

0 row(s) in 0.0110 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'

0 row(s) in 0.0100 seconds

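Here, three values are inserted, one at a time. The first insert is at row1, column cf:a, with the value value1. Columns in HBase consist of a column family prefix, cf in this example, followed by a colon and a column qualifier suffix, a in this case.

  • Scan the table for all data at once

Use the scan command to scan the table for data. You can limit your scan, but for now all data is fetched.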

hbase(main):006:0> scan 'test'

ROW                                      COLUMN+CELL

row1                                    column=cf:a, timestamp=1421762485768, value=value1

row2                                    column=cf:b, timestamp=1421762491785, value=value2

row3                                    column=cf:c, timestamp=1421762496210, value=value3

3 row(s) in 0.0230 seconds

  • Get a single row of data

Use the get command to fetch a single row at a time.

hbase(main):007:0> get 'test', 'row1'

COLUMN                                   CELL

cf:a                                    timestamp=1421762485768, value=value1

1 row(s) in 0.0350 seconds

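  • Disable a table

If you want to delete a table or change its settings, as well as in some other situations, you first need to disable the table with the disable command. You can re-enable it with the enable command.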

hbase(main):008:0> disable 'test'

0 row(s) in 1.1820 seconds

hbase(main):009:0> enable 'test'

0 row(s) in 0.1770 seconds

If you re-enabled 'test' above, disable it again:

hbase(main):010:0> disable 'test'

0 row(s) in 1.1820 seconds

  • Drop the table

Use the drop command to delete a table.

hbase(main):011:0> drop 'test'

0 row(s) in 0.1370 seconds

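  • Exit the HBase Shell

Use the exit command to disconnect from HBase. HBase keeps running in the background.

Procedure: Stop HBase

Just as the bin/start-hbase.sh script conveniently starts all HBase daemons, the bin/stop-hbase.sh script stops them.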

$ ./bin/stop-hbase.sh

stopping hbase....................

$

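After issuing the command, the processes can take several minutes to shut down. Use the jps command to confirm that the HMaster and HRegionServer processes have stopped.

The steps above showed how to start and stop a standalone HBase instance. The next sections give a quick overview of other deployment modes.

2.3. Pseudo-Distributed Local Install

After working through the standalone quickstart, you can reconfigure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs entirely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process, whereas in standalone mode all daemons run in one JVM. By default, unless you configure the hbase.rootdir property as described in the quickstart, your data is still stored in /tmp/. In this walk-through the data is stored in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to keep storing your data in the local filesystem.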

 

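 Hadoop Configuration

This procedure assumes that you have configured Hadoop and HDFS on your local system or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.

1) Stop HBase if it is running

  If you have just finished the quickstart and HBase is still running, stop it. This procedure creates a completely new directory where HBase stores its data, so any databases you created before will be lost.

2) Configure HBase

Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.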

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

Next, change hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs:// URI syntax. In this example, HDFS runs on localhost at port 8020.

<property>

<name>hbase.rootdir</name>

<value>hdfs://localhost:8020/hbase</value>

</property>

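You do not need to create the directory in HDFS; HBase does this for you. If you create the directory yourself, HBase will attempt a migration, which is not what you want.

3) Start HBase

Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.

4) Check the HBase directory in HDFS

If everything worked correctly, HBase created its directory in HDFS. With the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.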

$ ./bin/hadoop fs -ls /hbase

Found 7 items

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs

drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt

drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data

-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id

-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version

drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs

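5) Create a table and populate it with data

You can use the HBase Shell to create a table, populate it with data, and scan and get values from it, using the same procedure as in the shell exercises.

6) Start and stop a backup HBase Master (HMaster) server

Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.

The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 HMasters in total, counting the primary. To start a backup HMaster, use the local-master-backup.sh script. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so with an offset of 2 the backup HMaster uses ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.

$ ./bin/local-master-backup.sh start 2 3 5

To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid, and the PID is the only content of that file. You can use the kill -9 command to kill that PID. The following command kills the master with port offset 1 but leaves the cluster running:

$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9

7) Start and stop additional RegionServers

The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command lets you run multiple RegionServers. It works like the local-master-backup.sh command: each parameter you provide is the port offset for one instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, since HBase 1.0.0 the HMaster is also a RegionServer and occupies the default ports, so additional RegionServers use the base ports 16200 and 16300 instead. You can run up to 99 additional RegionServers, which are neither an HMaster nor a backup HMaster, on one server. The following command starts four additional RegionServers on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).

$ ./bin/local-regionservers.sh start 2 3 4 5

To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.

$ ./bin/local-regionservers.sh stop 3

8) Stop HBase

You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.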

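2.4. Advanced - Fully Distributed

In reality, you need a fully distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The architecture is as follows: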

Table 1. Distributed Cluster Demo Architecture

Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes

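This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds on the previous quickstart, Pseudo-Distributed Local Install, and assumes that the system you configured there is now node-a. Stop HBase on node-a before continuing.

Be sure that all the nodes can communicate with each other and that no firewall rules prevent them from doing so. If you see errors like no route to host, check your firewall.

Procedure: Configure Passwordless SSH Access

node-a needs to be able to log in to node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts and configure passwordless SSH login from node-a to each of the others.

1) On node-a, generate a key pair

While logged in as the user who will run HBase, generate an SSH key pair with the following command: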

$ ssh-keygen -t rsa

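If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.

2) Create the directory that will hold the shared keys on the other nodes

On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory if it does not already exist. If it already exists, be aware that it may already contain other keys.

3) Copy the public key to the other nodes

Securely copy the public key from node-a to each of the other nodes, using scp or some other secure means. On each of the other nodes, create the file .ssh/authorized_keys if it does not already exist, and append the contents of id_rsa.pub to the end of it. Note that you also need to do this on node-a itself.

$ cat id_rsa.pub >> ~/.ssh/authorized_keys

4) Test passwordless login

If you performed the procedure correctly, you should not be prompted for a password when you SSH from node-a to either of the other nodes with the same username.

5) Because node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite an existing .ssh/authorized_keys file; append the new key with the >> operator rather than the > operator.

Procedure: Prepare node-a

node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.

1) Edit conf/regionservers and remove the line that contains localhost. Add lines with the hostnames or IP addresses of node-b and node-c.

Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it, in this case node-a.example.com. This lets you distribute the configuration to each node of your cluster without hostname conflicts. Save the file.

2) Configure HBase to use node-b as a backup master

Create a new file in conf/ called backup-masters and add a line with the hostname of node-b. In this demonstration, the hostname is node-b.example.com.

3) Configure ZooKeeper

In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper. This configuration directs HBase to start and manage a ZooKeeper instance on each node of the cluster.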

On node-a, edit conf/hbase-site.xml and add the following properties.

<property>

<name>hbase.zookeeper.quorum</name>

<value>node-a.example.com,node-b.example.com,node-c.example.com</value>

</property>

<property>

<name>hbase.zookeeper.property.dataDir</name>

<value>/usr/local/zookeeper</value>

</property>

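4) Everywhere in your configuration that you have referred to node-a as localhost, change the reference to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.

Procedure: Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

1) Download and unpack HBase

Download and unpack HBase on node-b, just as you did for the standalone and pseudo-distributed quickstarts.

2) Copy the configuration files from node-a to node-b and node-c

Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.

Procedure: Start and Test Your Cluster

1) Be sure HBase is not running on any node

If you forgot to stop HBase after previous testing, you will get errors. Use the jps command to check whether HBase is running on any of your nodes. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.

2) Start the cluster

On node-a, issue the start-hbase.sh command. Your output will be similar to that below.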

$ bin/start-hbase.sh

node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out

node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out

node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out

starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out

node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out

node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out

node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out

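ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

3) Verify that the processes are running

On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes as well if the servers are used for other purposes.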

Example 2. node-a jps Output

$ jps

20355 Jps

20071 HQuorumPeer

20137 HMaster

Example 3. node-b jps Output

$ jps

15930 HRegionServer

16194 Jps

15838 HQuorumPeer

16010 HMaster

Example 4. node-c jps Output

$ jps

13901 Jps

13639 HQuorumPeer

13737 HRegionServer

 

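ZooKeeper Process Name

The HQuorumPeer process is a ZooKeeper instance that is started and controlled by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper runs outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper.

4) Browse to the Web UI

Web UI Port Changes

In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 to 16010 for the Master and from 60030 to 16030 for each RegionServer.

If everything is set up correctly, you should be able to connect with a web browser to the UI for the Master at http://node-a.example.com:16010/ and for the backup Master at http://node-b.example.com:16010/. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each RegionServer at port 16030 of its IP address, or by clicking its link in the web UI for the Master.

5) Test what happens when nodes or services disappear

A three-node cluster like the one you have configured is not very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears by killing the processes and watching the logs.

The original English text follows.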


Getting Started

1. Introduction

Quickstart will get you up and running on a single-node, standalone instance of HBase.

2. Quick Start - Standalone HBase

This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons — the Master, RegionServers, and ZooKeeper — running in a single JVM persisting to the local filesystem. It is our most basic deploy profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.

Apart from downloading HBase, this procedure should take less than 10 minutes.

 

Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and this will cause problems for you. See Why does HBase care about /etc/hosts? for details.

The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.

127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu

This issue has been fixed in hbase-0.96.0 and beyond.
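
One way to check whether a machine is affected is to look at which address its hosts file maps the hostname to. The following is only a sketch: the function name loopback_address_for is hypothetical, and the hosts file path is a parameter so that it can be tried on a copy first.

```shell
# Print the first IP address that a hosts file maps a given hostname to.
# Usage: loopback_address_for <hosts-file> <hostname>
# (hypothetical helper, not part of HBase)
loopback_address_for() {
  awk -v host="$2" '
    /^[[:space:]]*#/ { next }            # skip whole-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break             # stop at a trailing comment
        if ($i == host) { print $1; exit }
      }
    }
  ' "$1"
}
```

For example, loopback_address_for /etc/hosts "$(hostname)" printing 127.0.1.1 would indicate the situation described above.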

2.1. JDK Version Requirements

HBase requires that a JDK be installed. See Java for information about supported JDK versions.

2.2. Get Started with HBase

Procedure: Download, Configure, and Start HBase in Standalone Mode
  1. Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz for now.

  2. Extract the downloaded file, and change to the newly-created directory.

    $ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz
    $ cd hbase-2.0.0-SNAPSHOT/
  3. You are required to set the JAVA_HOME environment variable before starting HBase. You can set the variable via your operating system’s usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.

    JAVA_HOME=/usr
  4. Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase’s data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.

    Example 1. Example hbase-site.xml for Standalone HBase
    <configuration>
    <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
    </property>
    <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
    </property>
    </configuration>

    You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.

      The hbase.rootdir in the above example points to a directory in the local filesystem. The 'file:/' prefix is how we denote local filesystem. To home HBase on an existing instance of HDFS, set the hbase.rootdir to point at a directory up on your instance: e.g. hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section below on Standalone HBase over HDFS.
  5. The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.

      Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the conf/hbase-env.sh file and modify the JAVA_HOME setting to point to the directory that contains bin/java on your system.
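
Step 3 suggests /usr when Java is managed through alternatives. If you would rather point JAVA_HOME at the real JDK directory, one possible approach is to resolve the symbolic link and strip the trailing bin/java. This is a sketch under assumptions: the function name java_home_from is hypothetical, and it relies on a readlink that supports -f (GNU coreutils and BusyBox do).

```shell
# Derive a JAVA_HOME candidate from the path of a java executable by
# resolving symbolic links and removing the trailing /bin/java.
# Usage: java_home_from <path-to-java>   (hypothetical helper)
java_home_from() {
  java_bin=$(readlink -f "$1") || return 1
  dirname "$(dirname "$java_bin")"
}
```

A typical call would be java_home_from "$(command -v java)"; the printed directory is a candidate for the JAVA_HOME line in conf/hbase-env.sh.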
Procedure: Use HBase For the First Time
  1. Connect to HBase.

    Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.

    $ ./bin/hbase shell
    hbase(main):001:0>
  2. Display HBase Shell Help Text.

    Type help and press Enter, to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, columns all must be enclosed in quote characters.

  3. Create a table.

    Use the create command to create a new table. You must specify the table name and the ColumnFamily name.

    hbase(main):001:0> create 'test', 'cf'
    0 row(s) in 0.4170 seconds
    => Hbase::Table - test
  4. List Information About your Table

    Use the list command to list information about your table.

    hbase(main):002:0> list 'test'
    TABLE
    test
    1 row(s) in 0.0180 seconds
    => ["test"]
  5. Put data into your table.

    To put data into your table, use the put command.

    hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 0.0850 seconds
    hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
    0 row(s) in 0.0110 seconds
    hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
    0 row(s) in 0.0100 seconds

    Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.

  6. Scan the table for all data at once.

    One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.

    hbase(main):006:0> scan 'test'
    ROW COLUMN+CELL
    row1 column=cf:a, timestamp=1421762485768, value=value1
    row2 column=cf:b, timestamp=1421762491785, value=value2
    row3 column=cf:c, timestamp=1421762496210, value=value3
    3 row(s) in 0.0230 seconds
  7. Get a single row of data.

    To get a single row of data at a time, use the get command.

    hbase(main):007:0> get 'test', 'row1'
    COLUMN CELL
    cf:a timestamp=1421762485768, value=value1
    1 row(s) in 0.0350 seconds
  8. Disable a table.

    If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.

    hbase(main):008:0> disable 'test'
    0 row(s) in 1.1820 seconds
    hbase(main):009:0> enable 'test'
    0 row(s) in 0.1770 seconds

    Disable the table again if you tested the enable command above:

    hbase(main):010:0> disable 'test'
    0 row(s) in 1.1820 seconds
  9. Drop the table.

    To drop (delete) a table, use the drop command.

    hbase(main):011:0> drop 'test'
    0 row(s) in 0.1370 seconds
  10. Exit the HBase Shell.

    To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
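
The interactive steps above can also be collected in a command file and passed to hbase shell as an argument, which is convenient for repeating the exercise. The sketch below only writes such a file; the function name write_demo_commands and the file location are assumptions made for illustration.

```shell
# Write the shell commands from steps 3 to 9 into a command file.
# Usage: write_demo_commands <file>   (hypothetical helper)
write_demo_commands() {
  cat > "$1" <<'EOF'
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
exit
EOF
}
```

With HBase running, something like write_demo_commands /tmp/demo_commands.txt followed by ./bin/hbase shell /tmp/demo_commands.txt should replay the session.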

Procedure: Stop HBase
  1. In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.

    $ ./bin/stop-hbase.sh
    stopping hbase....................
    $
  2. After issuing the command, it can take several minutes for the processes to shut down. Use the jps command to be sure that the HMaster and HRegionServer processes are shut down.

The above has shown you how to start and stop a standalone instance of HBase. In the next sections we give a quick overview of other modes of hbase deploy.
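
Because shutdown can take a while, it may help to poll jps instead of checking by hand. The following is a minimal sketch, assuming jps is on the PATH; both function names are hypothetical and the 5-second interval is arbitrary.

```shell
# Return success if the given jps output still lists an HBase daemon.
# Usage: hbase_still_running "<jps output>"   (hypothetical helper)
hbase_still_running() {
  printf '%s\n' "$1" | grep -Eq '[[:space:]](HMaster|HRegionServer)$'
}

# Poll jps until neither HMaster nor HRegionServer is listed.
wait_for_hbase_stop() {
  while hbase_still_running "$(jps)"; do
    sleep 5
  done
}
```

Calling wait_for_hbase_stop right after ./bin/stop-hbase.sh blocks until both processes are gone.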

2.3. Pseudo-Distributed Local Install

After working your way through quickstart standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process: in standalone mode all daemons ran in one jvm process/instance. By default, unless you configure the hbase.rootdir property as described in quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.

 
Hadoop Configuration

This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.

  1. Stop HBase if it is running.

    If you have just finished quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.

  2. Configure HBase.

    Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.

    <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    </property>

    Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs:// URI syntax. In this example, HDFS is running on the localhost at port 8020.

    <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
    </property>

    You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.

  3. Start HBase.

    Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.

  4. Check the HBase directory in HDFS.

    If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop’s bin/ directory to list this directory.

    $ ./bin/hadoop fs -ls /hbase
    Found 7 items
    drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
    drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
    drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
    drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
    -rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
    -rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
    drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
  5. Create a table and populate it with data.

    You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in shell exercises.

  6. Start and stop a backup HBase Master (HMaster) server.

      Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.

    The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh script. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.

    $ ./bin/local-master-backup.sh start 2 3 5

    To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only contents of the file is the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:

    $ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
  7. Start and stop additional RegionServers

    The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not a HMaster or backup HMaster, on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).

    $ ./bin/local-regionservers.sh start 2 3 4 5

    To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.

    $ ./bin/local-regionservers.sh stop 3
  8. Stop HBase.

    You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.
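
The port arithmetic from steps 6 and 7 can be checked with a few lines of shell. This sketch only adds the offset to the base ports quoted above; the function names are hypothetical.

```shell
# Ports used by a backup HMaster started with the given offset
# (defaults 16010, 16020 and 16030 plus the offset).
backup_master_ports() {
  echo "$((16010 + $1)) $((16020 + $1)) $((16030 + $1))"
}

# Ports used by an additional RegionServer started with the given offset
# (base ports 16200 and 16300 plus the offset).
extra_regionserver_ports() {
  echo "$((16200 + $1)) $((16300 + $1))"
}

for offset in 2 3 5; do
  echo "backup master, offset $offset: $(backup_master_ports "$offset")"
done
```

For offset 2 this gives 16012 16022 16032 for a backup master and 16202 16302 for an additional RegionServer, matching the examples in the text.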

2.4. Advanced - Fully Distributed

In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:

Table 1. Distributed Cluster Demo Architecture

Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes

This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.

  Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
Procedure: Configure Passwordless SSH Access

node-a needs to be able to log into node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from node-a to each of the others.

  1. On node-a, generate a key pair.

    While logged in as the user who will run HBase, generate a SSH key pair, using the following command:

    $ ssh-keygen -t rsa

    If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.

  2. Create the directory that will hold the shared keys on the other nodes.

    On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user’s home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.

  3. Copy the public key to the other nodes.

    Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.

    $ cat id_rsa.pub >> ~/.ssh/authorized_keys
  4. Test password-less login.

    If you performed the procedure correctly, you should not be prompted for a password when you SSH from node-a to either of the other nodes using the same username.

  5. Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files, but concatenate the new key onto the existing file using the >> operator rather than the > operator.
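
Since the procedure is repeated for node-b and the key is also added on node-a itself, it is easy to append the same key twice. A possible guard is sketched below; the function name add_authorized_key is hypothetical, and the 700/600 permissions reflect what sshd commonly expects rather than anything stated in this guide.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# Usage: add_authorized_key <public-key-file> <ssh-dir>   (hypothetical helper)
add_authorized_key() {
  mkdir -p "$2" && chmod 700 "$2"
  touch "$2/authorized_keys" && chmod 600 "$2/authorized_keys"
  key=$(cat "$1")
  # -x matches the whole line, -F treats the key as a fixed string
  grep -qxF "$key" "$2/authorized_keys" || printf '%s\n' "$key" >> "$2/authorized_keys"
}
```

On each node this would be called as add_authorized_key id_rsa.pub ~/.ssh; existing keys in the file are left untouched.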

Procedure: Prepare node-a

node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.

  1. Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c.

    Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.

  2. Configure HBase to use node-b as a backup master.

    Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.

  3. Configure ZooKeeper

    In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.

    On node-a, edit conf/hbase-site.xml and add the following properties.

    <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
    </property>
    <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
    </property>
  4. Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.
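
Steps 1 and 2 amount to writing two small text files with one hostname per line. A minimal sketch, using the demo hostnames from this section and a hypothetical function name write_cluster_files:

```shell
# Write conf/regionservers and conf/backup-masters for the demo cluster.
# Usage: write_cluster_files <conf-dir>   (hypothetical helper)
write_cluster_files() {
  # RegionServers run on node-b and node-c only
  printf '%s\n' node-b.example.com node-c.example.com > "$1/regionservers"
  # node-b also acts as the backup master
  printf '%s\n' node-b.example.com > "$1/backup-masters"
}
```

Running write_cluster_files conf from the HBase directory on node-a overwrites both files, so the localhost line in conf/regionservers disappears as step 1 requires.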

Procedure: Prepare node-b and node-c

node-b will run a backup master server and a ZooKeeper instance.

  1. Download and unpack HBase.

    Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.

  2. Copy the configuration files from node-a to node-b and node-c.

    Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.

Procedure: Start and Test Your Cluster
  1. Be sure HBase is not running on any node.

    If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.

  2. Start the cluster.

    On node-a, issue the start-hbase.sh command. Your output will be similar to that below.

    $ bin/start-hbase.sh
    node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
    node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
    node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
    starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
    node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
    node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
    node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out

    ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.

  3. Verify that the processes are running.

    On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.

    Example 2. node-a jps Output
    $ jps
    20355 Jps
    20071 HQuorumPeer
    20137 HMaster
    Example 3. node-b jps Output
    $ jps
    15930 HRegionServer
    16194 Jps
    15838 HQuorumPeer
    16010 HMaster
    Example 4. node-c jps Output
    $ jps
    13901 Jps
    13639 HQuorumPeer
    13737 HRegionServer
     
    ZooKeeper Process Name

    The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node, and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper.

  4. Browse to the Web UI.

     
    Web UI Port Changes

    In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.

    If everything is set up correctly, you should be able to connect with a web browser to the UI for the Master at http://node-a.example.com:16010/ or for the backup master at http://node-b.example.com:16010/. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.

  5. Test what happens when nodes or services disappear.

    With a three-node cluster like you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.
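
The jps check from step 3 can be scripted so that each node reports which of its expected daemons is absent. This is a sketch only; missing_daemons is a hypothetical helper that inspects jps output passed in as text.

```shell
# Print each expected daemon name that is absent from the given jps output.
# Usage: missing_daemons "<jps output>" <name>...   (hypothetical helper)
missing_daemons() {
  jps_output=$1
  shift
  for name in "$@"; do
    # jps prints "<pid> <name>", so match the name at the end of a line
    printf '%s\n' "$jps_output" | grep -q "[[:space:]]$name\$" || echo "$name"
  done
}
```

On node-a, missing_daemons "$(jps)" HQuorumPeer HMaster should print nothing; on node-b the expected names would be HQuorumPeer, HMaster, and HRegionServer, and on node-c HQuorumPeer and HRegionServer.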