I only learned a few days ago that Hive can be integrated with HBase. After reading up on it, I tried it on my single-node setup, but the Apache releases of HBase and Hive installed there turned out to be incompatible for this integration and kept failing with:
org.apache.hadoop.hbase.HTableDescriptor.addFamily
The addFamily method could not be found, so I switched to the CDH versions instead (CDH ships matched package versions, so compatibility is good).
HBase already installed: http://blog.csdn.net/qq_20641565/article/details/54410271
Hive already installed: http://blog.csdn.net/qq_20641565/article/details/55211393
HBase runs on cdhnode3 (master), cdhnode4 and cdhnode5
The Hive client runs on cdhnode5
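Before the two can talk to each other, the Hive client needs the HBase storage-handler jar on its classpath and the address of HBase's ZooKeeper quorum. A minimal sketch of how that can be passed to the Hive CLI (the jar path below is an assumption for this cluster, not taken from the linked install posts):

```shell
# Illustrative only: point Hive at the HBase storage handler jar and at the
# ZooKeeper quorum used by HBase. Adjust the path to your CDH install.
hive --auxpath /home/hadoop/app/hive-1.1.0-cdh5.4.5/lib/hive-hbase-handler.jar \
     -hiveconf hbase.zookeeper.quorum=cdhnode3,cdhnode4,cdhnode5
```

With matched CDH packages the handler jar normally ships with Hive already, so often only hbase.zookeeper.quorum needs to be set, either on the command line as above or in hive-site.xml.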
- 1.Open the HBase and Hive shell clients
[hadoop@cdhnode3 ~]$ ./app/hbase-1.0.0-cdh5.4.5/bin/hbase shell
[hadoop@cdhnode5 ~]$ hive
- 2.Create the mapping table in Hive
CREATE TABLE hbase01(key string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name")
TBLPROPERTIES ("hbase.table.name" = "lijie01");
This creates a Hive table hbase01 that is mapped onto an HBase table.
"hbase.columns.mapping" = ":key,cf1:name": the column mapping. :key binds the first Hive column (key) to the HBase row key, and cf1:name binds name to the name qualifier in column family cf1.
"hbase.table.name" = "lijie01": the name of the backing HBase table (created automatically, since this is a managed table).
- 3.Check the tables in HBase
A new table lijie01 has appeared:
hbase(main):006:0> list
TABLE
lijie
lijie01
2 row(s) in 0.0160 seconds
=> ["lijie", "lijie01"]
- 4.Create a temporary Hive table and load data into it
# Create the temporary table
hive> create table lijietemp(
> key string,
> name string
> )
> row format delimited fields terminated by ','
> stored as textfile;
OK
Time taken: 0.292 seconds
# Load data into the temporary table
hive> load data local inpath '/home/hadoop/test.txt' overwrite into table lijietemp;
Loading data to table default.lijietemp
Table default.lijietemp stats: [numFiles=1, numRows=0, totalSize=35, rawDataSize=0]
OK
Time taken: 0.38 seconds
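The post never shows /home/hadoop/test.txt, but its contents can be reconstructed from the query results in step 6 and the reported totalSize=35. A sketch of the assumed file, comma-delimited to match the table's field terminator:

```shell
# Reconstructed test data (an assumption based on the SELECT output in step 6
# and the reported totalSize=35): three comma-delimited rows.
cat > test.txt <<'EOF'
1001,lijie
1002,zhangsan
1003,lisi
EOF
wc -c test.txt   # 35 bytes, matching the load stats
```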
- 5.Insert data into the Hive-side mapping table hbase01
hive> insert into hbase01 select * from lijietemp;
- 6.Check the data in Hive and in HBase (the two should match)
Query from Hive:
hive> select * from hbase01;
OK
1001 lijie
1002 zhangsan
1003 lisi
Time taken: 0.175 seconds, Fetched: 3 row(s)
Query from HBase:
hbase(main):007:0> scan 'lijie01'
ROW COLUMN+CELL
1001 column=cf1:name, timestamp=1487149773218, value=lijie
1002 column=cf1:name, timestamp=1487149773218, value=zhangsan
1003 column=cf1:name, timestamp=1487149773218, value=lisi
3 row(s) in 0.4880 seconds
- 7.Put a row through HBase and check that Hive stays consistent
Insert a row via the HBase shell:
hbase(main):008:0> put 'lijie01','1004','cf1:name','hbaseputdata'
0 row(s) in 0.2580 seconds
Check the Hive table (the row added above shows up):
hive> select * from hbase01;
OK
1001 lijie
1002 zhangsan
1003 lisi
1004 hbaseputdata
Time taken: 0.124 seconds, Fetched: 4 row(s)
- 8.Map an existing HBase table from Hive: first create a table in HBase and insert data
Create the table:
hbase(main):010:0> create 'lijie02','cf1','cf2'
0 row(s) in 0.4530 seconds
=> Hbase::Table - lijie02
Insert one row of test data:
hbase(main):011:0> put 'lijie02','0000001','cf1:name','lijie'
0 row(s) in 0.0820 seconds
hbase(main):012:0> put 'lijie02','0000001','cf2:age','24'
0 row(s) in 0.0640 seconds
- 9.Create an external mapping table in Hive (creating a managed table would fail because the table already exists in HBase)
hive> CREATE EXTERNAL TABLE hbase02(key string, name string, age int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf2:age")
> TBLPROPERTIES ("hbase.table.name" = "lijie02");
OK
Time taken: 0.204 seconds
- 10.Verify that the mapping works
hive> select * from hbase02;
OK
0000001 lijie 24
Time taken: 0.144 seconds, Fetched: 1 row(s)
The HBase data comes back, so the integration works.
- 11.Test inserting data from the Hive side
Create another temporary Hive table:
hive>
> create table lijietemp1(
> key string,
> name string,
> age int
> )
> row format delimited fields terminated by ','
> stored as textfile;
Load the data:
hive> load data local inpath '/home/hadoop/test.txt' overwrite into table lijietemp1;
Loading data to table default.lijietemp1
Table default.lijietemp1 stats: [numFiles=1, numRows=0, totalSize=44, rawDataSize=0]
OK
Time taken: 0.271 seconds
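As in step 4, the reloaded /home/hadoop/test.txt is not shown; judging from the query results in step 12 and the reported totalSize=44, it presumably now carries a third, age, column:

```shell
# Assumed reloaded test data: the same keys plus an age column (44 bytes,
# matching the reported load stats).
cat > test.txt <<'EOF'
1001,lijie,24
1002,zhangsan,25
1003,lisi,26
EOF
wc -c test.txt   # 44 bytes
```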
Insert from the temporary table into the mapping table hbase02:
hive> insert into hbase02 select * from lijietemp1;
- 12.Check that the Hive table and the HBase table stay in sync
Query from Hive:
hive> select * from hbase02;
OK
0000001 lijie 24
1001 lijie 24
1002 zhangsan 25
1003 lisi 26
Time taken: 0.097 seconds, Fetched: 4 row(s)
Query from HBase:
hbase(main):014:0> scan 'lijie02'
ROW COLUMN+CELL
0000001 column=cf1:name, timestamp=1487151262045, value=lijie
0000001 column=cf2:age, timestamp=1487151262045, value=24
1001 column=cf1:name, timestamp=1487151621148, value=lijie
1001 column=cf2:age, timestamp=1487151621148, value=24
1002 column=cf1:name, timestamp=1487151621148, value=zhangsan
1002 column=cf2:age, timestamp=1487151621148, value=25
1003 column=cf1:name, timestamp=1487151621148, value=lisi
1003 column=cf2:age, timestamp=1487151621148, value=26
4 row(s) in 0.0350 seconds
The HBase table and the Hive table hold the same data.
Note:
If you drop a managed (internal) Hive mapping table, the backing HBase table is deleted along with it.
If you drop an external Hive mapping table, only the Hive metadata goes away; the HBase table and its data are untouched.
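That difference can be checked directly on this cluster. A minimal sketch, destructive and not run in the original post, assuming both CLIs are on the PATH:

```shell
# Dropping the managed mapping table removes the backing HBase table too.
hive -e "DROP TABLE hbase01;"   # HBase table 'lijie01' disappears as well
# Dropping the external mapping only removes Hive metadata.
hive -e "DROP TABLE hbase02;"   # HBase table 'lijie02' keeps its data
echo "list" | hbase shell       # expect 'lijie02' still listed, 'lijie01' gone
```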