Hadoop Notes: Hive Data Storage (Partitioned Tables)

Hive Data Storage (Partitioned Tables)

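A Partition in Hive plays a role similar to a dense index on the partition column in a database.
In Hive, each Partition of a table corresponds to one directory under the table's directory, and all of that Partition's data is stored in that directory.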

So if we want to query the heights of male students, we only need to scan the gender='M' partition.

○ How to create a table partitioned by gender

create table partition_table
(sid int,sname string)
partitioned by (gender string)
row format delimited fields terminated by ',';

The Partition Information section of the table description shows the table's partitioning details.
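One way to see this section is to describe the table from the Hive CLI. This is a minimal sketch that assumes partition_table was created with the statement above; the exact layout of the output can vary between Hive versions.

```sql
-- List the columns of the table. For a partitioned table, Hive appends
-- a "# Partition Information" section naming the partition column(s),
-- which here should be gender.
desc partition_table;

-- A more detailed view that also includes the table's storage location.
desc formatted partition_table;
```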

Create the subdirectory for the partition gender='M':

insert into table partition_table partition(gender='M') select sid,sname from sample_data where gender='M';

Do the same for the female students:

insert into table partition_table partition(gender='F') select sid,sname from sample_data where gender='F';
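The two static inserts name each partition value explicitly. As an alternative that these notes do not use, Hive can also derive the partition value from the data with a dynamic-partition insert. The sketch below assumes sample_data has the columns sid, sname, and gender, as the queries here imply.

```sql
-- Allow dynamic partitioning; nonstrict mode permits an insert
-- in which every partition column is dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the select list;
-- Hive writes each row into the directory matching its gender value.
insert into table partition_table partition(gender)
select sid, sname, gender from sample_data;
```

With only two values, the static form is just as easy; the dynamic form mainly helps when the partition column has many distinct values.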

In the HDFS web UI, /user/hive/warehouse now contains a partition_table directory, and under it there are two partition directories, gender=F and gender=M.
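The same layout can be checked without the web UI. This is a small sketch run from the Hive CLI, assuming the default warehouse path mentioned above.

```sql
-- List the partitions Hive has registered for the table;
-- after the two inserts this should show gender=F and gender=M.
show partitions partition_table;

-- List the table's directory on HDFS from inside the Hive CLI;
-- each partition appears as its own subdirectory.
dfs -ls /user/hive/warehouse/partition_table;
```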

(We can run explain select * from sample_data where gender='M'; and explain select * from partition_table where gender='M'; to compare the two execution plans.)

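The query plan for the partitioned table is clearly shorter than the one for sample_data, and the query also runs faster, because only the gender=M directory has to be scanned.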
