Hive只有有限的索引功能。没有关系型数据库中键的概念。
8.1 创建索引
对如下管理分区表建立索引
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);
CREATE INDEX employees_index
ON TABLE employees (country)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country, name)
COMMENT 'Employees indexed by country and name.';
AS ...
指定了索引处理器(一个实现了索引接口的Java类),Hive本身提供了一些典型的索引实现。
Bitmap索引
bitmap
创建索引,用于排重后值较少的列。如:
CREATE INDEX employees_index
ON TABLE employees (country)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country, name)
COMMENT 'Employees indexed by country and name.';
8.2 重建索引
若指定了WITH DEFERRED REBUILD
,新索引将是空的。随时可以进行第一次索引创建或者使用ALTER INDEX
重建索引。
ALTER INDEX employees_index
ON TABLE employees
PARTITION (country = 'US')
REBUILD;
若省略掉PARTITION
将会对所有分区进行索引重建。
8.3 显示索引
SHOW FORMATTED INDEX ON employees;
FORMATTED
是可选的。增加这个关键字可以使输出中包含有列名。可以替换为INDEX
或INDEXES
,可以输出更多的索引信息。
8.4 删除索引
DROP INDEX IF EXISTS employees_index ON TABLE employees;
删除索引将会删除索引表。
Hive不允许使用DROP TABLES
删除索引表。
如果被索引的表被删除了,索引和索引表也会被删除。
如果表的某个分区被删除了,那么这个分区对应的分区索引也会同时被删除掉。