hive 创建表的三种方式

时间:2025-01-13 17:10:55

hive 创建表的三种方式

官网地址:/confluence/display/Hive/LanguageManual+DDL

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY '' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on  configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname


准备数据为:
175.42.93.145	[25/Sep/2013:00:10:08 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
61.135.216.104	[25/Sep/2013:00:10:10 +0800]	"GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1"	304	160
175.42.93.145	[25/Sep/2013:00:10:11 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
175.42.93.145	[25/Sep/2013:00:10:12 +0800]	"GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1"	200	20875


1、第一种方式

CREATE TABLE IF NOT EXISTS (
ip string ,
time string ,
req_url string ,
status string ,
size string 
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;


hive> CREATE TABLE IF NOT EXISTS (
    > ip string ,
    > time string ,
    > req_url string ,
    > status string ,
    > size string 
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 33.812 seconds


加载数据:

hive> load data local inpath '/opt/hive-0.13.1/' into table  ;
Copying data from file:/opt/hive-0.13.1/
Copying file: file:/opt/hive-0.13.1/
Loading data to table 
Table  stats: [numFiles=1, numRows=0, totalSize=425, rawDataSize=0]
OK
Time taken: 8.374 seconds


查看数据:

hive> select * from ;
OK
175.42.93.145	[25/Sep/2013:00:10:08 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
61.135.216.104	[25/Sep/2013:00:10:10 +0800]	"GET /search-engine/thrift-framework-intro/feed/ HTTP/1.1"	304  160
175.42.93.145	[25/Sep/2013:00:10:11 +0800]	"GET /mapreduce/hadoop-rumen-introduction HTTP/1.1"	301	427
175.42.93.145	[25/Sep/2013:00:10:12 +0800]	"GET /mapreduce/hadoop-rumen-introduction/ HTTP/1.1"	200	20875
Time taken: 2.6 seconds, Fetched: 4 row(s)


2、第二种方式

CREATE TABLE default.weblog_comm
AS select ip, time, req_url from ; 


hive> CREATE TABLE default.weblog_comm
    > AS select ip, time, req_url from ; 
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0005, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-10-18 08:54:38,271 Stage-1 map = 0%,  reduce = 0%
2015-10-18 08:55:39,255 Stage-1 map = 0%,  reduce = 0%
2015-10-18 08:56:22,875 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.87 sec
MapReduce Total cumulative CPU time: 5 seconds 870 msec
Ended Job = job_1445142802171_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster/tmp/hive-root/hive_2015-10-18_08-52-50_940_3759961352595656019-1/-ext-10001
Moving data to: hdfs://cluster/user/hive/warehouse/weblog_comm
Table default.weblog_comm stats: [numFiles=1, numRows=0, totalSize=389, rawDataSize=0]
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 6.46 sec   HDFS Read: 631 HDFS Write: 473 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 460 msec
OK
Time taken: 232.372 seconds


查看表结构:

hive> desc default.weblog_comm;
OK
ip                  	string              	                    
time                	string              	                    
req_url             	string              	                    
Time taken: 0.56 seconds, Fetched: 3 row(s)


3、第三种方式

CREATE TABLE IF NOT EXISTS default.weblog_20150923
    LIKE  ;


hive> CREATE TABLE IF NOT EXISTS default.weblog_20150923
    > LIKE  ;
OK
Time taken: 0.585 seconds


hive> select * from weblog_20150923;
OK
Time taken: 0.441 seconds


hive> desc weblog_20150923;
OK
ip                  	string              	                    
time                	string              	                    
req_url             	string              	                    
status              	string              	                    
size                	string              	                    
Time taken: 0.563 seconds, Fetched: 5 row(s)