hive行转列

时间:2023-03-08 23:44:54
hive行转列

一、问题

hive如何将

a       1,2,3
b 4,7
c 5

转化成为:

a       1
a 2
a 3
b 4
b 7
c 5

二、原始数据

cat row_column.txt
a 1,2,3
b 4,7
c 5

三、解决方案

3.1 遍历每一列

3.1.1 创建表

-- 创建表
create table tmp.row_column
(
col1 string,
col3 string
)
row format delimited fields terminated by '\t'
stored as textfile;
-- 载入数据
load data local inpath '/tmp/row_column.txt' into table row_column;

3.1.2 查看数据:

hive> select * from row_column;
OK
a 1,2,3
b 4,7
c 5

3.1.3 遍历每一列

select col1,name
from tmp.row_column
lateral view explode(split(col3,',')) col3 as name;
---------------------------------------------------------------
Total MapReduce CPU Time Spent: 2 seconds 20 msec
OK
a 1
a 2
a 3
b 4
b 7
c 5

3.2 数组遍历

3.2.1 创建表

create table tmp.row_column_array
(
col1 string,
col3 array<int>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
stored as textfile;

3.2.2 加载数据

load data local inpath '/tmp/row_column.txt' into table tmp.row_column_array;

3.2.3 查看数据

hive> select * from tmp.row_column_array;
OK
a [1,2,3]
b [4,7]
c []

3.2.4 查看每一列

select col1,name
from tmp.row_column_array
lateral view explode(col3) col3 as name;

3.2.5 结果

a       1
a 2
a 3
b 4
b 7
c 5

四、补充

查看使用逗号分割的列

select t.list[],t.list[],t.list[] from (
select (split(col3,',')) list from tmp.row_column)t;
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
1 2 3
4 7 NULL
5 NULL NULL
Time taken: 15.264 seconds, Fetched: 3 row(s)

查看长度

select col1, size(split(col3,',')) list from tmp.row_column;
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
a 3
b 2
c 1