Hive: the collect_list / collect_set functions

Time: 2024-10-19 11:54:23

Hive provides two collect aggregate functions: collect_list and collect_set.

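Both gather the values of one column within each group into an array and return it. The difference is that collect_list keeps duplicate values, while collect_set removes them.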
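To make the difference concrete, here is a minimal Python sketch that emulates the semantics of `GROUP BY key` combined with each function. It is an illustration only, not Hive's implementation: the function names mirror the Hive functions but are hypothetical Python helpers, and `DUAL` assumes the test table below with col2 stored as strings. The sketch keeps first-seen order for readability, whereas Hive itself does not promise any particular element order in the returned arrays.

```python
def collect_list(rows):
    """Emulate: SELECT key, collect_list(value) FROM t GROUP BY key (duplicates kept)."""
    groups = {}
    for key, value in rows:
        # append every value, including repeats, to its group's list
        groups.setdefault(key, []).append(value)
    return groups


def collect_set(rows):
    """Emulate: SELECT key, collect_set(value) FROM t GROUP BY key (duplicates removed)."""
    # dict.fromkeys drops repeated values while keeping first-seen order
    return {key: list(dict.fromkeys(values)) for key, values in collect_list(rows).items()}


# Hypothetical copy of the dual test table: (col1, col2) pairs
DUAL = [("A", "1"), ("B", "2"), ("A", "3"), ("B", "4"), ("C", "5"), ("A", "3")]

if __name__ == "__main__":
    print(collect_list(DUAL))  # {'A': ['1', '3', '3'], 'B': ['2', '4'], 'C': ['5']}
    print(collect_set(DUAL))   # {'A': ['1', '3'], 'B': ['2', '4'], 'C': ['5']}
```

Running it reproduces the grouping shown in the example results: group A keeps both 3s under collect_list and only one under collect_set.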

Example:

A test table in Hive, dual:

col1	col2
A	1
B	2
A	3
B	4
C	5
A	3

hive> select col1, collect_list(col2) from dual group by col1;

Result:
A	["1","3","3"]
B	["2","4"]
C	["5"]

hive> select col1, collect_set(col2) from dual group by col1;

Result:
A	["1","3"]
B	["2","4"]
C	["5"]

Conclusion: collect_list keeps duplicates, while collect_set removes them.