I want to write a Hive query for the following:
insert into tempTableName
select distinct col_a
, first_value(col_b)
over (partition by col_a
order by nvl(col_c,0) desc, length(col_b) asc, col_b asc)
from tableA
As Hive does not support first_value, I want to know what the equivalent of the first_value function would be in a simple query. Any suggestions?
2 solutions
#1
2
I am not exactly familiar with the Oracle semantics here, but isn't this just a group by and arg-min? Structs in Hive compare in the order of their fields, so you can do something like this:
select col_a,
min(
named_struct(
'col_c', -coalesce(col_c, 0),
'len' , length(col_b),
'col_b', col_b
)
).col_b
from tableA
group by col_a
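To tie this back to the original INSERT, the aggregate could be wrapped roughly as below. This is only a sketch: it assumes tempTableName already exists with two columns compatible with col_a and col_b, and the alias col_b on the result is my own addition.

```sql
-- Sketch only: assumes tempTableName exists with columns matching (col_a, col_b).
insert into table tempTableName
select col_a,
       min(
         named_struct(
           'col_c', -coalesce(col_c, 0),  -- negated so the largest col_c sorts first
           'len',   length(col_b),        -- then the shortest col_b
           'col_b', col_b                 -- then the alphabetically smallest col_b
         )
       ).col_b as col_b                   -- alias added for illustration
from tableA
group by col_a;
```

The struct fields are listed in the same priority as the ORDER BY columns in the question, and group by col_a takes the place of both partition by and distinct.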
#2
1
Hive 0.11 does support FIRST_VALUE.
But as per the Hive JIRA, there is an open issue that you cannot have more than one ORDER BY column in first_value. You haven't reported what error you are getting, but if it is FAILED: SemanticException Range based Window Frame can have only 1 Sort Key, then you have to modify the ORDER BY columns.
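If that is the error, one thing to try is an explicit ROWS frame, since the message refers specifically to range-based frames. This is an untested sketch and whether it works depends on the Hive version; the subquery alias t and the outer group by (used in place of select distinct) are my own choices, not something from the question.

```sql
-- Assumption: an explicit ROWS frame sidesteps the range-based frame restriction.
select col_a, col_b
from (
  select col_a,
         first_value(col_b) over (
           partition by col_a
           order by nvl(col_c, 0) desc, length(col_b) asc, col_b asc
           rows between unbounded preceding and current row
         ) as col_b
  from tableA
) t                      -- hypothetical alias
group by col_a, col_b;   -- collapses the per-row results to one row per col_a
```

Because the frame always starts at the first row of the partition, first_value returns the same col_b for every row that shares a col_a, so the outer group by leaves one row per key.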
Edit: If you are not on Hive 0.11, then I would suggest installing a UDF for FIRST_VALUE. I guess that would be the most straightforward way to do this. You might want to take a look at these UDFs.