Equivalent of first_value (Oracle) in Hive

Time: 2021-11-01 09:11:58

I want to write a Hive query equivalent to the following:

insert into  tempTableName  
select distinct col_a
        ,  first_value(col_b)  
            over (partition by col_a 
            order by nvl(col_c,0) desc, length(col_b) asc, col_b asc) 
from tableA

Since Hive does not seem to support first_value, I would like to know what the equivalent of this function is in a simple query. Any suggestions?

2 solutions

#1


2  

I am not exactly familiar with the Oracle semantics here, but isn't this just a GROUP BY with an arg-min? Structs in Hive compare field by field, in the order the fields are declared, so you can pack the sort keys into a struct ahead of the value you want and take the minimum, like this:

select col_a,
min(
  named_struct(
    'col_c', -coalesce(col_c, 0),
    'len' , length(col_b),
    'col_b', col_b
  )
).col_b
from tableA
group by col_a
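
To connect this back to the original statement, here is a rough sketch of the same struct trick wired into the INSERT from the question. The alias `first_col_b` is made up for illustration, and negating `col_c` to get descending order assumes it is numeric. I have not checked how rows with a NULL `col_b` sort inside the struct, so treat that case as untested.

```sql
-- Sketch only: GROUP BY + min(struct) standing in for first_value.
-- Each struct field mirrors one ORDER BY key from the original query.
insert into table tempTableName
select col_a,
       min(
         named_struct(
           'col_c', -coalesce(col_c, 0),  -- nvl(col_c, 0) desc (assumes numeric col_c)
           'len',   length(col_b),        -- length(col_b) asc
           'col_b', col_b                 -- col_b asc, and the value we want back
         )
       ).col_b as first_col_b             -- hypothetical alias
from tableA
group by col_a;
```

Because the GROUP BY already yields one row per `col_a`, the `distinct` from the original query is no longer needed.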

#2


1  

Hive 0.11 does support FIRST_VALUE.

However, according to the Hive JIRA, there is an open issue: you cannot use more than one ORDER BY column in first_value. You haven't said which error you are getting, but if it is "FAILED: SemanticException Range based Window Frame can have only 1 Sort Key", then you have to modify the ORDER BY columns.
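
One possible way around that error, as a sketch rather than something tested on Hive 0.11: the message refers to a range-based frame, and as far as I know Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is given without a frame. Spelling out a ROWS frame may therefore let you keep all three sort keys. The subquery alias `t` and column alias `first_col_b` are made up for illustration, and the DISTINCT is moved to an outer query only to be conservative.

```sql
-- Sketch only: explicit ROWS frame instead of the default RANGE frame.
-- Assumption: the single-sort-key restriction applies to RANGE frames only.
insert into table tempTableName
select distinct col_a, first_col_b
from (
  select col_a,
         first_value(col_b) over (
           partition by col_a
           order by nvl(col_c, 0) desc, length(col_b) asc, col_b asc
           rows between unbounded preceding and current row
         ) as first_col_b               -- hypothetical alias
  from tableA
) t;                                    -- hypothetical subquery alias
```

With this frame, first_value still returns the first row of each partition in the given order, so the result should match the original intent.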

Edit: If you are not on Hive 0.11, then I would suggest installing a UDF for FIRST_VALUE. I think that would be the most straightforward way to do this. You might want to take a look at these UDFs.
