Hive函数:SUM,AVG,MIN,MAX

时间:2022-02-13 17:16:59

转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX

之前看到大数据田地有关于max()over(partition by)的用法,今天恰好工作中用到了它,但是使用中遇到了一个问题:在max(rsrp)over(partition by buildingid,height) as max_rsrp返回的结果不是分组中的最大值。最中找到了问题的原因:max_rsrp数据类型为string而不是double类型,导致的一个bug问题。

再处理的过程中也再次把大数据田地的中关于sum,avg,max,min的函数用法做了demo,因此有了该参考后的文章。

数据准备:

echo ''>data_file.txt
vim data_file.txt
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
drop table if exists test_data;
create EXTERNAL TABLE test_data (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';
select * from test_data;
+---------------------+-----------------------+---------------+--+
| test_data.cookieid | test_data.createtime | test_data.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2015-04-10 | 1 |
| cookie1 | 2015-04-11 | 5 |
| cookie1 | 2015-04-12 | 7 |
| cookie1 | 2015-04-13 | 3 |
| cookie1 | 2015-04-14 | 2 |
| cookie1 | 2015-04-15 | 4 |
| cookie1 | 2015-04-16 | 4 |
| cookie2 | 2015-04-10 | 6 |
| cookie2 | 2015-04-11 | 5 |
| cookie2 | 2015-04-12 | 7 |
| cookie2 | 2015-04-13 | 4 |
| cookie2 | 2015-04-14 | 3 |
| cookie2 | 2015-04-15 | 5 |
| cookie2 | 2015-04-16 | 5 |
+---------------------+-----------------------+---------------+--+

SUM — 注意,结果和ORDER BY相关,默认为升序

SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
| cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
| cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
| cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
| cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
| cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
| cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
| cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
| cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
| cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
| cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
| cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
| cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

–其他AVG,MIN,MAX,和SUM用法一样。

--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
| cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
| cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
| cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
| cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
| cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
| cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
| cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
| cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
| cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
| cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
| cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
| cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
| cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,--从起点到当前行,结果同pv1
MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
| cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
| cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
| cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
| cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
| cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
| cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
| cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
| cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
| cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+ SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
+-----------+-------------+-----+---------+---------+--+
| cookieid | createtime | pv | min_pv | max_pv |
+-----------+-------------+-----+---------+---------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 7 |
| cookie1 | 2015-04-16 | 4 | 1 | 7 |
| cookie1 | 2015-04-15 | 4 | 1 | 7 |
| cookie1 | 2015-04-14 | 2 | 1 | 7 |
| cookie1 | 2015-04-13 | 3 | 1 | 7 |
| cookie1 | 2015-04-12 | 7 | 1 | 7 |
| cookie1 | 2015-04-11 | 5 | 1 | 7 |
| cookie2 | 2015-04-16 | 5 | 3 | 7 |
| cookie2 | 2015-04-15 | 5 | 3 | 7 |
| cookie2 | 2015-04-14 | 3 | 3 | 7 |
| cookie2 | 2015-04-13 | 4 | 3 | 7 |
| cookie2 | 2015-04-12 | 7 | 3 | 7 |
| cookie2 | 2015-04-11 | 5 | 3 | 7 |
| cookie2 | 2015-04-10 | 6 | 3 | 7 |
+-----------+-------------+-----+---------+---------+--+

Hive函数:SUM,AVG,MIN,MAX的更多相关文章

  1. Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive中提供了越来越多的分析函数,用于完成负责的统计分析.抽时间将所有的分析窗 ...

  2. Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    数据准备 数据格式 cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, 创建数据库及表 create datab ...

  3. C# 中奇妙的函数–6. 五个序列聚合运算(Sum, Average, Min, Max,Aggregate)

    今天,我们将着眼于五个用于序列的聚合运算.很多时候当我们在对序列进行操作时,我们想要做基于这些序列执行某种汇总然后,计算结果. Enumerable 静态类的LINQ扩展方法可以做到这一点 .就像之前 ...

  4. SQL模糊查询,sum,AVG,MAX,min函数

    cmd mysql -hlocalhost -uroot -p select * from emp where ename like '___' -- 三个横线, - 代表字符,可以查询 三个enam ...

  5. 三、函数 (SUM、MIN、MAX、COUNT、AVG)

    第八章 使用数据处理函数 8.1 函数 SQL支持利用函数来处理数据.函数一般是在数据上执行的,给数据的转换和处理提供了方便. 每一个DBMS都有特定的函数.只有少数几个函数被所有主要的DBMS等同的 ...

  6. LINQ to SQL Count/Sum/Min/Max/Avg Join

    public class Linq { MXSICEDataContext Db = new MXSICEDataContext(); // LINQ to SQL // Count/Sum/Min/ ...

  7. LINQ to SQL 语句(3) 之 Count/Sum/Min/Max/Avg

    LINQ  to SQL 语句(3) 之  Count/Sum/Min/Max/Avg [1] Count/Sum 讲解 [2] Min 讲解 [3] Max 讲解 [4] Average 和 Agg ...

  8. [转]LINQ语句之Select/Distinct和Count/Sum/Min/Max/Avg

    在讲述了LINQ,顺便说了一下Where操作,这篇开始我们继续说LINQ语句,目的让大家从语句的角度了解LINQ,LINQ包括LINQ to Objects.LINQ to DataSets.LINQ ...

  9. Linq to SQL -- Select、Distinct和Count、Sum、Min、Max、Avg

    Select/Distinct操作符 适用场景:o(∩_∩)o… 查询呗. 说明:和SQL命令中的select作用相似但位置不同,查询表达式中的select及所接子句是放在表达式最后并把子句中的变量也 ...

随机推荐

  1. 多线程Server client

    项目结构 项目设计 客户端同时大量请求服务端,服务端多线程处理连接,并发序列化获得客户端发送的数据,并做出处理. IClients package simple.socket; import java ...

  2. [MySQL FAQ]系列 — 为什么InnoDB表要建议用自增列做主键

    我们先了解下InnoDB引擎表的一些关键特征: InnoDB引擎表是基于B+树的索引组织表(IOT): 每个表都需要有一个聚集索引(clustered index): 所有的行记录都存储在B+树的叶子 ...

  3. Beautyacticle

    Original: https://github.com/rizhilee/Beautyacticle Backup: https://github.com/eltld/Beautyacticle

  4. JavaScript之JS实现动画效果

    在前面的随笔中介绍了如何用DOM技术修改文档的央样式信息,用JavaScript添加样式信息可以节约我们的时间和精力,但总的来说,CSS仍是完成这类任务的最佳工具.但是有一个应用领域是目前的CSS无能 ...

  5. MyBatis动态代理

    一.项目结构 二.代码实现 import java.util.List; import java.util.Map; import com.jmu.bean.Student; public inter ...

  6. Redis开机自启动

    1.新建redis文件 vi /etc/init.d/redis 2.修改/tec/init.d/redis文件,要修改Redis安装目录及配置文件路径 EXEC=/usr/local/redis/b ...

  7. [Android]Android布局优化之 ViewStub

    转载请标注:转载于http://www.cnblogs.com/Liuyt-61/p/6602926.html -------------------------------------------- ...

  8. angular记录

    1. <h1>{{title}}</h1> 双花括号语法是 Angular 的插值绑定语法. 这个插值绑定的意思是把组件的 title 属性的值绑定到 HTML 中的 h1 标 ...

  9. java中的 public protected friendly private

    1.public:public表明该数据成员.成员函数是对所有用户开放的,所有用户都可以直接进行调用 2.private:private表示私有,私有的意思就是除了class自己之外,任何人都不可以直 ...

  10. 20145233《网络对抗》Exp6 信息收集和漏洞扫描

    20145233<网络对抗>Exp6 信息收集和漏洞扫描 实验问题思考 哪些组织负责DNS,IP的管理 全球根服务器均由美国*授权的ICANN统一管理,负责DNS和IP地址管理.全球一共 ...