前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Hive窗口函数01-SUM、MIN、MAX、AVG

Hive窗口函数01-SUM、MIN、MAX、AVG

作者头像
CoderJed
发布2018-09-13 10:59:52
2.5K0
发布2018-09-13 10:59:52
举报
文章被收录于专栏:Jed的技术阶梯

Hive窗口函数sum、min、max、avg入门

1. 数据说明

现有 hive 表 cookie1, 内容如下:

代码语言:javascript
复制
hive> select * from cookie1;
cookie1 2015-04-10  1
cookie1 2015-04-11  5
cookie1 2015-04-12  7
cookie1 2015-04-13  3
cookie1 2015-04-14  2
cookie1 2015-04-15  4
cookie1 2015-04-16  4
  • 其中字段意义: cookieid(string), createtime(string), pv(int)
  • 分别代表: cookieid, 创建时间, 该cookieid当天的pv值

2. sum()操作

(1) 对组内的某个字段的全部数据进行累加,例如,求cookie1的总pv值

代码语言:javascript
复制
select cookieid, createtime, pv, 
sum(pv) over(partition by cookieid) as sum1 
from cookie1; 

结果:
cookie1 2015-04-16  4   26
cookie1 2015-04-15  4   26
cookie1 2015-04-14  2   26
cookie1 2015-04-13  3   26
cookie1 2015-04-12  7   26
cookie1 2015-04-11  5   26
cookie1 2015-04-10  1   26

partition by cookieid : 按照cookieid分组

(2) 从第一行开始累加到当前行

代码语言:javascript
复制
select cookieid, createtime, pv, 
sum(pv) over(partition by cookieid order by createtime) as sum2
from cookie1;

# 结果
cookie1 2015-04-10  1   1
cookie1 2015-04-11  5   6
cookie1 2015-04-12  7   13
cookie1 2015-04-13  3   16
cookie1 2015-04-14  2   18
cookie1 2015-04-15  4   22
cookie1 2015-04-16  4   26

以上SQL语句是下面这个SQL语句的简写:
select 
    cookieid,
    createtime,
    pv, 
    sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as sum3
from cookie1;

order by : 在同一个组内,先累加完相同createtime的pv,再累加其他createtime的pv, 比如 : 现在在表末尾加一条数据cookie1 2015-04-10 1 1,那么结果就是 :

代码语言:javascript
复制
cookie1 2015-04-10  1   1
cookie1 2015-04-10  1   2
cookie1 2015-04-11  5   7
cookie1 2015-04-12  7   14
cookie1 2015-04-13  3   17
cookie1 2015-04-14  2   19
cookie1 2015-04-15  4   23
cookie1 2015-04-16  4   27

cookie1 2015-04-10 1 1这两条数据是否相邻无关。

unbounded : 起点 preceding : 往前 following : 往后 between...end... : 指定操作的范围 current row : 当前行,计算到哪一行就是哪一行 between unbounded preceding and current row : 从起点到当前行,往前(对应表中的从上到下的顺序)累加 默认就是从起点到当前行往前累加,所以between unbounded preceding and current row这个条件可以不写。

(3) 计算分组内当前行到剩余所有行的和

代码语言:javascript
复制
select cookieid, createtime, pv, 
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as sum4
from cookie1; 

结果分析 : 
cookie1 2015-04-10  1   26  (1+5+7+3+2+4+4=26)
cookie1 2015-04-11  5   25  (5+7+3+2+4+4=25)
cookie1 2015-04-12  7   20  (7+3+2+4+4=20)
cookie1 2015-04-13  3   13  (3+2+4+4=13)
cookie1 2015-04-14  2   10  (2+4+4=10)
cookie1 2015-04-15  4   8   (4+4=8)
cookie1 2015-04-16  4   4   (4)

(4) 指定计算的范围

代码语言:javascript
复制
select cookieid, createtime, pv, 
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as sum5
from cookie1; 

结果分析 : 
cookie1 2015-04-10  1   1   (这一行前面没有记录,只能取本身的值)
cookie1 2015-04-11  5   6   (这一行前面有1,不够取3行的值,只能取一行与本身相加,取5+1=6)
cookie1 2015-04-12  7   13  (这一行前面有1,5,不够取3行的值,只能取两行与本身相加,取7+5+1=13)
cookie1 2015-04-13  3   16  (这一行前面有3行可以取,依次往回取三行,取3+7+5+1=16)
cookie1 2015-04-14  2   17  (同理,取2+3+7+5=17)
cookie1 2015-04-15  4   16  (同理,取4+2+3+7=16)
cookie1 2015-04-16  4   13  (同理,取4+4+2+3=13)

select cookieid, createtime, pv, 
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as sum6
from cookie1; 

结果分析 : 
cookie1 2015-04-10  1   6   (这一行前面没有记录,后面可以取一条记录,1+5=6)
cookie1 2015-04-11  5   13  (这一行前面只能取一条记录,后面可以取一条记录,1+5+7=13)
cookie1 2015-04-12  7   16  (这一行前面只能取两条记录,后面可以取一条记录,1+5+7+3=16)
cookie1 2015-04-13  3   18  (这一行前面可以取三条记录,后面可以取一条记录,1+5+7+3+2=18)
cookie1 2015-04-14  2   21  (同理,5+7+3+2+4=21) 
cookie1 2015-04-15  4   20  (同理,7+3+2+4+4=20)
cookie1 2015-04-16  4   13  (这一行前面可以取三条记录,后面没有记录,3+2+4+4=13)

3. avg()操作

sum、avg、max、min这些窗口函数的语法都是一样的,以下用一个SQL语句来演示 :

代码语言:javascript
复制
select cookieid,createtime,pv,
avg(pv) over(partition by cookieid) as avg1,
avg(pv) over(partition by cookieid order by createtime) as avg2,
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as avg3,
avg(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as avg4,
avg(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as avg5,                
avg(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as avg6
from cookie1;

结果 : 
cookie1 2015-04-10  1       3.7142857142857144      1.0                 1.0                 3.7142857142857144      1.0                 3.0
cookie1 2015-04-11  5       3.7142857142857144      3.0                 3.0                 4.166666666666667       3.0                 4.333333333333333
cookie1 2015-04-12  7       3.7142857142857144      4.333333333333333   4.333333333333333   4.0                     4.333333333333333   4.0
cookie1 2015-04-13  3       3.7142857142857144      4.0                 4.0                 3.25                    4.0                 3.6
cookie1 2015-04-14  2       3.7142857142857144      3.6                 3.6                 3.3333333333333335      4.25                4.2
cookie1 2015-04-15  4       3.7142857142857144      3.6666666666666665  3.6666666666666665  4.0                     4.0                 4.0
cookie1 2015-04-16  4       3.7142857142857144      3.7142857142857144  3.7142857142857144  4.0                     3.25                3.25

4. min()操作

代码语言:javascript
复制
select cookieid, createtime, pv,
min(pv) over(partition by cookieid) as min1,
min(pv) over(partition by cookieid order by createtime) as min2, 
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as min3,
min(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as min4,
min(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as min5,
min(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as min6
from cookie1;

结果 : 
cookie1 2015-04-10  1   1   1   1   1   1   1
cookie1 2015-04-11  5   1   1   1   2   1   1
cookie1 2015-04-12  7   1   1   1   2   1   1
cookie1 2015-04-13  3   1   1   1   2   1   1
cookie1 2015-04-14  2   1   1   1   2   2   2
cookie1 2015-04-15  4   1   1   1   4   2   2
cookie1 2015-04-16  4   1   1   1   4   2   2

5. max()操作

代码语言:javascript
复制
select cookieid,createtime,pv,
max(pv) over(partition by cookieid) as max1,
max(pv) over(partition by cookieid order by createtime) as max2,
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as max3,
max(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as max4,
max(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as max5,
max(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as max6
from cookie1;

结果 : 
cookie1 2015-04-10  1   7   1   1   7   1   5
cookie1 2015-04-11  5   7   5   5   7   5   7
cookie1 2015-04-12  7   7   7   7   7   7   7
cookie1 2015-04-13  3   7   7   7   4   7   7
cookie1 2015-04-14  2   7   7   7   4   7   7
cookie1 2015-04-15  4   7   7   7   4   7   7
cookie1 2015-04-16  4   7   7   7   4   4   4

参考文章 : Hive分析窗口函数(一) SUM,AVG,MIN,MAX

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2017.12.15 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1. 数据说明
  • 2. sum()操作
    • (1) 对组内的某个字段的全部数据进行累加,例如,求cookie1的总pv值
      • (2) 从第一行开始累加到当前行
        • (3) 计算分组内当前行到剩余所有行的和
          • (4) 指定计算的范围
          • 3. avg()操作
          • 4. min()操作
          • 5. max()操作
          领券
          问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档