前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Hive窗口函数04-LAG、LEAD、FIRST_VALUE、LAST_VALUE

Hive窗口函数04-LAG、LEAD、FIRST_VALUE、LAST_VALUE

作者头像
CoderJed
发布2018-09-13 10:57:20
1.9K0
发布2018-09-13 10:57:20
举报
文章被收录于专栏:Jed的技术阶梯

Hive窗口函数LAG、LEAD、FIRST_VALUE、LAST_VALUE入门

1. 数据说明

现有 hive 表 cookie4, 内容如下:

代码语言:javascript
复制
hive> select * from cookie4;

cookie4.cookieid    cookie4.createtime          cookie4.url
cookie1             2015-04-10 10:00:02         url2
cookie1             2015-04-10 10:00:00         url1
cookie1             2015-04-10 10:03:04         1url3
cookie1             2015-04-10 10:50:05         url6
cookie1             2015-04-10 11:00:00         url7
cookie1             2015-04-10 10:10:00         url4
cookie1             2015-04-10 10:50:01         url5
cookie2             2015-04-10 10:00:02         url22
cookie2             2015-04-10 10:00:00         url11
cookie2             2015-04-10 10:03:04         1url33
cookie2             2015-04-10 10:50:05         url66
cookie2             2015-04-10 11:00:00         url77
cookie2             2015-04-10 10:10:00         url44
cookie2             2015-04-10 10:50:01         url55
  • 其中字段意义: cookieid(string), createtime(string), url(int)
  • 分别代表: cookieid, 创建时间, 访问的url

2. lag()操作

LAG(col,n,DEFAULT)用于统计窗口内往上第n行值 第一个参数为列名 第二个参数为往上第n行(可选,默认为1) 第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)

代码语言:javascript
复制
--(1)
hive> SELECT cookieid, createtime, url,
    > LAG(createtime, 2) OVER (PARTITION BY cookieid ORDER BY createtime) AS last_2_time 
    > FROM cookie4;
    
结果:因为没有设置默认值,当没有上两行时显示为NULL
cookieid    createtime          url     last_2_time
cookie1     2015-04-10 10:00:00 url1    NULL 
cookie1     2015-04-10 10:00:02 url2    NULL
cookie1     2015-04-10 10:03:04 1url3   2015-04-10 10:00:00
cookie1     2015-04-10 10:10:00 url4    2015-04-10 10:00:02
cookie1     2015-04-10 10:50:01 url5    2015-04-10 10:03:04
cookie1     2015-04-10 10:50:05 url6    2015-04-10 10:10:00
cookie1     2015-04-10 11:00:00 url7    2015-04-10 10:50:01
cookie2     2015-04-10 10:00:00 url11   NULL
cookie2     2015-04-10 10:00:02 url22   NULL
cookie2     2015-04-10 10:03:04 1url33  2015-04-10 10:00:00
cookie2     2015-04-10 10:10:00 url44   2015-04-10 10:00:02
cookie2     2015-04-10 10:50:01 url55   2015-04-10 10:03:04
cookie2     2015-04-10 10:50:05 url66   2015-04-10 10:10:00
cookie2     2015-04-10 11:00:00 url77   2015-04-10 10:50:01

--(2)
hive> SELECT cookieid, createtime, url,
    > LAG(createtime,1,'1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS last_1_time 
    > FROM cookie4;
    
结果:
cookieid    createtime          url     last_1_time
cookie1     2015-04-10 10:00:00 url1    1970-01-01 00:00:00 (显示默认值)
cookie1     2015-04-10 10:00:02 url2    2015-04-10 10:00:00
cookie1     2015-04-10 10:03:04 1url3   2015-04-10 10:00:02
cookie1     2015-04-10 10:10:00 url4    2015-04-10 10:03:04
cookie1     2015-04-10 10:50:01 url5    2015-04-10 10:10:00
cookie1     2015-04-10 10:50:05 url6    2015-04-10 10:50:01
cookie1     2015-04-10 11:00:00 url7    2015-04-10 10:50:05
cookie2     2015-04-10 10:00:00 url11   1970-01-01 00:00:00 (显示默认值)
cookie2     2015-04-10 10:00:02 url22   2015-04-10 10:00:00
cookie2     2015-04-10 10:03:04 1url33  2015-04-10 10:00:02
cookie2     2015-04-10 10:10:00 url44   2015-04-10 10:03:04
cookie2     2015-04-10 10:50:01 url55   2015-04-10 10:10:00
cookie2     2015-04-10 10:50:05 url66   2015-04-10 10:50:01
cookie2     2015-04-10 11:00:00 url77   2015-04-10 10:50:05

3. lead()操作

lead的作用与lag相反 LEAD(col,n,DEFAULT)用于统计窗口内往下第n行值 第一个参数为列名 第二个参数为往下第n行(可选,默认为1) 第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)

代码语言:javascript
复制
hive> SELECT cookieid, createtime, url,
    > LEAD(createtime, 2) OVER(PARTITION BY cookieid ORDER BY createtime) AS next_2_time,
    > LEAD(createtime, 1, '1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS next_1_time
    > FROM cookie4;
    
结果:
cookieid    createtime          url     next_2_time         next_1_time
cookie1     2015-04-10 10:00:00 url1    2015-04-10 10:03:04 2015-04-10 10:00:02
cookie1     2015-04-10 10:00:02 url2    2015-04-10 10:10:00 2015-04-10 10:03:04
cookie1     2015-04-10 10:03:04 1url3   2015-04-10 10:50:01 2015-04-10 10:10:00
cookie1     2015-04-10 10:10:00 url4    2015-04-10 10:50:05 2015-04-10 10:50:01
cookie1     2015-04-10 10:50:01 url5    2015-04-10 11:00:00 2015-04-10 10:50:05
cookie1     2015-04-10 10:50:05 url6    NULL                2015-04-10 11:00:00
cookie1     2015-04-10 11:00:00 url7    NULL                1970-01-01 00:00:00
cookie2     2015-04-10 10:00:00 url11   2015-04-10 10:03:04 2015-04-10 10:00:02
cookie2     2015-04-10 10:00:02 url22   2015-04-10 10:10:00 2015-04-10 10:03:04
cookie2     2015-04-10 10:03:04 1url33  2015-04-10 10:50:01 2015-04-10 10:10:00
cookie2     2015-04-10 10:10:00 url44   2015-04-10 10:50:05 2015-04-10 10:50:01
cookie2     2015-04-10 10:50:01 url55   2015-04-10 11:00:00 2015-04-10 10:50:05
cookie2     2015-04-10 10:50:05 url66   NULL                2015-04-10 11:00:00
cookie2     2015-04-10 11:00:00 url77   NULL                1970-01-01 00:00:00

4.FIRST_VALUE()操作

取分组内排序后,截止到当前行,第一个值

代码语言:javascript
复制
hive> SELECT cookieid, createtime, url,
    > FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS first 
    > FROM cookie4;
    
结果:
cookieid    createtime          url     first
cookie1     2015-04-10 10:00:00 url1    url1
cookie1     2015-04-10 10:00:02 url2    url1
cookie1     2015-04-10 10:03:04 1url3   url1
cookie1     2015-04-10 10:10:00 url4    url1
cookie1     2015-04-10 10:50:01 url5    url1
cookie1     2015-04-10 10:50:05 url6    url1
cookie1     2015-04-10 11:00:00 url7    url1
cookie2     2015-04-10 10:00:00 url11   url11
cookie2     2015-04-10 10:00:02 url22   url11
cookie2     2015-04-10 10:03:04 1url33  url11
cookie2     2015-04-10 10:10:00 url44   url11
cookie2     2015-04-10 10:50:01 url55   url11
cookie2     2015-04-10 10:50:05 url66   url11
cookie2     2015-04-10 11:00:00 url77   url11

5.LAST_VALUE()操作

取分组内排序后,截止到当前行,最后一个值

代码语言:javascript
复制
hive> SELECT cookieid, createtime, url,
    > LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last
    > FROM cookie4;
    
结果:注意,是截止到当前行的最后一个值,其实就是它本身
cookieid    createtime          url     last
cookie1     2015-04-10 10:00:00 url1    url1
cookie1     2015-04-10 10:00:02 url2    url2
cookie1     2015-04-10 10:03:04 1url3   1url3
cookie1     2015-04-10 10:10:00 url4    url4
cookie1     2015-04-10 10:50:01 url5    url5
cookie1     2015-04-10 10:50:05 url6    url6
cookie1     2015-04-10 11:00:00 url7    url7
cookie2     2015-04-10 10:00:00 url11   url11
cookie2     2015-04-10 10:00:02 url22   url22
cookie2     2015-04-10 10:03:04 1url33  1url33
cookie2     2015-04-10 10:10:00 url44   url44
cookie2     2015-04-10 10:50:01 url55   url55
cookie2     2015-04-10 10:50:05 url66   url66
cookie2     2015-04-10 11:00:00 url77   url77

参考文章:Hive分析窗口函数(四) LAG,LEAD,FIRST_VALUE,LAST_VALUE

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2017.12.18 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1. 数据说明
  • 2. lag()操作
  • 3. lead()操作
  • 4.FIRST_VALUE()操作
  • 5.LAST_VALUE()操作
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档