前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Hive窗口函数02-NTILE、ROW_NUMBER、RANK、DENSE_RANK

Hive窗口函数02-NTILE、ROW_NUMBER、RANK、DENSE_RANK

作者头像
CoderJed
发布2018-09-13 10:59:39
1.3K0
发布2018-09-13 10:59:39
举报
文章被收录于专栏:Jed的技术阶梯Jed的技术阶梯

Hive窗口函数NTILE、ROW_NUMBER、RANK、DENSE_RANK入门

1. 数据说明

现有 hive 表 cookie2, 内容如下:

代码语言:javascript
复制
hive> select * from cookie2;

cookie1 2015-04-10  1
cookie1 2015-04-11  5
cookie1 2015-04-12  7
cookie1 2015-04-13  3
cookie1 2015-04-14  2
cookie1 2015-04-15  4
cookie1 2015-04-16  4
cookie2 2015-04-10  2
cookie2 2015-04-11  3
cookie2 2015-04-12  5
cookie2 2015-04-13  6
cookie2 2015-04-14  3
cookie2 2015-04-15  9
cookie2 2015-04-16  7
  • 其中字段意义: cookieid(string), createtime(string), pv(int)
  • 分别代表: cookieid, 创建时间, 该cookieid当天的pv值

2. ntile()操作

ntile(n),用于将分组数据按照顺序切分成n片,返回当前切片值,ntile不支持rows between,比如 ntile(2) over(partition by cookieid order by createtime rows between 3 preceding and current row),如果切片不均匀,默认增加第一个切片的分布。

(1) 将分组数组按照顺序切分成2份

代码语言:javascript
复制
select cookieid, createtime, pv,
ntile(2) over(partition by cookieid order by createtime) as nt1
from cookie2;

结果:因为每组有7条数据,所以默认第一个分片加1条数据
cookie1 2015-04-10  1   1
cookie1 2015-04-11  5   1
cookie1 2015-04-12  7   1
cookie1 2015-04-13  3   1
cookie1 2015-04-14  2   2
cookie1 2015-04-15  4   2
cookie1 2015-04-16  4   2
cookie2 2015-04-10  2   1
cookie2 2015-04-11  3   1
cookie2 2015-04-12  5   1
cookie2 2015-04-13  6   1
cookie2 2015-04-14  3   2
cookie2 2015-04-15  9   2
cookie2 2015-04-16  7   2

(2) 将分组数组按照顺序切分成n份

代码语言:javascript
复制
select cookieid, createtime, pv,
ntile(3) over(partition by cookieid order by createtime) as nt2,
ntile(4) over(partition by cookieid order by createtime) as nt3
from cookie2;

cookie1 2015-04-10  1   1   1
cookie1 2015-04-11  5   1   1
cookie1 2015-04-12  7   1   2
cookie1 2015-04-13  3   2   2
cookie1 2015-04-14  2   2   3
cookie1 2015-04-15  4   3   3
cookie1 2015-04-16  4   3   4
cookie2 2015-04-10  2   1   1
cookie2 2015-04-11  3   1   1
cookie2 2015-04-12  5   1   2
cookie2 2015-04-13  6   2   2
cookie2 2015-04-14  3   2   3
cookie2 2015-04-15  9   3   3
cookie2 2015-04-16  7   3   4

(3)应用场景:统计一个每个cookie的pv数最多的前1/3的天

代码语言:javascript
复制
select cookieid, createtime, pv,
ntile(3) over(partition by cookieid order by pv desc) as nt4
from cookie2;

结果:nt4 = 1 数据就是我们要求的结果
cookie1 2015-04-12  7   1
cookie1 2015-04-11  5   1
cookie1 2015-04-16  4   1
cookie1 2015-04-15  4   2
cookie1 2015-04-13  3   2
cookie1 2015-04-14  2   3
cookie1 2015-04-10  1   3
cookie2 2015-04-15  9   1
cookie2 2015-04-16  7   1
cookie2 2015-04-13  6   1
cookie2 2015-04-12  5   2
cookie2 2015-04-11  3   2
cookie2 2015-04-14  3   3
cookie2 2015-04-10  2   3

3. row_number()操作

row_number():从1开始,按照顺序,生成分组内记录的行号

(1)组内排序后生成行号

代码语言:javascript
复制
select cookieid, createtime,  pv,
row_number() over(partition by cookieid order by pv desc) as index
from cookie2;

结果:
cookie1 2015-04-12  7   1
cookie1 2015-04-11  5   2
cookie1 2015-04-16  4   3
cookie1 2015-04-15  4   4
cookie1 2015-04-13  3   5
cookie1 2015-04-14  2   6
cookie1 2015-04-10  1   7
cookie2 2015-04-15  9   1
cookie2 2015-04-16  7   2
cookie2 2015-04-13  6   3
cookie2 2015-04-12  5   4
cookie2 2015-04-11  3   5
cookie2 2015-04-14  3   6
cookie2 2015-04-10  2   7

(2)常用场景:分组取TopN, 比如求每个cookie排名前三的pv

代码语言:javascript
复制
create table tmp as
select cookieid, createtime,  pv,
row_number() over(partition by cookieid order by pv desc) as index
from cookie2;

select * from tmp where index <= 3;

结果:
cookie1 2015-04-12  7   1
cookie1 2015-04-11  5   2
cookie1 2015-04-16  4   3
cookie2 2015-04-15  9   1
cookie2 2015-04-16  7   2
cookie2 2015-04-13  6   3

4. rank()操作

rank:生成数据项在分组中的排名,相同的值编相同的号,排名相等会在名次中留下空位

代码语言:javascript
复制
select cookieid, createtime, pv,
rank() over(partition by cookieid order by pv desc) as rank
from cookie2
where cookieid = 'cookie1';

结果:
cookie1 2015-04-12  7   1
cookie1 2015-04-11  5   2
cookie1 2015-04-16  4   3
cookie1 2015-04-15  4   3
cookie1 2015-04-13  3   5
cookie1 2015-04-14  2   6
cookie1 2015-04-10  1   7

5. dense_rank()操作

dense_rank:按顺序编号,相同的值编相同的号,不留空位

代码语言:javascript
复制
select cookieid, createtime, pv,
dense_rank() over(partition by cookieid order by pv desc) as rank
from cookie2
where cookieid = 'cookie1';

结果:
cookie1 2015-04-12  7   1
cookie1 2015-04-11  5   2
cookie1 2015-04-16  4   3
cookie1 2015-04-15  4   3
cookie1 2015-04-13  3   4
cookie1 2015-04-14  2   5
cookie1 2015-04-10  1   6

参考文献:Hive分析窗口函数(二) NTILE,ROW_NUMBER,RANK,DENSE_RANK

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2017.12.15 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1. 数据说明
  • 2. ntile()操作
    • (1) 将分组数组按照顺序切分成2份
      • (2) 将分组数组按照顺序切分成n份
        • (3)应用场景:统计一个每个cookie的pv数最多的前1/3的天
        • 3. row_number()操作
          • (1)组内排序后生成行号
            • (2)常用场景:分组取TopN, 比如求每个cookie排名前三的pv
            • 4. rank()操作
            • 5. dense_rank()操作
            领券
            问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档