Given an existing Hive table record with the following contents:
hive> select * from record;
OK
A   2015-01     5
A   2015-01     15
B   2015-01     5
A   2015-01     8
B   2015-01     25
A   2015-01     5
A   2015-02     4
A   2015-02     6
B   2015-02     10
B   2015-02     5
A   2015-03     16
A   2015-03     22
B   2015-03     23
B   2015-03     10
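B   2015-03     11
Field meanings: userid (string), month (string), count (int), i.e. the user ID, the month, and a visit count recorded in that month (a user can have several rows for the same month).
Requirement: for each user and each month, compute the largest monthly visit total seen up to and including that month, and the cumulative visit total up to and including that month. The final result should be: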
user    month       month_visits    cumulative_visits        max_monthly_visits
A       2015-01     33              33                       33
A       2015-02     10              43                       33
A       2015-03     38              81                       38
B       2015-01     30              30                       30
B       2015-02     15              45                       30
B       2015-03     44              89                       44
--(1)
-- First compute each user's total visits per month
CREATE TABLE record_2 AS
SELECT userid, month, sum(count) as count 
FROM record
GROUP BY userid, month;
-- Contents of record_2:
A   2015-01     33
A   2015-02     10
A   2015-03     38
B   2015-01     30
B   2015-02     15
B   2015-03     44
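--(2)
-- Self-join record_2 so that each month of a user is paired with every month up to and including it, then aggregate over those pairs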
SELECT t1.userid, t1.month, t1.count, sum(t2.count) sum_count, max(t2.count) max_count
FROM record_2 t1 INNER JOIN record_2 t2
ON t1.userid = t2.userid
WHERE t1.month >= t2.month 
GROUP BY t1.userid, t1.month, t1.count 
ORDER BY t1.userid, t1.month;
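To see why the self-join in (2) yields running values, it can help to look at the joined rows before GROUP BY collapses them. The query below is only an illustrative sketch restricted to user A; the aliases cur_month, prior_month and prior_count are made up for this example and do not appear in the original queries.

```sql
-- Illustrative only: list the (current month, same-or-earlier month) pairs
-- that the self-join produces for user A, before any aggregation.
SELECT t1.userid,
       t1.month AS cur_month,     -- hypothetical alias: the month being reported
       t2.month AS prior_month,   -- hypothetical alias: a month <= cur_month
       t2.count AS prior_count    -- hypothetical alias: that month's total
FROM record_2 t1 INNER JOIN record_2 t2
ON t1.userid = t2.userid
WHERE t1.month >= t2.month
  AND t1.userid = 'A'
ORDER BY cur_month, prior_month;
-- Rows for the record_2 contents listed in (1):
-- A   2015-01   2015-01   33
-- A   2015-02   2015-01   33
-- A   2015-02   2015-02   10
-- A   2015-03   2015-01   33
-- A   2015-03   2015-02   10
-- A   2015-03   2015-03   38
```

Grouping these rows by cur_month and taking sum(prior_count) and max(prior_count) gives 33/33, 43/33 and 81/38, which are the running total and running maximum for user A.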
-- Result of query (2):
A   2015-01 33  33  33
A   2015-02 10  43  33
A   2015-03 38  81  38
B   2015-01 30  30  30
B   2015-02 15  45  30
B   2015-03 44  89  44
-- Alternatively, use window functions on record_2 instead of the self-join:
select userid, month, count,
sum(count) over(partition by userid order by month) as sum_count,
max(count) over(partition by userid order by month) as max_count
from record_2;
Result:
A   2015-01 33  33  33
A   2015-02 10  43  33
A   2015-03 38  81  38
B   2015-01 30  30  30
B   2015-02 15  45  30
B   2015-03 44  89  44
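If the intermediate table record_2 is not needed elsewhere, the monthly aggregation and the window functions can probably be folded into a single statement. The following is a sketch, assuming a Hive version that supports both WITH (common table expressions) and windowing, which to my understanding means 0.13 or later; the names monthly and month_count are hypothetical. The window frame is written out explicitly here as a matter of preference. Because each (userid, month) pair is unique after the GROUP BY, it should behave the same as the frame Hive applies by default when only ORDER BY is given.

```sql
-- Sketch: same result as above without materializing record_2.
WITH monthly AS (                        -- hypothetical CTE name
    SELECT userid, month, sum(count) AS month_count   -- hypothetical alias
    FROM record
    GROUP BY userid, month
)
SELECT userid,
       month,
       month_count,
       -- running total over this user's months up to the current row
       sum(month_count) OVER (PARTITION BY userid ORDER BY month
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_count,
       -- running maximum over the same frame
       max(month_count) OVER (PARTITION BY userid ORDER BY month
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_count
FROM monthly
ORDER BY userid, month;
```

Note that ordering by month works here because the values are 'YYYY-MM' strings, which sort lexicographically in chronological order; a format such as 'M-YYYY' would need converting first.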