有用户点击日志记录表 t_click_log_025,包含user_id(用户ID),click_time(点击时间),请查询出连续点击三次的用户数;
连续点击三次:指点击记录中同一用户连续点击,中间无其他用户点击;
+----------+--------------+
| user_id | click_time |
+----------+--------------+
| 1 | 1736337600 |
| 2 | 1736337670 |
| 1 | 1736337710 |
| 1 | 1736337715 |
| 1 | 1736337750 |
| 2 | 1736337760 |
| 3 | 1736337820 |
| 3 | 1736337840 |
| 3 | 1736337850 |
| 3 | 1736337910 |
| 4 | 1736337915 |
+----------+--------------+
1.连续问题,最大连续次数的变种问题;
2.思路一:累积求和分组法(此种方式比连续N天登录略难一些)
(2.1)按照时间排序之后,使用lag()函数可以判断出当前行用户与上一行用户,是否是同一个用户;
(2.2)与上一行是同一个用户的日志,则给该行打标0,不同打标1;(属于经验:要累积求和,0和不变,1和+1.)
(2.3)对打标完成的标签,进行累积求和,和相同代表属于同一用户连续(聚合函数开窗);
(2.4)对累积求和的值进行统计,相同值的个数>=3 表示连续登录;
3.思路二:双重排序差值法(难度同连续N天登录)
(3.1)与连续登录天数类似。按照点击时间(click_time) 进行全排序,按照用户ID(user_id)分组,按照点击时间排序;
(3.2)对两次排序计算差值,按照用户和差值进行分组,相同用户,差值相同说明连续;
(3.3)计算属于同一分组的数量>=3则是连续,统计用户数得出结果;
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
1)增加一列is_same_user,判断是否与上一行是同一用户点击,是取0,否取1,第一行默认为0;
select
user_id,
click_time,
case when lag(user_id)over(order by click_time asc) is null then 0
when user_id =lag(user_id)over(order by click_time asc) then 0
else 1 end as is_same_user
from t_click_log_025
查询结果
2)对是否是同一用户分组进行累积求和
select
user_id,
click_time,
sum(is_same_user)over(order by click_time asc) as sum_order
from
(
select
user_id,
click_time,
case when lag(user_id)over(order by click_time asc) is null then 0
when user_id =lag(user_id)over(order by click_time asc) then 0
else 1 end as is_same_user
from t_click_log_025
) t
查询结果
3)查询sum_order值个数>3的用户
select
user_id,sum_order,count(1)
from
(
select
user_id,
click_time,
sum(is_same_user)over(order by click_time asc) as sum_order
from
(
select
user_id,
click_time,
case when lag(user_id)over(order by click_time asc) is null then 0
when user_id =lag(user_id)over(order by click_time asc) then 0
else 1 end as is_same_user
from t_click_log_025
) t
)tt
group by user_id,sum_order
having count(1) >=3
查询结果
4)查询最终用户数量
select
count(distinct user_id) as user_num
from
(
select
user_id,
sum_order,
count(1)
from
(
select
user_id,
click_time,
sum(is_same_user)over(order by click_time asc) as sum_order
from
(
select
user_id,
click_time,
case when lag(user_id)over(order by click_time asc) is null then 0
when user_id =lag(user_id)over(order by click_time asc) then 0
else 1 end as is_same_user
from t_click_log_025
) t
)tt
group by user_id,sum_order
having count(1) >=3
)ttt
查询结果
1)分别按照时间,按照不分组和按照用户分组进行排序;
select
user_id,
click_time,
row_number()over(order by click_time asc) as row_num1,
row_number()over(partition by user_id order by click_time asc) as row_num2
from t_click_log_025
查询结果
2)计算差值并按照用户和差值进行分组
select
user_id,
diff,
count(1) as aa
from
(select
user_id,
click_time,
row_number()over(order by click_time asc) - row_number()over(partition by user_id order by click_time asc) as diff
from t_click_log_025
) t
group by user_id,diff
查询结果
3)查询分组行数>=3的用户差值分组
select
user_id,diff
from
(
select
user_id,
diff,
count(1) aa
from
(select
user_id,
click_time,
row_number()over(order by click_time asc) - row_number()over(partition by user_id order by click_time asc) as diff
from t_click_log_025
) t
group by user_id,diff
) tt
where aa>=3
查询结果
4)计算用户数
select
count(distinct user_id) as user_num
from
(
select
user_id,
diff,
count(1) aa
from
(select
user_id,
click_time,
row_number()over(order by click_time asc) - row_number()over(partition by user_id order by click_time asc) as diff
from t_click_log_025
) t
group by user_id,diff
) tt
where aa>=3
查询结果
--建表语句
CREATE TABLE t_click_log_025 (
user_id BIGINT,
click_time BIGINT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
--插入数据
insert into t_click_log_025 (user_id,click_time)
values
(1,1736337600),
(2,1736337670),
(1,1736337710),
(1,1736337715),
(1,1736337750),
(2,1736337760),
(3,1736337820),
(3,1736337840),
(3,1736337850),
(3,1736337910),
(4,1736337915)