支出表: t2_spending
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| user_id | int |
| spend_date | string |
| platform | string |
| amount | int |
+-------------+---------+
写一段 SQL 来查找每天 仅 使用手机端用户、仅 使用桌面端用户和 同时 使用桌面端和手机端的用户人数和总支出金额。
查询结果格式如下例所示: t2_spending table:
+---------+------------+----------+--------+
| user_id | spend_date | platform | amount |
+---------+------------+----------+--------+
| 1 | 2019-07-01 | mobile | 100 |
| 1 | 2019-07-01 | desktop | 100 |
| 2 | 2019-07-01 | mobile | 100 |
| 2 | 2019-07-02 | mobile | 100 |
| 3 | 2019-07-01 | desktop | 100 |
| 3 | 2019-07-02 | desktop | 100 |
+---------+------------+----------+--------+
Result table:
+------------+----------+--------------+-------------+
| spend_date | platform | total_amount | total_users |
+------------+----------+--------------+-------------+
| 2019-07-01 | desktop | 100 | 1 |
| 2019-07-01 | mobile | 100 | 1 |
| 2019-07-01 | both | 200 | 1 |
| 2019-07-02 | desktop | 100 | 1 |
| 2019-07-02 | mobile | 100 | 1 |
| 2019-07-02 | both | 0 | 0 |
+------------+----------+--------------+-------------+
在 2019-07-01, 用户1 同时 使用桌面端和手机端购买, 用户2 仅 使用了手机端购买,而用户3 仅 使用了桌面端购买。 在 2019-07-02, 用户2 仅 使用了手机端购买, 用户3 仅 使用了桌面端购买,且没有用户 同时 使用桌面端和手机端购买。
本题难点在于:1。对于仅使用一个平台的列出其平台,而对于使用两个平台的需要标记为both;2.对于07-02没有both的,需要进行显示为0的处理。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️ |
由于spark不支持count(distinct )开窗,所以我们先计算一下每个用户每天使用的平台个数。 然后原表与新表关联,计算出每个用户每天使用的new_platform
平台类型。
执行SQL
select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id
SQL结果
+-------------+----------+-----------+---------+-----------+---------------+
| spend_date | user_id | platform | amount | plat_cnt | new_platform |
+-------------+----------+-----------+---------+-----------+---------------+
| 2019-01-01 | 1 | mobile | 100 | 2 | both |
| 2019-01-01 | 1 | desktop | 100 | 2 | both |
| 2019-01-01 | 2 | mobile | 100 | 1 | mobile |
| 2019-01-02 | 2 | mobile | 100 | 1 | mobile |
| 2019-01-01 | 3 | desktop | 100 | 1 | desktop |
| 2019-01-02 | 3 | desktop | 100 | 1 | desktop |
+-------------+----------+-----------+---------+-----------+---------------+
我们使用新的new_platform来计算各项结果
执行SQL
select spend_date,
new_platform,
sum(amount) as total_amount,
count(distinct user_id) as total_users
from (select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id) tt
group by spend_date,
new_platform
order by 1 asc
SQL结果
+-------------+---------------+---------------+--------------+
| spend_date | new_platform | total_amount | total_users |
+-------------+---------------+---------------+--------------+
| 2019-01-01 | mobile | 100 | 1 |
| 2019-01-01 | desktop | 100 | 1 |
| 2019-01-01 | both | 200 | 1 |
| 2019-01-02 | desktop | 100 | 1 |
| 2019-01-02 | mobile | 100 | 1 |
+-------------+---------------+---------------+--------------+
需要每天展示不同平台的全部数据,所以我们先创建一张包含mobile
,desktop
,both
类型与全量日期组合的数据表
执行SQL
select ttt1.spend_date,
ttt1.platform,
coalesce(total_amount, 0) as total_amount,
coalesce(total_users, 0) as total_users
from (
--生成日期+平台类型的全量的数据
select spend_date, 'desktop' as platform
from t2_spending
group by spend_date
union all
select spend_date, 'mobile' as platform
from t2_spending
group by spend_date
union all
select spend_date, 'both' as platform
from t2_spending
group by spend_date) ttt1
left join
(select spend_date,
new_platform,
sum(amount) as total_amount,
count(distinct user_id) as total_users
from (select t1.spend_date,
t1.user_id,
t1.platform,
t1.amount,
t2.plat_cnt,
if(t2.plat_cnt = 1, t1.platform, 'both') as new_platform
from t2_spending t1
left join
(select spend_date,
user_id,
count(distinct platform) as plat_cnt
from t2_spending
group by spend_date,
user_id) t2
on t1.spend_date = t2.spend_date
and t1.user_id = t2.user_id) tt
group by spend_date,
new_platform) ttt2
on ttt1.platform = ttt2.new_platform
and ttt1.spend_date = ttt2.spend_date
order by 1 asc
SQL结果
+-------------+-----------+---------------+--------------+
| spend_date | platform | total_amount | total_users |
+-------------+-----------+---------------+--------------+
| 2019-01-01 | desktop | 100 | 1 |
| 2019-01-01 | mobile | 100 | 1 |
| 2019-01-01 | both | 200 | 1 |
| 2019-01-02 | desktop | 100 | 1 |
| 2019-01-02 | mobile | 100 | 1 |
| 2019-01-02 | both | 0 | 0 |
+-------------+-----------+---------------+--------------+
--建表语句
CREATE TABLE t2_spending(
user_id int,
spend_date string,
platform string,
amount int
) COMMENT '支出表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
;
-- 插入数据
insert into t2_spending(user_id,spend_date,platform,amount)
values
(1,'2019-01-01','mobile',100),
(1,'2019-01-01','desktop',100),
(2,'2019-01-01','mobile',100),
(2,'2019-01-02','mobile',100),
(3,'2019-01-01','desktop',100),
(3,'2019-01-02','desktop',100);