有一个用来记录每日客户消耗数据的表 t,它的表结构如下:
字段 | 类型 | 描述 |
---|---|---|
created_day | Date | 消耗日期 |
customer_id | Integer | 客户ID |
amount | Integer | 消耗金额 |
要求:
统计出头部客户、腰部客户、尾部客户在上个月(2020-06-01 ~ 2020-06-30)的留存情况。
输出结果的格式:
层级 | 客户数量 | 留存数量 |
---|---|---|
头部客户 | ||
腰部客户 | ||
尾部客户 |
数据定义:
今天这个问题和我上一篇文章(SQL 订单揽收统计)里面的问题很相似,只不过这里要求多统计一列,因此,解决的思路也差不多。
下面我将用 CTE 来演示每个步骤。
第一步,计算出上个月每个客户的消耗金额。
with t1 AS
(SELECT
customer_id,
SUM(amount) AS amount
FROM
t
WHERE created_day BETWEEN '2020-06-01'
AND '2020-06-30'
GROUP BY customer_id)
SELECT * FROM t1
第二步,在第一步的基础上,统计头部客户、腰部客户、尾部客户的数量。
t2 AS
(SELECT
CASE
WHEN amount >= 30000
THEN 1
WHEN amount >= 10000
THEN 2
ELSE 3
END AS customer_level,
COUNT(*) AS customter_cnt
FROM
t1
GROUP BY customer_level)
SELECT * FROM t2
在脚本中使用代码 1、2、3 分别表示头部客户、腰部客户、尾部客户。
需要注意的是,在 GROUP BY
子句中使用了 SELECT
子句中的字段别名 customer_level
,这种语法在 MySQL 上能编译通过,在其它数据库中则不行。
第三步,计算留存。根据留存的定义,只要客户在本月中有消耗,就计入留存数。比如客户 A,A 在上个月的消耗金额是 40000,那么 A 属于头部客户,假如 A 在本月的消耗金额大于 0,A 就为【留存】贡献了 1 。
我们在 t1
之后插入表达式 t12
,t12
用来计算每个客户的消耗金额和留存状态,它的脚本如下:
t12 AS
(SELECT
t1.customer_id,
t1.amount,
IF(tmp.amount > 0, 1, 0) AS keep_state
FROM
t1
LEFT JOIN
(SELECT
customer_id,
SUM(amount) AS amount
FROM
t
WHERE created_day BETWEEN '2020-07-01'
AND '2020-07-31'
GROUP BY customer_id) tmp
ON tmp.customer_id = t1.customer_id)
SELECT * FROM t12
t12
中的左连接也可以改造成标量子查询。
完整的 SQL 实现:
with t1 AS
(SELECT
customer_id,
SUM(amount) AS amount
FROM
t
WHERE created_day BETWEEN '2020-06-01'
AND '2020-06-30'
GROUP BY customer_id),
t12 AS
(SELECT
t1.customer_id,
t1.amount,
IF(tmp.amount > 0, 1, 0) AS keep_state
FROM
t1
LEFT JOIN
(SELECT
customer_id,
SUM(amount) AS amount
FROM
t
WHERE created_day BETWEEN '2020-07-01'
AND '2020-07-31'
GROUP BY customer_id) tmp
ON tmp.customer_id = t1.customer_id),
t2 AS
(SELECT
CASE
WHEN amount >= 30000
THEN 1
WHEN amount >= 10000
THEN 2
ELSE 3
END AS customer_level,
COUNT(*) AS customter_cnt,
SUM(keep_state) AS keep_cnt
FROM
t12
GROUP BY customer_level)
SELECT
CASE
customer_level
WHEN 1
THEN '头部客户'
WHEN 2
THEN '腰部客户'
ELSE '尾部客户'
END AS '层级',
customter_cnt AS '客户数量',
keep_cnt AS '留存数量'
FROM
t2
ORDER BY customer_level
感兴趣的朋友可以尝试不使用左连接或者标量子查询的写法,而是只查一次 t 表就能实现需求。
封面图片由 Daniel Hannah 在 Pixabay 上发布。