clickhouse实现漏斗功能,假设我们已经得到某个业务的转化,从注册->登录->浏览首页->进房等路径实现分析转化率,从渠道a进来的转化,从渠道b过来的转化,哪个转化率更加好, 漏斗就是很好的分析此类问题的工具。 漏斗模型是一个倒置的金字塔形状,主要用来分析页面与页面 功能模块之前的转化情况,下面一层都是基于紧邻的上一层转化而来的,也就是说前一个条件是后一个条件成立的基础;解决此类场景clickhouse提供了一个名叫windowFunnel的函数来实现:
windowFunnel(window)(timestamp, cond1, cond2, ..., condN)
window:窗口大小,从第一个事件开始,往后推移一个窗口大小来提取事件数据
timestamp:可以是时间或时间戳类型,用来对时事件进行排序
cond:每层满足的事件
为了便于大家理解,这里举个简单的例子:
创建一张用户行为表,至少包含时间、事件、用户id
CREATE TABLE test.action
(
`uid` Int32,
`event_type` String,
`time` datetime
)
ENGINE = MergeTree()
PARTITION BY uid
ORDER BY xxHash32(uid)
SAMPLE BY xxHash32(uid)
SETTINGS index_granularity = 8192
插入测试数据
insert into action values(1,'注册','2020-01-02 11:00:00');
insert into action values(1,'登录','2020-01-02 11:10:00');
insert into action values(1,'浏览首页','2020-01-02 11:20:00');
insert into action values(1,'进房','2020-01-02 11:30:00');
insert into action values(2,'浏览首页','2020-01-02 11:00:00');
insert into action values(2,'进房','2020-01-02 11:10:00');
insert into action values(1,'注册','2020-01-02 11:00:00');
insert into action values(3,'注册','2020-01-02 11:20:00');
insert into action values(3,'登录','2020-01-02 12:00:00');
insert into action values(4,'注册','2020-01-02 11:50:00');
insert into action values(4,'登录','2020-01-02 12:00:00');
insert into action values(5,'注册','2020-01-02 11:50:00');
insert into action values(5,'登录','2020-01-02 12:00:00');
insert into action values(5,'浏览首页','2020-01-02 11:10:00');
insert into action values(6,'注册','2020-01-02 11:50:00');
insert into action values(6,'登录','2020-01-02 12:00:00');
insert into action values(6,'浏览首页','2020-01-02 12:10:00');
以30分钟作为一个时间窗口,看下windowFunnel返回了什么样的数据
SELECT
user_id,
windowFunnel(1800)(time, event_type = '注册', event_type = '登录', event_type = '浏览首页', event_type = '进房') AS level
FROM
(
SELECT
time,
event_type,
uid AS user_id
FROM action
)
GROUP BY user_id
┌─user_id─┬─level─┐
│ 3 │ 1 │
│ 2 │ 0 │
│ 5 │ 2 │
│ 1 │ 4 │
│ 6 │ 3 │
└─────────┴───────┘
这里level只记录了路径中最后一次事件所属的层级,如果直接对level分组统计就会丢失之前的层级数据,导致漏斗不能呈现金字塔状
继续使用上面的测试数据,通过数组的高阶函数对上述结果数据进行二次加工处理以获取完整漏斗展示效果。
分析”2020-01-02”这天路径为“注册->登录->浏览首页->进房”的转化情况。
SELECT level_index,count(1) FROM
(
SELECT user_id,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate( levels )) level_index
FROM (
SELECT
user_id,
windowFunnel(1800)(
time,
event_type = '注册',
event_type = '登录' ,
event_type = '浏览首页',
event_type = '进房'
) AS level
FROM (
SELECT time, event_type , uid as user_id
FROM test.action
WHERE toDate(time) = '2020-01-02'
)
GROUP BY user_id
)
)
group by level_index
ORDER BY level_index
┌─level_index─┬─count(1)─┐
│ 1 │ 5 │
│ 2 │ 4 │
│ 3 │ 2 │
│ 4 │ 1 │
└─────────────┴──────────┘
上面这个例子是多个事件在一个表里面,实际上可能我们的实现是每个行为事件会单独存在一个不同的表里面,下面我们看下多个事件存在不同的表里面如何实现。
# 注册表
CREATE TABLE test.action1
(
`uid` Int32,
`time` datetime
)
ENGINE = MergeTree()
PARTITION BY uid
ORDER BY xxHash32(uid)
SAMPLE BY xxHash32(uid)
SETTINGS index_granularity = 8192
# 登录
CREATE TABLE test.action2
(
`uid` Int32,
`time` datetime
)
ENGINE = MergeTree()
PARTITION BY uid
ORDER BY xxHash32(uid)
SAMPLE BY xxHash32(uid)
SETTINGS index_granularity = 8192
# 浏览首页
CREATE TABLE test.action3
(
`uid` Int32,
`time` datetime
)
ENGINE = MergeTree()
PARTITION BY uid
ORDER BY xxHash32(uid)
SAMPLE BY xxHash32(uid)
SETTINGS index_granularity = 8192
# 进房
CREATE TABLE test.action4
(
`uid` Int32,
`time` datetime
)
ENGINE = MergeTree()
PARTITION BY uid
ORDER BY xxHash32(uid)
SAMPLE BY xxHash32(uid)
SETTINGS index_granularity = 8192
insert into action1 values(1,'2020-01-02 11:00:00');
insert into action2 values(1,'2020-01-02 11:10:00');
insert into action3 values(1,'2020-01-02 11:20:00');
insert into action4 values(1,'2020-01-02 11:30:00');
insert into action3 values(2,'2020-01-02 11:00:00');
insert into action4 values(2,'2020-01-02 11:10:00');
insert into action1 values(1,'2020-01-02 11:00:00');
insert into action1 values(3,'2020-01-02 11:20:00');
insert into action2 values(3,'2020-01-02 12:00:00');
insert into action1 values(4,'2020-01-02 11:50:00');
insert into action2 values(4,'2020-01-02 12:00:00');
insert into action1 values(5,'2020-01-02 11:50:00');
insert into action2 values(5,'2020-01-02 12:00:00');
insert into action3 values(5,'2020-01-02 11:10:00');
insert into action1 values(6,'2020-01-02 11:50:00');
insert into action2 values(6,'2020-01-02 12:00:00');
insert into action3 values(6,'2020-01-02 12:10:00');
窗口函数实现如下:
SELECT
user_id,
windowFunnel(1800)(time, event_type = 'action1', event_type = 'action2', event_type = 'action3', event_type = 'action4') AS level
FROM
(
SELECT
time,
'action1' as event_type,
uid AS user_id
FROM action1
union all
SELECT
time,
'action2' as event_type,
uid AS user_id
FROM action2
union all
SELECT
time,
'action3' as event_type,
uid AS user_id
FROM action3
union all
SELECT
time,
'action4' as event_type,
uid AS user_id
FROM action4
)
GROUP BY user_id
分析”2020-01-02”这天路径为“注册->登录->浏览首页->进房”的转化情况。
SELECT level_index,count(1) FROM
(
SELECT user_id,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate( levels )) level_index
FROM (
SELECT
user_id,
windowFunnel(1800)(time, event_type = 'action1', event_type = 'action2', event_type = 'action3', event_type = 'action4') AS level
FROM
(
SELECT
time,
'action1' as event_type,
uid AS user_id
FROM action1
union all
SELECT
time,
'action2' as event_type,
uid AS user_id
FROM action2
union all
SELECT
time,
'action3' as event_type,
uid AS user_id
FROM action3
union all
SELECT
time,
'action4' as event_type,
uid AS user_id
FROM action4
)
GROUP BY user_id
)
)
group by level_index
ORDER BY level_index
┌─level_index─┬─count(1)─┐
│ 1 │ 5 │
│ 2 │ 4 │
│ 3 │ 2 │
│ 4 │ 1 │
└─────────────┴──────────┘
好了,clickhouse的漏斗已经完了。