前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >clickhouse实现漏斗功能

clickhouse实现漏斗功能

作者头像
公众号-利志分享
发布2022-04-25 09:06:51
5720
发布2022-04-25 09:06:51
举报
文章被收录于专栏:利志分享

clickhouse实现漏斗功能,假设我们已经得到某个业务的转化,从注册->登录->浏览首页->进房等路径实现分析转化率,从渠道a进来的转化,从渠道b过来的转化,哪个转化率更加好, 漏斗就是很好的分析此类问题的工具。 漏斗模型是一个倒置的金字塔形状,主要用来分析页面与页面 功能模块之前的转化情况,下面一层都是基于紧邻的上一层转化而来的,也就是说前一个条件是后一个条件成立的基础;解决此类场景clickhouse提供了一个名叫windowFunnel的函数来实现:

代码语言:javascript
复制
windowFunnel(window)(timestamp, cond1, cond2, ..., condN)



window:窗口大小,从第一个事件开始,往后推移一个窗口大小来提取事件数据

timestamp:可以是时间或时间戳类型,用来对时事件进行排序

cond:每层满足的事件
 

为了便于大家理解,这里举个简单的例子:

创建一张用户行为表,至少包含时间、事件、用户id

代码语言:javascript
复制
CREATE TABLE test.action

(

    `uid` Int32,

    `event_type` String,

    `time` datetime

)

ENGINE = MergeTree()

PARTITION BY uid

ORDER BY xxHash32(uid)

SAMPLE BY xxHash32(uid)

SETTINGS index_granularity = 8192
 

插入测试数据

代码语言:javascript
复制
insert into action values(1,'注册','2020-01-02 11:00:00');

insert into action values(1,'登录','2020-01-02 11:10:00');

insert into action values(1,'浏览首页','2020-01-02 11:20:00');

insert into action values(1,'进房','2020-01-02 11:30:00');



insert into action values(2,'浏览首页','2020-01-02 11:00:00');

insert into action values(2,'进房','2020-01-02 11:10:00');



insert into action values(1,'注册','2020-01-02 11:00:00');



insert into action values(3,'注册','2020-01-02 11:20:00');

insert into action values(3,'登录','2020-01-02 12:00:00');



insert into action values(4,'注册','2020-01-02 11:50:00');

insert into action values(4,'登录','2020-01-02 12:00:00');



insert into action values(5,'注册','2020-01-02 11:50:00');

insert into action values(5,'登录','2020-01-02 12:00:00');

insert into action values(5,'浏览首页','2020-01-02 11:10:00');



insert into action values(6,'注册','2020-01-02 11:50:00');

insert into action values(6,'登录','2020-01-02 12:00:00');

insert into action values(6,'浏览首页','2020-01-02 12:10:00');
 

以30分钟作为一个时间窗口,看下windowFunnel返回了什么样的数据

代码语言:javascript
复制
SELECT 

    user_id,

    windowFunnel(1800)(time, event_type = '注册', event_type = '登录', event_type = '浏览首页', event_type = '进房') AS level

FROM 

(

    SELECT 

        time,

        event_type,

        uid AS user_id

    FROM action

)

GROUP BY user_id



┌─user_id─┬─level─┐

│       3 │     1 │

│       2 │     0 │

│       5 │     2 │

│       1 │     4 │

│       6 │     3 │

└─────────┴───────┘
 

这里level只记录了路径中最后一次事件所属的层级,如果直接对level分组统计就会丢失之前的层级数据,导致漏斗不能呈现金字塔状

继续使用上面的测试数据,通过数组的高阶函数对上述结果数据进行二次加工处理以获取完整漏斗展示效果。

分析”2020-01-02”这天路径为“注册->登录->浏览首页->进房”的转化情况。

代码语言:javascript
复制
SELECT level_index,count(1) FROM

(

    SELECT  user_id, 

        arrayWithConstant(level, 1) levels, 

        arrayJoin(arrayEnumerate( levels )) level_index

      FROM (

        SELECT

          user_id,

          windowFunnel(1800)(

            time,

            event_type = '注册',

            event_type = '登录' ,

            event_type = '浏览首页',

            event_type = '进房'

          ) AS level

        FROM (

          SELECT  time,  event_type , uid as user_id

          FROM test.action 

          WHERE toDate(time) = '2020-01-02' 

        )

        GROUP BY user_id

    )

)

group by level_index

ORDER BY level_index



┌─level_index─┬─count(1)─┐

│           1 │        5 │

│           2 │        4 │

│           3 │        2 │

│           4 │        1 │

└─────────────┴──────────┘
 

上面这个例子是多个事件在一个表里面,实际上可能我们的实现是每个行为事件会单独存在一个不同的表里面,下面我们看下多个事件存在不同的表里面如何实现。

代码语言:javascript
复制
# 注册表

CREATE TABLE test.action1

(

    `uid` Int32,

    `time` datetime

)

ENGINE = MergeTree()

PARTITION BY uid

ORDER BY xxHash32(uid)

SAMPLE BY xxHash32(uid)

SETTINGS index_granularity = 8192



# 登录

CREATE TABLE test.action2

(

    `uid` Int32,

    `time` datetime

)

ENGINE = MergeTree()

PARTITION BY uid

ORDER BY xxHash32(uid)

SAMPLE BY xxHash32(uid)

SETTINGS index_granularity = 8192



# 浏览首页

CREATE TABLE test.action3

(

    `uid` Int32,

    `time` datetime

)

ENGINE = MergeTree()

PARTITION BY uid

ORDER BY xxHash32(uid)

SAMPLE BY xxHash32(uid)

SETTINGS index_granularity = 8192



# 进房

CREATE TABLE test.action4

(

    `uid` Int32,

    `time` datetime

)

ENGINE = MergeTree()

PARTITION BY uid

ORDER BY xxHash32(uid)

SAMPLE BY xxHash32(uid)

SETTINGS index_granularity = 8192



insert into action1 values(1,'2020-01-02 11:00:00');

insert into action2 values(1,'2020-01-02 11:10:00');

insert into action3 values(1,'2020-01-02 11:20:00');

insert into action4 values(1,'2020-01-02 11:30:00');



insert into action3 values(2,'2020-01-02 11:00:00');

insert into action4 values(2,'2020-01-02 11:10:00');



insert into action1 values(1,'2020-01-02 11:00:00');



insert into action1 values(3,'2020-01-02 11:20:00');

insert into action2 values(3,'2020-01-02 12:00:00');



insert into action1 values(4,'2020-01-02 11:50:00');

insert into action2 values(4,'2020-01-02 12:00:00');



insert into action1 values(5,'2020-01-02 11:50:00');

insert into action2 values(5,'2020-01-02 12:00:00');

insert into action3 values(5,'2020-01-02 11:10:00');



insert into action1 values(6,'2020-01-02 11:50:00');

insert into action2 values(6,'2020-01-02 12:00:00');

insert into action3 values(6,'2020-01-02 12:10:00');
 

窗口函数实现如下:

代码语言:javascript
复制
SELECT 

    user_id,

    windowFunnel(1800)(time, event_type = 'action1', event_type = 'action2', event_type = 'action3', event_type = 'action4') AS level

FROM 

(

    SELECT 

        time,

        'action1' as event_type,

        uid AS user_id

    FROM action1

    union all

    SELECT 

        time,

        'action2' as event_type,

        uid AS user_id

    FROM action2

    union all

    SELECT 

        time,

        'action3' as event_type,

        uid AS user_id

    FROM action3

    union all

    SELECT 

        time,

        'action4' as event_type,

        uid AS user_id

    FROM action4

)

GROUP BY user_id
 

分析”2020-01-02”这天路径为“注册->登录->浏览首页->进房”的转化情况。

代码语言:javascript
复制
SELECT level_index,count(1) FROM

(

    SELECT  user_id, 

        arrayWithConstant(level, 1) levels, 

        arrayJoin(arrayEnumerate( levels )) level_index

      FROM (

        SELECT 

            user_id,

            windowFunnel(1800)(time, event_type = 'action1', event_type = 'action2', event_type = 'action3', event_type = 'action4') AS level

        FROM 

        (

            SELECT 

                time,

                'action1' as event_type,

                uid AS user_id

            FROM action1

            union all

            SELECT 

                time,

                'action2' as event_type,

                uid AS user_id

            FROM action2

            union all

            SELECT 

                time,

                'action3' as event_type,

                uid AS user_id

            FROM action3

            union all

            SELECT 

                time,

                'action4' as event_type,

                uid AS user_id

            FROM action4

        )

        GROUP BY user_id

    )

)

group by level_index

ORDER BY level_index





┌─level_index─┬─count(1)─┐

│           1 │        5 │

│           2 │        4 │

│           3 │        2 │

│           4 │        1 │

└─────────────┴──────────┘
 

好了,clickhouse的漏斗已经完了。

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-03-19,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 利志分享 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档