前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >clickhouse实现留存数和留存率计算

clickhouse实现留存数和留存率计算

作者头像
公众号-利志分享
发布2022-04-25 09:08:13
1.5K1
发布2022-04-25 09:08:13
举报
文章被收录于专栏:利志分享

关于留存的计算是我们分享业务走向一种非常重要的方式,所谓留存用户,是用户在某一段时间内开始使用某个功能,到后面某一段时间又使用这个应用的用户,我们称这批用户叫留存用户。后面我们定义留存其实更多是基于两个事件来,比如某一天完成a事件,然后又完成b事件,这个我们叫b的留存,当然我们这个可以基于用户也可以基于设备。

下面我们来看clickhouse实现留存,关于clickhouse有个专用的留存函数,来处理留存用户数。

代码语言:javascript
复制
语法

    retention(cond1, cond2, ..., cond32);

参数

    cond — 返回 UInt8结果(1或0)的表达式。

返回值

    数组为1或0。

        1 — 条件满足。

        0 — 条件不满足。
 

下面我们来看例子: 创建表:

代码语言:javascript
复制
CREATE TABLE reg(date Date, uid Int32,channel Int32) ENGINE = Memory;



CREATE TABLE charge(date Date, uid Int32, money Int32) ENGINE = Memory;



# 插入数据:

INSERT INTO reg SELECT '2020-01-01', 1,10000;

INSERT INTO reg SELECT '2020-01-01', 2,10001;

INSERT INTO reg SELECT '2020-01-01', 3,10001;

INSERT INTO reg SELECT '2020-01-02', 4,10000;

INSERT INTO reg SELECT '2020-01-03', 5,10000;

INSERT INTO charge SELECT '2020-01-01', 1,10;

INSERT INTO charge SELECT '2020-01-01', 1,10;

INSERT INTO charge SELECT '2020-01-01', 2,200;

INSERT INTO charge SELECT '2020-01-01', 3,500;

INSERT INTO charge SELECT '2020-01-02', 4,10000;

INSERT INTO charge SELECT '2020-01-02', 2,300;

INSERT INTO charge SELECT '2020-01-02', 3,400;

INSERT INTO charge SELECT '2020-01-02', 3,400;

INSERT INTO charge SELECT '2020-01-03', 5,10000;

INSERT INTO charge SELECT '2020-01-03', 3,400;
 

假如查询2020-01-01日的0日,1日,2日注册到充值留存数,我们可以先看下uid的每个条件的实现结果:

代码语言:javascript
复制
SELECT

    uid,

    retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date = '2020-01-02') AND (type = 2), (date = '2020-01-03') AND (type = 2)) AS r

FROM

(

    SELECT

        uid,

        date,

        1 AS type

    FROM reg

    WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

    UNION ALL

    SELECT

        uid,

        date,

        2 AS type

    FROM charge

    WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

)

WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

GROUP BY uid

ORDER BY uid ASC



┌─uid─┬─r─────────┐

│   1 │ [1,1,0,0] │

│   2 │ [1,1,1,0] │

│   3 │ [1,1,1,1] │

│   4 │ [0,0,0,0] │

│   5 │ [0,0,0,0] │

└─────┴───────────┘
 

对一下上面我们的写入数据,符合我们的结果。 下面我们写个完整sql实现这个留存功能:

代码语言:javascript
复制
SELECT

    sum(r[1]) AS a,

    sum(r[2]) AS b,

    sum(r[3]) AS c,

    sum(r[4]) AS d

FROM

(

    SELECT

        uid,

        retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date = '2020-01-02') AND (type = 2), (date = '2020-01-03') AND (type = 2)) AS r

    FROM

    (

        SELECT

            uid,

            date,

            1 AS type

        FROM reg

        WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

        UNION ALL

        SELECT

            uid,

            date,

            2 AS type

        FROM charge

        WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

    )

    WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

    GROUP BY uid

    ORDER BY uid ASC

)



┌─a─┬─b─┬─c─┬─d─┐

│ 3 │ 3 │ 2 │ 1 │

└───┴───┴───┴───┘
 

上面分别是 a:2020-01-01 注册数 b:2020-01-01 注册并充值人数(新用户充值数) c:2020-01-02 注册并充值的1日留存数 d:2020-01-03 注册并充值的2日留存数

下面我们看下如何计算活跃,上面的例子需要多插入几条数据,我们需要计算3日活跃。

代码语言:javascript
复制
INSERT INTO charge SELECT '2020-01-04', 3,400;

.
 

计算活跃sql:

代码语言:javascript
复制
SELECT

    sum(r[1]) AS a,

    sum(r[2]) AS b,

    sum(r[3]) AS c,

    sum(r[4]) AS d

FROM

(

    SELECT

        uid,

        retention((date = '2020-01-01') AND (type = 1), (date = '2020-01-01') AND (type = 2), (date >= '2020-01-01') AND (date <= '2020-01-02') AND (type = 2), (date >= '2020-01-02') AND (date <= '2020-01-04') AND (type = 2)) AS r

    FROM

    (

        SELECT

            uid,

            date,

            1 AS type

        FROM reg

        WHERE (date >= '2020-01-01') AND (date <= '2020-01-03')

        UNION ALL

        SELECT

            uid,

            date,

            2 AS type

        FROM charge

        WHERE (date >= '2020-01-01') AND (date <= '2020-01-04')

    )

    WHERE (date >= '2020-01-01') AND (date <= '2020-01-04')

    GROUP BY uid

    ORDER BY uid ASC

)



┌─a─┬─b─┬─c─┬─d─┐

│ 3 │ 3 │ 3 │ 2 │

└───┴───┴───┴───┘
 

上面分别是 a:2020-01-01 注册数 b:2020-01-01 注册并充值人数活跃数 c:2020-01-02 注册并充值的1日活跃数 d:2020-01-03 注册并充值的2日活跃数

关于留存率或者活跃率我没有写例子,其实就是上面的数据b,c,d除以a,a是基准数据。

留存计算一般其实是针对于一天来计算 活跃是算一段时间的累计数

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-03-21,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 利志分享 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档