前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >大数据实战【千亿级数仓】阶段四

大数据实战【千亿级数仓】阶段四

作者头像
大数据梦想家
发布2021-01-27 16:15:14
6360
发布2021-01-27 16:15:14
举报

写在前面: 博主是一名软件工程系大数据应用开发专业大二的学生,昵称来源于《爱丽丝梦游仙境》中的Alice和自己的昵称。作为一名互联网小白,写博客一方面是为了记录自己的学习历程,一方面是希望能够帮助到很多和自己一样处于起步阶段的萌新。由于水平有限,博客中难免会有一些错误,有纰漏之处恳请各位大佬不吝赐教!个人小站:http://alices.ibilibili.xyz/ , 博客主页:https://alice.blog.csdn.net/ 尽管当前水平可能不及各位大佬,但我还是希望自己能够做得更好,因为一天的生活就是一生的缩影。我希望在最美的年华,做最好的自己

本篇博客,博主为大家带来的是关于大数据实战【千亿级数仓】阶段四的内容。

在这里插入图片描述
在这里插入图片描述

在该阶段中,我们需要编写SQL实现以下业务分析

  • 基于日期的订单指标分析(4)
  • 基于地域的订单分类指标分析(18)
  • 基于用户的订单指标分析(24)
  • 基于用户的退货指标分析(5)

括号中的数字代表的是每个指标的需求数量。四个指标,分别从订单时间维度,地域分类维度,用户消费维度,退货维度来对数据展开分析。

下面我们就以第三个指标,为大家演示如何完成用户订单指标业务开发。

用户订单指标业务开发

1. 需求分析

电商平台往往需要根据用户的购买数据来分析用户的行为,此处。我们基于用户的订单情况进行一些统计分析,用于将来的用户行为分析。根据用户的消费行为习惯,对运营部门提供用户分析数据指标。表是订单表!! 以下为本需求需要统计的基于用户的订单指标:

以下为本需求需要统计的基于用户的订单指标:

在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

2. 创建dw层表

2.1创建itcast_dw.dim_user表

字段名称

说明

userId

用户id

loginName

登录名

userSex

性别

代码语言:javascript
复制
drop table if exists `itcast_dw`.`dim_user`;
create table `itcast_dw`.`dim_user`(
  userId          bigint,
  loginName       string,
  userSex         bigint
)
partitioned by (dt string)
STORED AS PARQUET;
2.2 创建订单临时表tmp_order_wide

在dw层创建 订单临时表tmp_order_wide

字段名称

说明

orderId

订单ID

orderStatus

订单状态

payType

支付类型

userId

用户id

userAddressId

收货地址id

payTime

支付时间

payPrice

支付金额

createtime

创建时间

代码语言:javascript
复制
-- 订单临时订单表
drop table if exists `itcast_dw`.`tmp_order_wide`;
create table `itcast_dw`.`tmp_order_wide`(
  orderId bigint,
  orderStatus bigint,
  payType bigint,
  userId bigint,
  userAddressId bigint,
  payTime string,
  payMoney double,
  createtime string
)
partitioned by(dt string)
STORED AS PARQUET;
2.3 创建订单时间标志宽表tmp_order_datetag_wide

字段名称

说明

flag30

近30天标志

flag60

近60天标志

flag90

近90天标志

flag180

近180天标志

flagTimeBucket

一天时间段标志(凌晨、早晨、上午…

代码语言:javascript
复制
drop table `itcast_dw`.`tmp_order_datetag_wide`;
create table `itcast_dw`.`tmp_order_datetag_wide`(
  orderId bigint,
  orderStatus bigint,
  payType bigint,
  userId bigint,
  userAddressId bigint,
  payTime string,
  payMoney double,
  createtime string,
  flag30 bigint,
  flag60 bigint,
  flag90 bigint,
  flag180 bigint,
  flagTimeBucket string
)
partitioned by(dt string)
STORED AS PARQUET;
2.4 创建订单时间标志、地址标志宽表 fact_order_wide

字段名称

说明

othername

地址标志(家里、学校、工作单位…)

代码语言:javascript
复制
-- 地址拉宽
drop table `itcast_dw`.`fact_order_wide`;
create table `itcast_dw`.`fact_order_wide`(
  orderId bigint,
  orderStatus bigint,
  payType bigint,
  userId bigint,
  userAddressId bigint,
  payTime string,
  payMoney double,
  createtime string,
  flag30 bigint,
  flag60 bigint,
  flag90 bigint,
  flag180 bigint,
  flagTimeBucket string,
  othername string
)
partitioned by(dt string)
STORED AS PARQUET;

3. 订单宽表ETL处理

3.1 加载用户维度数据:
代码语言:javascript
复制
insert overwrite table `itcast_dw`.`dim_user` partition(dt='20190908')
select 
  userId,   
  loginName,
  userSex  
from
  `itcast_ods`.`itcast_users` ;
   
--验证
select * from itcast_dw.dim_user limit 10;
3.2 导入订单数据:
代码语言:javascript
复制
insert overwrite table `itcast_dw`.`tmp_order_wide` partition (dt='20190908')
select
  orderid,
  orderstatus,
  paytype,
  userid,
  useraddressid,
  paytime,
  totalmoney,
  createtime
from `itcast_ods`.`itcast_orders` where dt='20190908' ;

-- 测试
select * from `itcast_dw`.`tmp_order_wide` limit 10;
3.3 时间近30天、90天、180天、订单上午、下午时间拉宽
代码语言:javascript
复制
insert overwrite table `itcast_dw`.`tmp_order_datetag_wide` partition(dt='20190908')
select
  orderId,
  orderStatus,
  payType,
  userId,
  userAddressId,
  payTime,
  paymoney,
  createtime,
  case when datediff(current_timestamp, createtime) <= 30
      then 1
  else 0
  end as flag_30,
  case when datediff(current_timestamp, createtime) <= 60
      then 1
  else 0
  end as flag_60,
  case when datediff(current_timestamp, createtime) <= 90
      then 1
  else 0
  end as flag_90, 
  case when datediff(current_timestamp, createtime) <= 180
  then 1
  else 0
  end as flag_180,
  case when hour(createtime) >= 0 and hour(createtime) < 6
      then '凌晨'
  when hour(createtime) >= 6 and hour(createtime) < 12
      then '上午'
  when hour(createtime) >= 12 and hour(createtime) < 14
      then '中午'
  when hour(createtime) >= 14 and hour(createtime) < 18
      then '下午'
  else '晚上'
  end as flag_time_bucket
from 
  `itcast_dw`.`tmp_order_wide`
where dt='20190908';
       
-- 测试语句
select * from `itcast_dw`.`tmp_order_datetag_wide` limit 5; 
3.4 与地址表合并加入收货地址信息
代码语言:javascript
复制
--创建dw层dim_user_address表
drop table if exists `itcast_dw`.`dim_user_address`;
create table `itcast_dw`.`dim_user_address`(
  addressId    bigint,
  userId       bigint,
  userName     string,
  otherName   string,
  userPhone   string,
  areaIdPath   string,
  areaId       bigint,
  userAddress string,
  isDefault    bigint,
  dataFlag     bigint,
  createTime   string
)
partitioned by (dt string)
STORED AS PARQUET;
​
--从ods层itcast_user_address导出数据到dim_user_address表
insert overwrite table `itcast_dw`.`dim_user_address` partition(dt="20190908")
select 
addressId,
userId,
userName, 
otherName,
userPhone,
areaIdPath,
areaId,
userAddress,
isDefault,
dataFlag, 
createTime 
from `itcast_ods`.`itcast_user_address` where dt="20190908";
​
​
--地址表合并加入收货地址信息
insert overwrite table `itcast_dw`.`fact_order_wide` partition(dt='20190909')
select
  t1.orderId,
  t1.orderStatus,
  t1.payType,
  t1.userId,
  t1.userAddressId,
  t1.payTime,
  t1.paymoney,
  t1.createtime,
  t1.flag30,
  t1.flag60,
  t1.flag90,
  t1.flag180,
  t1.flagTimeBucket,
  t2.othername
from
  (select * from `itcast_dw`.`tmp_order_datetag_wide` where dt='20190909') t1
  left join
  (select * from `itcast_dw`.`dim_user_address` where dt='20190909') t2
   on t1.userAddressId = t2.addressId;
​
-- 测试
select * from `itcast_dw`.`fact_order_wide` limit 10;

4. 指标开发

4.1 指标开发一

指标

字段说明

第一次消费时间

支付时间

最近一次消费时间

支付时间

首单距今时间

支付时间

尾单距今时间------分析用户什么时候来购买商品以及多久没有购买了

支付时间

最小消费金额

订单支付金额

最大消费金额

订单支付金额

参考代码:

代码语言:javascript
复制
select
  t1.userid,
  t1.loginname,
  MIN(t2.payTime) as first_paytime, --首次下单时间
  MAX(t2.payTime) as lastest_paytime, --尾单时间
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t2.payTime)) as first_day_during_days,--首单距今
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t2.payTime)) as lastest_day_durning_days, --尾单距今
  MIN(t2.paymoney) as min_paymoney,
  MAX(t2.paymoney) as max_paymoney
from
  (select * from `itcast_dw`.`fact_order_wide` where dt='20190909') as t2
  left join
    (select * from `itcast_dw`.`dim_user` where dt='20190909') as t1
   on t1.userId = t2.userId
group by t1.userid,t1.loginname
limit 5;
4.2 指标开发二

指标

累计消费次数(不含退拒)

累计消费金额(不含退拒)

近30天购买次数(不含退拒)

近30天购买金额(不含退拒)

近30天购买次数(含退拒)

近30天购买金额(含退拒)----分析用户最近的消费能力

参考代码:

代码语言:javascript
复制
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=-3  --订单状态 -3:用户拒收 -2:未付款的订单 -1:用户取消 0:待发货 1:配送中 2:用户确认收货
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus != -3
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus != -3
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus != -3
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1 
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30
     
from 
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;
4.3 指标开发三

指标

客单价(含退拒)

客单价(不含退拒)

近60天客单价(含退拒)

近60天客单价(不含退拒)

参考代码:

代码语言:javascript
复制
-- 指标开发三
-- 1. 客单价(含退拒)
-- 2. 客单价(不含退拒)
-- 3. 近60天客单价(含退拒)-----分析用户消费水平
-- 4. 近60天客单价(不含退拒)
​
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus != -3
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus != -3
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus != -3
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus != -3
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1 
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30,
     --客单价含退拒
    SUM(t1.paymoney) / SUM(1) AS atv,
      --客单价不含退拒
  SUM(case when t1.orderStatus !=-3  then t1.paymoney else 0 end) / 
      SUM(case when t1.orderStatus !=-3  then 1 else 0 end) AS atv_withoutback,
             --近60天客单价含退拒
  SUM(case when t1.flag60 = 1 
      then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
          then 1
          else 0
        end) AS atv_60,
         --近60天不含退拒
  SUM(case when t1.orderStatus !=-3  and t1.flag60 = 1
      then t1.paymoney
      else 0
      end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
      then 1
      else 0
    end) AS atv_60_withoutback
     
from 
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;
4.4 指标开发四

指标

常用收货地址

4.4.1 加载订单地址分析表

参考代码:

代码语言:javascript
复制
-- 创建订单地址分析表
drop table if exists `itcast_ads`.`tmp_order_address`;
create table `itcast_ads`.`tmp_order_address`(
  userId bigint,          -- 用户Id
  otherName string,       -- 地址类型(家里、学校...)
  totalCount bigint,     -- 下单数
  rn bigint       -- 下单排名
)
partitioned by (dt string)
STORED AS PARQUET;
​
--从tmp_order_datetag_wide统计最常用地址
insert overwrite table `itcast_ads`.`tmp_order_address` partition(dt='20190909')
select
t3.userid,
t3.othername,
t3.ordercount,
row_number() over( partition by t3.userid order by ordercount desc ) rn  --partiton by userid:按照用户分组,order by ordercount :按照订单数量排序 降序 ,rn:组内的排序
from 
(select
  t1.userId as userid,
  t1.othername as othername,
  count(t1.orderid) as ordercount  -->每个用户每个订单的数量
from
  (select * from `itcast_dw`.`fact_order_wide` where dt='20190909') t1
  
group by t1.userid,t1.otherName order by t1.userid ) t3 ;
​
-- 测试
select * from `itcast_ads`.`tmp_order_address` order by userId, rn limit 10;
4.4.2 统计常用收货地址指标

参考代码:

代码语言:javascript
复制
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30,
    SUM(t1.paymoney) / SUM(1) AS atv,
  SUM(case when t1.orderStatus !=-3  
      then t1.paymoney else 0 end) / 
      SUM(case when t1.orderStatus !=-3  then 1 else 0
            end) AS atv_withoutback,
  SUM(case when t1.flag60 = 1 
      then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
          then 1
          else 0
        end) AS atv_60,
  SUM(case when t1.orderStatus !=-3  and t1.flag60 = 1
      then t1.paymoney
      else 0
      end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
      then 1
      else 0
    end) AS atv_60_withoutback,
     --最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address 
     
from 
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
  left join
  (select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
   on t1.userId = t3.userId
group by t2.userid,t2.loginname limit 5;
4.5 指标开发五

指标

常用支付方式:某种支付方式对应的订单数量越多则越常用

更新mysql表中的模拟数据:

代码语言:javascript
复制
SET FOREIGN_KEY_CHECKS=0;
​
-- ----------------------------
-- Table structure for `itcast_payments`
-- ----------------------------
DROP TABLE IF EXISTS `itcast_payments`;
CREATE TABLE `itcast_payments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`payCode` varchar(20) DEFAULT NULL,
`payName` varchar(255) DEFAULT NULL,
`payDesc` text,
`payOrder` int(11) DEFAULT '0',
`payConfig` text,
`enabled` tinyint(4) DEFAULT '0',
`isOnline` tinyint(4) DEFAULT '0',
`payFor` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `payCode` (`payCode`,`enabled`,`isOnline`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
​
-- ----------------------------
-- Records of itcast_payments
-- ----------------------------
INSERT INTO `itcast_payments` VALUES ('0', 'unkown', '未知方式', '未知', '0', null, '0', '0', null);
INSERT INTO `itcast_payments` VALUES ('1', 'alipays', '支付宝(及时到帐)', '支付宝(及时到帐)', '4', '', '0', '1', '1,2,4');
INSERT INTO `itcast_payments` VALUES ('2', 'weixinpays', '微信支付', '微信支付', '0', '', '0', '1', '1,2,3');
INSERT INTO `itcast_payments` VALUES ('3', 'wallets', '余额支付', '余额支付', '5', '', '1', '1', '1,2,3,4');
INSERT INTO `itcast_payments` VALUES ('4', 'cod', '货到付款', '开通城市', '1', '', '1', '0', '1,2,3,4');
​
--更新ods层数据
4.5.1 创建dw层dim_payments表
代码语言:javascript
复制
drop table if exists `itcast_dw`.`dim_payments`;
create table `itcast_dw`.`dim_payments`(
  id         bigint,
  payCode   string,
  payName   string,
  payDesc   string,
  payOrder   bigint,
  payConfig string,
  enabled    bigint,
  isOnline   bigint,
  payFor     string
)
partitioned by (dt string)
STORED AS PARQUET;
--从ods层itcast_payments导出数据到dw层dim_payments表中
insert overwrite table `itcast_dw`.`dim_payments` partition(dt="20190909")
select 
id       ,
payCode ,
payName ,
payDesc ,
payOrder ,
payConfig,
enabled ,
isOnline ,
payFor   
from `itcast_ods`.`itcast_payments` where dt="20190909";
1、加载支付方式排名
-- 创建支付方式分析表
drop table if exists `itcast_ads`.`tmp_order_paytype`;
create table `itcast_ads`.`tmp_order_paytype`(
  userid bigint,              -- 用户id
  payType bigint,           -- 支付类型id
  payCode string,             -- 支付码
  totalCount bigint,          -- 订单总数
  rn bigint                   -- 等级
)
partitioned by (dt string)
STORED AS PARQUET;
​
-- 加载支付方式分析
insert overwrite table `itcast_ads`.`tmp_order_paytype` partition(dt='20190909')
select
  t3.*,
  row_number() over(partition by usrId order by totalCount desc) rn
from
  (select
      t1.userId,
      t1.payType,
      t2.payCode,
      sum(1) as totalCount   --sum(1)等同于count效果
   from
      `itcast_dw`.`dim_payments` t2
      right join
      `itcast_dw`.`tmp_order_datetag_wide` t1
       on t2.id = t1.payType
   group by t1.userId, t1.payType, t2.payCode) t3;
   
-- 测试
select * from `itcast_ads`.`tmp_order_paytype` limit 5;
4.5.2 统计常用支付方式指标

参考代码:

代码语言:javascript
复制
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1 
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30,
    SUM(t1.paymoney) / SUM(1) AS atv,
  SUM(case when t1.orderStatus !=-3  
      then t1.paymoney else 0 end) / 
      SUM(case when t1.orderStatus !=-3  then 1 else 0
            end) AS atv_withoutback,
  SUM(case when t1.flag60 = 1 
      then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
          then 1
          else 0
        end) AS atv_60,
  SUM(case when t1.orderStatus !=-3  and t1.flag60 = 1
      then t1.paymoney
      else 0
      end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
      then 1
      else 0
    end) AS atv_60_withoutback,
     --最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
--常用的支付方式
    MAX(case when t4.rn = 1
      then t4.payCode
      else ''
      end) as most_usual_paytype
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
  left join
  (select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
   on t1.userId = t3.userId
  left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
   on t1.userId = t4.userId
group by t2.userid,t2.loginname limit 5;
4.6 指标开发六

指标

学校下单总数

单位下单总数

家里下单总数

上午下单总数

下午下单总数

晚上下单总数

参考代码

代码语言:javascript
复制
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1 
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30,
    SUM(t1.paymoney) / SUM(1) AS atv,
  SUM(case when t1.orderStatus !=-3  
      then t1.paymoney else 0 end) / 
      SUM(case when t1.orderStatus !=-3  then 1 else 0
            end) AS atv_withoutback,
  SUM(case when t1.flag60 = 1 
      then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
          then 1
          else 0
        end) AS atv_60,
  SUM(case when t1.orderStatus !=-3  and t1.flag60 = 1
      then t1.paymoney
      else 0
      end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
      then 1
      else 0
    end) AS atv_60_withoutback,
     --最常用地址
      max(case when t3.rn =1
      then t3.othername
        else ''
      end) as most_usual_address,
      --最常用支付方式
    max(case when t4.rn = 1
      then t4.payCode
      else ''
      end) as most_usual_paytype,
      SUM(case when t1.otherName = '学校'
      then 1
      else 0
      end) as school_order_count,      -- 学校下单总数
  SUM(case when t1.otherName = '单位'
      then 1
      else 0
      end) as company_order_count,    -- 单位下单总数
  SUM(case when t1.otherName = '家里'
      then 1
      else 0
      end) as home_order_count,        -- 家里下单总数
  SUM(case when t1.flagTimeBucket = '上午'
      then 1
      else 0
      end) as am_order_count,          -- 上午下单总数
  SUM(case when t1.flagTimeBucket = '下午'
      then 1
      else 0
      end) as pm_order_count,          -- 下午下单总数
  SUM(case when t1.flagTimeBucket = '晚上'
      then 1
      else 0
      end) as night_order_count-- 晚上下单总数
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
  left join
  (select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
   on t1.userId = t3.userId
  left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
   on t1.userId = t4.userId
group by t2.userid,t2.loginname limit 5;
4.7 创建ads层表、加载数据
代码语言:javascript
复制
drop table if exists `itcast_ads`.`tmp_user_order_measure`;
create table `itcast_ads`.`tmp_user_order_measure`(
  userid string,                          -- 用户id
  username string,                        -- 用户名称
  first_paytime string,                   -- 第一次消费时间
  lastest_paytime string,                 -- 最近一次消费时间
  first_day_during_days bigint,           -- 首单距今时间
  lastest_day_durning_days bigint,        -- 尾单距今时间
  min_paymoney double,                    -- 最小消费金额
  max_paymoney double,                    -- 最大消费金额
  total_count_without_back bigint,        -- 累计消费次数(不含退拒)
  total_money_without_back double,        -- 累计消费金额(不含退拒)
  total_count_without_back_30 bigint,     -- 近30天累计消费次数(不含退拒)
  total_money_without_back_30 double,     -- 近30天累计消费金额(不含退拒)
  total_count_30 bigint,                  -- 近30天累计消费次数(含退拒)
  total_money_30 double,                  -- 近30天累计消费金额(含退拒)
  atv double,                             -- 客单价(含退拒)
  atv_withoutback double,                 -- 客单价(不含退拒)
  atv_60 double,                          -- 近60天客单价(含退拒)
  atv_60_withoutback double,              -- 近60天客单价(不含退拒)
  most_usual_address string,              -- 常用收货地址
  most_usual_paytype string,              -- 常用支付方式
  school_order_count bigint,              -- 学校下单总数
  company_order_count bigint,             -- 单位下单总数
  home_order_count bigint,                -- 家里下单总数
  am_order_count bigint,                  -- 上午下单总数
  pm_order_count bigint,                  -- 下午下单总数
  night_order_count bigint                -- 晚上下单总数
)
partitioned by (dt string)
STORED AS PARQUET;
   
​
insert overwrite table `itcast_ads`.`tmp_user_order_measure` partition (dt='20190909')
select
  t2.userid,
  t2.loginname,
  MIN(t1.payTime) as first_paytime,
  MAX(t1.payTime) as lastest_paytime,
  DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
  DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
  MIN(t1.paymoney) as min_paymoney,
  MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
  then 1
  else 0
  end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
  then t1.paymoney
  else 0
  end
  ) as total_money_without_back, --累计消费金额不含退拒
  --累计近30天消费次数不含退拒
  sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then 1
      else 0
      end
    ) as total_count_without_back_30,
   --累计近30天消费金额不含退拒
    sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
    then t1.paymoney
      else 0
      end
    ) as total_money_without_back_30,
       --累计近30天消费次数含退拒
  sum(case when t1.flag30 =1 
    then 1
      else 0
      end
    ) as total_count_without_30,
   --累计近30天消费金额含退拒
    sum(case when t1.flag30 =1 
    then t1.paymoney
      else 0
      end
    ) as total_money_with_back_30,
    SUM(t1.paymoney) / SUM(1) AS atv,
  SUM(case when t1.orderStatus !=-3  
      then t1.paymoney else 0 end) / 
      SUM(case when t1.orderStatus !=-3  then 1 else 0
            end) AS atv_withoutback,
  SUM(case when t1.flag60 = 1 
      then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
          then 1
          else 0
        end) AS atv_60,
  SUM(case when t1.orderStatus !=-3  and t1.flag60 = 1
      then t1.paymoney
      else 0
      end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
      then 1
      else 0
    end) AS atv_60_withoutback,
     --最常用地址
      max(case when t3.rn =1
      then t3.othername
        else ''
      end) as most_usual_address,
    max(case when t4.rn = 1
      then t4.payCode
      else ''
      end) as most_usual_paytype,
      SUM(case when t1.otherName = '学校'
      then 1
      else 0
      end) as school_order_count,      -- 学校下单总数
  SUM(case when t1.otherName = '单位'
      then 1
      else 0
      end) as company_order_count,    -- 单位下单总数
  SUM(case when t1.otherName = '家里'
      then 1
      else 0
      end) as home_order_count,        -- 家里下单总数
  SUM(case when t1.flagTimeBucket = '上午'
      then 1
      else 0
      end) as am_order_count,          -- 上午下单总数
  SUM(case when t1.flagTimeBucket = '下午'
      then 1
      else 0
      end) as pm_order_count,          -- 下午下单总数
  SUM(case when t1.flagTimeBucket = '晚上'
      then 1
      else 0
      end) as night_order_count-- 晚上下单总数
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1 
left join  
(select * from itcast_dw.dim_user where dt="20190909") t2 on 
t1.userid=t2.userid
  left join
  (select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
   on t1.userId = t3.userId
  left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
   on t1.userId = t4.userId
group by t2.userid,t2.loginname ;
​
​
-- 测试
select * from `itcast_ads`.`tmp_user_order_measure` limit 10;

关于指标三所需要使用到的表,以及表与表之间的关联关系,大致流程如下所示:

在这里插入图片描述
在这里插入图片描述

通过查询我们最终生成的宽表,可以发现该表的数据已经包含了指标所需要分析的所有数据。

在这里插入图片描述
在这里插入图片描述

看到类似的效果说明我们就成功了!

小结

大数据实战【千亿级数仓】阶段四的内容到这里就结束了。

如果以上过程中出现了任何的纰漏错误,烦请大佬们指正?

受益的朋友或对大数据技术感兴趣的伙伴记得点赞关注支持一波?

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2020-05-20 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 用户订单指标业务开发
    • 1. 需求分析
      • 2. 创建dw层表
        • 2.1创建itcast_dw.dim_user表
        • 2.2 创建订单临时表tmp_order_wide
        • 2.3 创建订单时间标志宽表tmp_order_datetag_wide
        • 2.4 创建订单时间标志、地址标志宽表 fact_order_wide
      • 3. 订单宽表ETL处理
        • 3.1 加载用户维度数据:
        • 3.2 导入订单数据:
        • 3.3 时间近30天、90天、180天、订单上午、下午时间拉宽
        • 3.4 与地址表合并加入收货地址信息
      • 4. 指标开发
        • 4.1 指标开发一
        • 4.2 指标开发二
        • 4.3 指标开发三
        • 4.4 指标开发四
        • 4.5 指标开发五
        • 4.6 指标开发六
        • 4.7 创建ads层表、加载数据
    • 小结
    相关产品与服务
    大数据
    全栈大数据产品,面向海量数据场景,帮助您 “智理无数,心中有数”!
    领券
    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档