写在前面: 博主是一名软件工程系大数据应用开发专业大二的学生,昵称来源于《爱丽丝梦游仙境》中的Alice和自己的昵称。作为一名互联网小白,
写博客一方面是为了记录自己的学习历程,一方面是希望能够帮助到很多和自己一样处于起步阶段的萌新
。由于水平有限,博客中难免会有一些错误,有纰漏之处恳请各位大佬不吝赐教!个人小站:http://alices.ibilibili.xyz/ , 博客主页:https://alice.blog.csdn.net/ 尽管当前水平可能不及各位大佬,但我还是希望自己能够做得更好,因为一天的生活就是一生的缩影
。我希望在最美的年华,做最好的自己
!
本篇博客,博主为大家带来的是关于大数据实战【千亿级数仓】阶段四的内容。
在该阶段中,我们需要编写SQL实现以下业务分析
括号中的数字代表的是每个指标的需求数量。四个指标,分别从订单时间维度,地域分类维度,用户消费维度,退货维度来对数据展开分析。
下面我们就以第三个指标,为大家演示如何完成用户订单指标业务开发。
电商平台往往需要根据用户的购买数据来分析用户的行为,此处。我们基于用户的订单情况进行一些统计分析,用于将来的用户行为分析。根据用户的消费行为习惯,对运营部门提供用户分析数据指标。表是订单表!! 以下为本需求需要统计的基于用户的订单指标:
以下为本需求需要统计的基于用户的订单指标:
字段名称 | 说明 |
---|---|
userId | 用户id |
loginName | 登录名 |
userSex | 性别 |
drop table if exists `itcast_dw`.`dim_user`;
create table `itcast_dw`.`dim_user`(
userId bigint,
loginName string,
userSex bigint
)
partitioned by (dt string)
STORED AS PARQUET;
在dw层创建 订单临时表tmp_order_wide
字段名称 | 说明 |
---|---|
orderId | 订单ID |
orderStatus | 订单状态 |
payType | 支付类型 |
userId | 用户id |
userAddressId | 收货地址id |
payTime | 支付时间 |
payPrice | 支付金额 |
createtime | 创建时间 |
-- 订单临时订单表
drop table if exists `itcast_dw`.`tmp_order_wide`;
create table `itcast_dw`.`tmp_order_wide`(
orderId bigint,
orderStatus bigint,
payType bigint,
userId bigint,
userAddressId bigint,
payTime string,
payMoney double,
createtime string
)
partitioned by(dt string)
STORED AS PARQUET;
字段名称 | 说明 |
---|---|
… | … |
flag30 | 近30天标志 |
flag60 | 近60天标志 |
flag90 | 近90天标志 |
flag180 | 近180天标志 |
flagTimeBucket | 一天时间段标志(凌晨、早晨、上午… |
drop table `itcast_dw`.`tmp_order_datetag_wide`;
create table `itcast_dw`.`tmp_order_datetag_wide`(
orderId bigint,
orderStatus bigint,
payType bigint,
userId bigint,
userAddressId bigint,
payTime string,
payMoney double,
createtime string,
flag30 bigint,
flag60 bigint,
flag90 bigint,
flag180 bigint,
flagTimeBucket string
)
partitioned by(dt string)
STORED AS PARQUET;
字段名称 | 说明 |
---|---|
… | … |
othername | 地址标志(家里、学校、工作单位…) |
-- 地址拉宽
drop table `itcast_dw`.`fact_order_wide`;
create table `itcast_dw`.`fact_order_wide`(
orderId bigint,
orderStatus bigint,
payType bigint,
userId bigint,
userAddressId bigint,
payTime string,
payMoney double,
createtime string,
flag30 bigint,
flag60 bigint,
flag90 bigint,
flag180 bigint,
flagTimeBucket string,
othername string
)
partitioned by(dt string)
STORED AS PARQUET;
insert overwrite table `itcast_dw`.`dim_user` partition(dt='20190908')
select
userId,
loginName,
userSex
from
`itcast_ods`.`itcast_users` ;
--验证
select * from itcast_dw.dim_user limit 10;
insert overwrite table `itcast_dw`.`tmp_order_wide` partition (dt='20190908')
select
orderid,
orderstatus,
paytype,
userid,
useraddressid,
paytime,
totalmoney,
createtime
from `itcast_ods`.`itcast_orders` where dt='20190908' ;
-- 测试
select * from `itcast_dw`.`tmp_order_wide` limit 10;
insert overwrite table `itcast_dw`.`tmp_order_datetag_wide` partition(dt='20190908')
select
orderId,
orderStatus,
payType,
userId,
userAddressId,
payTime,
paymoney,
createtime,
case when datediff(current_timestamp, createtime) <= 30
then 1
else 0
end as flag_30,
case when datediff(current_timestamp, createtime) <= 60
then 1
else 0
end as flag_60,
case when datediff(current_timestamp, createtime) <= 90
then 1
else 0
end as flag_90,
case when datediff(current_timestamp, createtime) <= 180
then 1
else 0
end as flag_180,
case when hour(createtime) >= 0 and hour(createtime) < 6
then '凌晨'
when hour(createtime) >= 6 and hour(createtime) < 12
then '上午'
when hour(createtime) >= 12 and hour(createtime) < 14
then '中午'
when hour(createtime) >= 14 and hour(createtime) < 18
then '下午'
else '晚上'
end as flag_time_bucket
from
`itcast_dw`.`tmp_order_wide`
where dt='20190908';
-- 测试语句
select * from `itcast_dw`.`tmp_order_datetag_wide` limit 5;
--创建dw层dim_user_address表
drop table if exists `itcast_dw`.`dim_user_address`;
create table `itcast_dw`.`dim_user_address`(
addressId bigint,
userId bigint,
userName string,
otherName string,
userPhone string,
areaIdPath string,
areaId bigint,
userAddress string,
isDefault bigint,
dataFlag bigint,
createTime string
)
partitioned by (dt string)
STORED AS PARQUET;
--从ods层itcast_user_address导出数据到dim_user_address表
insert overwrite table `itcast_dw`.`dim_user_address` partition(dt="20190908")
select
addressId,
userId,
userName,
otherName,
userPhone,
areaIdPath,
areaId,
userAddress,
isDefault,
dataFlag,
createTime
from `itcast_ods`.`itcast_user_address` where dt="20190908";
--地址表合并加入收货地址信息
insert overwrite table `itcast_dw`.`fact_order_wide` partition(dt='20190909')
select
t1.orderId,
t1.orderStatus,
t1.payType,
t1.userId,
t1.userAddressId,
t1.payTime,
t1.paymoney,
t1.createtime,
t1.flag30,
t1.flag60,
t1.flag90,
t1.flag180,
t1.flagTimeBucket,
t2.othername
from
(select * from `itcast_dw`.`tmp_order_datetag_wide` where dt='20190909') t1
left join
(select * from `itcast_dw`.`dim_user_address` where dt='20190909') t2
on t1.userAddressId = t2.addressId;
-- 测试
select * from `itcast_dw`.`fact_order_wide` limit 10;
指标 | 字段说明 |
---|---|
第一次消费时间 | 支付时间 |
最近一次消费时间 | 支付时间 |
首单距今时间 | 支付时间 |
尾单距今时间------分析用户什么时候来购买商品以及多久没有购买了 | 支付时间 |
最小消费金额 | 订单支付金额 |
最大消费金额 | 订单支付金额 |
参考代码:
select
t1.userid,
t1.loginname,
MIN(t2.payTime) as first_paytime, --首次下单时间
MAX(t2.payTime) as lastest_paytime, --尾单时间
DATEDIFF(CURRENT_TIMESTAMP, MIN(t2.payTime)) as first_day_during_days,--首单距今
DATEDIFF(CURRENT_TIMESTAMP, MAX(t2.payTime)) as lastest_day_durning_days, --尾单距今
MIN(t2.paymoney) as min_paymoney,
MAX(t2.paymoney) as max_paymoney
from
(select * from `itcast_dw`.`fact_order_wide` where dt='20190909') as t2
left join
(select * from `itcast_dw`.`dim_user` where dt='20190909') as t1
on t1.userId = t2.userId
group by t1.userid,t1.loginname
limit 5;
指标 |
---|
累计消费次数(不含退拒) |
累计消费金额(不含退拒) |
近30天购买次数(不含退拒) |
近30天购买金额(不含退拒) |
近30天购买次数(含退拒) |
近30天购买金额(含退拒)----分析用户最近的消费能力 |
参考代码:
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=-3 --订单状态 -3:用户拒收 -2:未付款的订单 -1:用户取消 0:待发货 1:配送中 2:用户确认收货
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus != -3
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus != -3
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus != -3
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;
指标 |
---|
客单价(含退拒) |
客单价(不含退拒) |
近60天客单价(含退拒) |
近60天客单价(不含退拒) |
参考代码:
-- 指标开发三
-- 1. 客单价(含退拒)
-- 2. 客单价(不含退拒)
-- 3. 近60天客单价(含退拒)-----分析用户消费水平
-- 4. 近60天客单价(不含退拒)
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus != -3
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus != -3
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus != -3
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus != -3
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30,
--客单价含退拒
SUM(t1.paymoney) / SUM(1) AS atv,
--客单价不含退拒
SUM(case when t1.orderStatus !=-3 then t1.paymoney else 0 end) /
SUM(case when t1.orderStatus !=-3 then 1 else 0 end) AS atv_withoutback,
--近60天客单价含退拒
SUM(case when t1.flag60 = 1
then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
then 1
else 0
end) AS atv_60,
--近60天不含退拒
SUM(case when t1.orderStatus !=-3 and t1.flag60 = 1
then t1.paymoney
else 0
end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
then 1
else 0
end) AS atv_60_withoutback
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
group by t2.userid,t2.loginname limit 5;
指标 | |
---|---|
常用收货地址 |
参考代码:
-- 创建订单地址分析表
drop table if exists `itcast_ads`.`tmp_order_address`;
create table `itcast_ads`.`tmp_order_address`(
userId bigint, -- 用户Id
otherName string, -- 地址类型(家里、学校...)
totalCount bigint, -- 下单数
rn bigint -- 下单排名
)
partitioned by (dt string)
STORED AS PARQUET;
--从tmp_order_datetag_wide统计最常用地址
insert overwrite table `itcast_ads`.`tmp_order_address` partition(dt='20190909')
select
t3.userid,
t3.othername,
t3.ordercount,
row_number() over( partition by t3.userid order by ordercount desc ) rn --partiton by userid:按照用户分组,order by ordercount :按照订单数量排序 降序 ,rn:组内的排序
from
(select
t1.userId as userid,
t1.othername as othername,
count(t1.orderid) as ordercount -->每个用户每个订单的数量
from
(select * from `itcast_dw`.`fact_order_wide` where dt='20190909') t1
group by t1.userid,t1.otherName order by t1.userid ) t3 ;
-- 测试
select * from `itcast_ads`.`tmp_order_address` order by userId, rn limit 10;
参考代码:
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30,
SUM(t1.paymoney) / SUM(1) AS atv,
SUM(case when t1.orderStatus !=-3
then t1.paymoney else 0 end) /
SUM(case when t1.orderStatus !=-3 then 1 else 0
end) AS atv_withoutback,
SUM(case when t1.flag60 = 1
then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
then 1
else 0
end) AS atv_60,
SUM(case when t1.orderStatus !=-3 and t1.flag60 = 1
then t1.paymoney
else 0
end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
then 1
else 0
end) AS atv_60_withoutback,
--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
left join
(select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
on t1.userId = t3.userId
group by t2.userid,t2.loginname limit 5;
指标 |
---|
常用支付方式:某种支付方式对应的订单数量越多则越常用 |
更新mysql表中的模拟数据:
SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for `itcast_payments`
-- ----------------------------
DROP TABLE IF EXISTS `itcast_payments`;
CREATE TABLE `itcast_payments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`payCode` varchar(20) DEFAULT NULL,
`payName` varchar(255) DEFAULT NULL,
`payDesc` text,
`payOrder` int(11) DEFAULT '0',
`payConfig` text,
`enabled` tinyint(4) DEFAULT '0',
`isOnline` tinyint(4) DEFAULT '0',
`payFor` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `payCode` (`payCode`,`enabled`,`isOnline`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of itcast_payments
-- ----------------------------
INSERT INTO `itcast_payments` VALUES ('0', 'unkown', '未知方式', '未知', '0', null, '0', '0', null);
INSERT INTO `itcast_payments` VALUES ('1', 'alipays', '支付宝(及时到帐)', '支付宝(及时到帐)', '4', '', '0', '1', '1,2,4');
INSERT INTO `itcast_payments` VALUES ('2', 'weixinpays', '微信支付', '微信支付', '0', '', '0', '1', '1,2,3');
INSERT INTO `itcast_payments` VALUES ('3', 'wallets', '余额支付', '余额支付', '5', '', '1', '1', '1,2,3,4');
INSERT INTO `itcast_payments` VALUES ('4', 'cod', '货到付款', '开通城市', '1', '', '1', '0', '1,2,3,4');
--更新ods层数据
drop table if exists `itcast_dw`.`dim_payments`;
create table `itcast_dw`.`dim_payments`(
id bigint,
payCode string,
payName string,
payDesc string,
payOrder bigint,
payConfig string,
enabled bigint,
isOnline bigint,
payFor string
)
partitioned by (dt string)
STORED AS PARQUET;
--从ods层itcast_payments导出数据到dw层dim_payments表中
insert overwrite table `itcast_dw`.`dim_payments` partition(dt="20190909")
select
id ,
payCode ,
payName ,
payDesc ,
payOrder ,
payConfig,
enabled ,
isOnline ,
payFor
from `itcast_ods`.`itcast_payments` where dt="20190909";
1、加载支付方式排名
-- 创建支付方式分析表
drop table if exists `itcast_ads`.`tmp_order_paytype`;
create table `itcast_ads`.`tmp_order_paytype`(
userid bigint, -- 用户id
payType bigint, -- 支付类型id
payCode string, -- 支付码
totalCount bigint, -- 订单总数
rn bigint -- 等级
)
partitioned by (dt string)
STORED AS PARQUET;
-- 加载支付方式分析
insert overwrite table `itcast_ads`.`tmp_order_paytype` partition(dt='20190909')
select
t3.*,
row_number() over(partition by usrId order by totalCount desc) rn
from
(select
t1.userId,
t1.payType,
t2.payCode,
sum(1) as totalCount --sum(1)等同于count效果
from
`itcast_dw`.`dim_payments` t2
right join
`itcast_dw`.`tmp_order_datetag_wide` t1
on t2.id = t1.payType
group by t1.userId, t1.payType, t2.payCode) t3;
-- 测试
select * from `itcast_ads`.`tmp_order_paytype` limit 5;
参考代码:
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30,
SUM(t1.paymoney) / SUM(1) AS atv,
SUM(case when t1.orderStatus !=-3
then t1.paymoney else 0 end) /
SUM(case when t1.orderStatus !=-3 then 1 else 0
end) AS atv_withoutback,
SUM(case when t1.flag60 = 1
then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
then 1
else 0
end) AS atv_60,
SUM(case when t1.orderStatus !=-3 and t1.flag60 = 1
then t1.paymoney
else 0
end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
then 1
else 0
end) AS atv_60_withoutback,
--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
--常用的支付方式
MAX(case when t4.rn = 1
then t4.payCode
else ''
end) as most_usual_paytype
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
left join
(select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
on t1.userId = t3.userId
left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
on t1.userId = t4.userId
group by t2.userid,t2.loginname limit 5;
指标 |
---|
学校下单总数 |
单位下单总数 |
家里下单总数 |
上午下单总数 |
下午下单总数 |
晚上下单总数 |
参考代码
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30,
SUM(t1.paymoney) / SUM(1) AS atv,
SUM(case when t1.orderStatus !=-3
then t1.paymoney else 0 end) /
SUM(case when t1.orderStatus !=-3 then 1 else 0
end) AS atv_withoutback,
SUM(case when t1.flag60 = 1
then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
then 1
else 0
end) AS atv_60,
SUM(case when t1.orderStatus !=-3 and t1.flag60 = 1
then t1.paymoney
else 0
end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
then 1
else 0
end) AS atv_60_withoutback,
--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
--最常用支付方式
max(case when t4.rn = 1
then t4.payCode
else ''
end) as most_usual_paytype,
SUM(case when t1.otherName = '学校'
then 1
else 0
end) as school_order_count, -- 学校下单总数
SUM(case when t1.otherName = '单位'
then 1
else 0
end) as company_order_count, -- 单位下单总数
SUM(case when t1.otherName = '家里'
then 1
else 0
end) as home_order_count, -- 家里下单总数
SUM(case when t1.flagTimeBucket = '上午'
then 1
else 0
end) as am_order_count, -- 上午下单总数
SUM(case when t1.flagTimeBucket = '下午'
then 1
else 0
end) as pm_order_count, -- 下午下单总数
SUM(case when t1.flagTimeBucket = '晚上'
then 1
else 0
end) as night_order_count-- 晚上下单总数
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
left join
(select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
on t1.userId = t3.userId
left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
on t1.userId = t4.userId
group by t2.userid,t2.loginname limit 5;
drop table if exists `itcast_ads`.`tmp_user_order_measure`;
create table `itcast_ads`.`tmp_user_order_measure`(
userid string, -- 用户id
username string, -- 用户名称
first_paytime string, -- 第一次消费时间
lastest_paytime string, -- 最近一次消费时间
first_day_during_days bigint, -- 首单距今时间
lastest_day_durning_days bigint, -- 尾单距今时间
min_paymoney double, -- 最小消费金额
max_paymoney double, -- 最大消费金额
total_count_without_back bigint, -- 累计消费次数(不含退拒)
total_money_without_back double, -- 累计消费金额(不含退拒)
total_count_without_back_30 bigint, -- 近30天累计消费次数(不含退拒)
total_money_without_back_30 double, -- 近30天累计消费金额(不含退拒)
total_count_30 bigint, -- 近30天累计消费次数(含退拒)
total_money_30 double, -- 近30天累计消费金额(含退拒)
atv double, -- 客单价(含退拒)
atv_withoutback double, -- 客单价(不含退拒)
atv_60 double, -- 近60天客单价(含退拒)
atv_60_withoutback double, -- 近60天客单价(不含退拒)
most_usual_address string, -- 常用收货地址
most_usual_paytype string, -- 常用支付方式
school_order_count bigint, -- 学校下单总数
company_order_count bigint, -- 单位下单总数
home_order_count bigint, -- 家里下单总数
am_order_count bigint, -- 上午下单总数
pm_order_count bigint, -- 下午下单总数
night_order_count bigint -- 晚上下单总数
)
partitioned by (dt string)
STORED AS PARQUET;
insert overwrite table `itcast_ads`.`tmp_user_order_measure` partition (dt='20190909')
select
t2.userid,
t2.loginname,
MIN(t1.payTime) as first_paytime,
MAX(t1.payTime) as lastest_paytime,
DATEDIFF(CURRENT_TIMESTAMP, MIN(t1.payTime)) as first_day_during_days,
DATEDIFF(CURRENT_TIMESTAMP, MAX(t1.payTime)) as lastest_day_durning_days,
MIN(t1.paymoney) as min_paymoney,
MAX(t1.paymoney) as max_paymoney,
sum(
case when t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back,--累计消费次数不含退拒,
sum(case when t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back, --累计消费金额不含退拒
--累计近30天消费次数不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then 1
else 0
end
) as total_count_without_back_30,
--累计近30天消费金额不含退拒
sum(case when t1.flag30 =1 and t1.orderstatus !=10 and t1.orderstatus !=11
then t1.paymoney
else 0
end
) as total_money_without_back_30,
--累计近30天消费次数含退拒
sum(case when t1.flag30 =1
then 1
else 0
end
) as total_count_without_30,
--累计近30天消费金额含退拒
sum(case when t1.flag30 =1
then t1.paymoney
else 0
end
) as total_money_with_back_30,
SUM(t1.paymoney) / SUM(1) AS atv,
SUM(case when t1.orderStatus !=-3
then t1.paymoney else 0 end) /
SUM(case when t1.orderStatus !=-3 then 1 else 0
end) AS atv_withoutback,
SUM(case when t1.flag60 = 1
then t1.paymoney else 0 end) / SUM(case when t1.flag60 = 1
then 1
else 0
end) AS atv_60,
SUM(case when t1.orderStatus !=-3 and t1.flag60 = 1
then t1.paymoney
else 0
end) / SUM(case when (t1.orderStatus !=-3 ) and t1.flag60 = 1
then 1
else 0
end) AS atv_60_withoutback,
--最常用地址
max(case when t3.rn =1
then t3.othername
else ''
end) as most_usual_address,
max(case when t4.rn = 1
then t4.payCode
else ''
end) as most_usual_paytype,
SUM(case when t1.otherName = '学校'
then 1
else 0
end) as school_order_count, -- 学校下单总数
SUM(case when t1.otherName = '单位'
then 1
else 0
end) as company_order_count, -- 单位下单总数
SUM(case when t1.otherName = '家里'
then 1
else 0
end) as home_order_count, -- 家里下单总数
SUM(case when t1.flagTimeBucket = '上午'
then 1
else 0
end) as am_order_count, -- 上午下单总数
SUM(case when t1.flagTimeBucket = '下午'
then 1
else 0
end) as pm_order_count, -- 下午下单总数
SUM(case when t1.flagTimeBucket = '晚上'
then 1
else 0
end) as night_order_count-- 晚上下单总数
from
(select * from itcast_dw.fact_order_wide where dt="20190909") t1
left join
(select * from itcast_dw.dim_user where dt="20190909") t2 on
t1.userid=t2.userid
left join
(select * from `itcast_ads`.`tmp_order_address` where dt='20190909') as t3
on t1.userId = t3.userId
left join (select * from `itcast_ads`.`tmp_order_paytype` where dt='20190909') as t4
on t1.userId = t4.userId
group by t2.userid,t2.loginname ;
-- 测试
select * from `itcast_ads`.`tmp_user_order_measure` limit 10;
关于指标三所需要使用到的表,以及表与表之间的关联关系,大致流程如下所示:
通过查询我们最终生成的宽表,可以发现该表的数据已经包含了指标所需要分析的所有数据。
看到类似的效果说明我们就成功了!
大数据实战【千亿级数仓】阶段四的内容到这里就结束了。
如果以上过程中出现了任何的纰漏错误,烦请大佬们指正?
受益的朋友或对大数据技术感兴趣的伙伴记得点赞关注支持一波?