
Data Warehouse in Practice 4

Author: soundhearer | Published 2020-12-18 11:40:50

Business Data Warehouse

(Tail of the Sqoop import log for the base_category3 table:)

Map-Reduce Framework
    Map input records=125
    Map output records=125
    Input split bytes=85
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=71
    CPU time spent (ms)=1700
    Physical memory (bytes) snapshot=259682304
    Virtual memory (bytes) snapshot=2850103296
    Total committed heap usage (bytes)=235929600
    Peak Map Physical memory (bytes)=259682304
    Peak Map Virtual memory (bytes)=2850103296
File Input Format Counters
    Bytes Read=0
File Output Format Counters
    Bytes Written=2181
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Transferred 2.1299 KB in 29.0742 seconds (75.0149 bytes/sec)
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Retrieved 125 records.
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 11:07:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 11:07:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 11:07:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 11:07:56 INFO tool.CodeGenTool: Beginning code generation
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 11:07:59 WARN orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.java to /data0/25k/./QueryResult.java. Error: Destination '/data0/25k/./QueryResult.java' already exists
20/11/25 11:07:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.jar
20/11/25 11:08:00 INFO tool.ImportTool: Destination directory /origin_data/gmall/db/base_category3/2020-11-24 is not present, hence not deleting.
20/11/25 11:08:00 INFO mapreduce.ImportJobBase: Beginning query import.
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 11:08:01 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 11:08:02 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0010
20/11/25 11:08:05 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0010
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 11:08:06 INFO conf.Configuration: resource-types.xml not found
20/11/25 11:08:06 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 11:08:06 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0010
20/11/25 11:08:06 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0010/
20/11/25 11:08:06 INFO mapreduce.Job: Running job: job_1606125802436_0010
20/11/25 11:08:15 INFO mapreduce.Job: Job job_1606125802436_0010 running in uber mode : false
20/11/25 11:08:15 INFO mapreduce.Job:  map 0% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job:  map 100% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job: Job job_1606125802436_0010 completed successfully

ODS Layer

Create the ODS tables as exact, field-for-field replicas of the corresponding tables in the business database.

The DDL statements are as follows:

drop table if exists ods_order_info;
create external table ods_order_info (
  `id` string COMMENT 'order ID',
  `total_amount` decimal(10,2) COMMENT 'order amount',
  `order_status` string COMMENT 'order status',
  `user_id` string COMMENT 'user ID',
  `payment_way` string COMMENT 'payment method',
  `out_trade_no` string COMMENT 'payment transaction number',
  `create_time` string COMMENT 'creation time',
  `operate_time` string COMMENT 'operation time'
) COMMENT 'order table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_order_detail;
create external table ods_order_detail (
  `id` string COMMENT 'order detail ID',
  `order_id` string COMMENT 'order ID',
  `user_id` string COMMENT 'user ID',
  `sku_id` string COMMENT 'SKU ID',
  `sku_name` string COMMENT 'SKU name',
  `order_price` string COMMENT 'SKU price',
  `sku_num` string COMMENT 'SKU quantity',
  `create_time` string COMMENT 'creation time'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_sku_info;
create external table ods_sku_info (
  `id` string COMMENT 'SKU ID',
  `spu_id` string COMMENT 'SPU ID',
  `price` decimal(10,2) COMMENT 'price',
  `sku_name` string COMMENT 'SKU name',
  `sku_desc` string COMMENT 'SKU description',
  `weight` string COMMENT 'weight',
  `tm_id` string COMMENT 'brand ID',
  `category3_id` string COMMENT 'category ID',
  `create_time` string COMMENT 'creation time'
) COMMENT 'SKU table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_user_info;
create external table ods_user_info (
  `id` string COMMENT 'user ID',
  `name` string COMMENT 'name',
  `birthday` string COMMENT 'birthday',
  `gender` string COMMENT 'gender',
  `email` string COMMENT 'email',
  `user_level` string COMMENT 'user level',
  `create_time` string COMMENT 'creation time'
) COMMENT 'user info table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_base_category1;
create external table ods_base_category1 (
  `id` string COMMENT 'ID',
  `name` string COMMENT 'name'
) COMMENT 'level-1 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_base_category2;
create external table ods_base_category2 (
  `id` string COMMENT 'ID',
  `name` string COMMENT 'name',
  `category1_id` string COMMENT 'level-1 category ID'
) COMMENT 'level-2 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists ods_base_category3;
create external table ods_base_category3 (
  `id` string COMMENT 'ID',
  `name` string COMMENT 'name',
  `category2_id` string COMMENT 'level-2 category ID'
) COMMENT 'level-3 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

drop table if exists `ods_payment_info`;
create external table `ods_payment_info` (
  `id` bigint COMMENT 'ID',
  `out_trade_no` string COMMENT 'external business number',
  `order_id` string COMMENT 'order ID',
  `user_id` string COMMENT 'user ID',
  `alipay_trade_no` string COMMENT 'Alipay transaction number',
  `total_amount` decimal(16,2) COMMENT 'payment amount',
  `subject` string COMMENT 'transaction subject',
  `payment_type` string COMMENT 'payment type',
  `payment_time` string COMMENT 'payment time'
) COMMENT 'payment flow table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';

ODS Layer Data Load Script

Create a new script, ods_db.sh:

#!/bin/bash

APP=gmall

# If a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi

sql=" 
load data inpath '/origin_data/$APP/db/order_info/$do_date'  OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/order_detail/$do_date'  OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/sku_info/$do_date'  OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date'); 
"
hive -e "$sql"
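The fallback to yesterday's date is what lets the script run unattended on a nightly schedule. As a sketch (the install path, log path, and time below are assumptions, not from the original setup), a crontab entry might look like:

# Hypothetical crontab entry: run nightly at 01:30 with no argument,
# so do_date falls back to yesterday (date -d "-1 day" +%F).
30 1 * * * /root/bin/ods_db.sh >> /var/log/ods_db.log 2>&1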

Run the script to load the data:

ods_db.sh 2020-11-24
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24'  OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.175 seconds
INFO  : Executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24'  OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_order_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_info/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.462 seconds
INFO  : OK
No rows affected (0.716 seconds)
INFO  : Compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24'  OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.16 seconds
INFO  : Executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24'  OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_order_detail partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_detail/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.473 seconds
INFO  : OK
No rows affected (0.655 seconds)
INFO  : Compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24'  OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.154 seconds
INFO  : Executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24'  OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_sku_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/sku_info/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.418 seconds
INFO  : OK
No rows affected (0.605 seconds)
INFO  : Compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.154 seconds
INFO  : Executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_user_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/user_info/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.419 seconds
INFO  : OK
No rows affected (0.597 seconds)
INFO  : Compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.154 seconds
INFO  : Executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_payment_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/payment_info/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.441 seconds
INFO  : OK
No rows affected (0.616 seconds)
INFO  : Compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.149 seconds
INFO  : Executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_base_category1 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category1/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.413 seconds
INFO  : OK
No rows affected (0.583 seconds)
INFO  : Compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.153 seconds
INFO  : Executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_base_category2 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category2/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.411 seconds
INFO  : OK
No rows affected (0.585 seconds)
INFO  : Compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.154 seconds
INFO  : Executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_base_category3 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category3/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.425 seconds
INFO  : OK
No rows affected (0.601 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
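Before building DWD, it is worth spot-checking that the partitions actually landed. A minimal check along these lines (an assumed verification step, not part of the original flow):

# spot-check one ODS table after the load
hive -e "
use gmall;
show partitions ods_order_info;
select count(*) from ods_order_info where dt='2020-11-24';
"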

DWD Layer

Filter records with null IDs out of the ODS data, and perform dimension degeneration (denormalization) on the product category tables, folding the three category levels into the SKU table.

Create the order table:

drop table if exists dwd_order_info;
create external table dwd_order_info (
  `id` string COMMENT 'order ID',
  `total_amount` decimal(10,2) COMMENT 'order amount',
  `order_status` string COMMENT 'order status: 1 2 3 4 5',
  `user_id` string COMMENT 'user ID',
  `payment_way` string COMMENT 'payment method',
  `out_trade_no` string COMMENT 'payment transaction number',
  `create_time` string COMMENT 'creation time',
  `operate_time` string COMMENT 'operation time'
)
PARTITIONED BY (`dt` string)
stored as parquet;

Order detail table:

drop table if exists dwd_order_detail;
create external table dwd_order_detail (
  `id` string COMMENT 'order detail ID',
  `order_id` string COMMENT 'order ID',
  `user_id` string COMMENT 'user ID',
  `sku_id` string COMMENT 'SKU ID',
  `sku_name` string COMMENT 'SKU name',
  `order_price` string COMMENT 'SKU price',
  `sku_num` string COMMENT 'SKU quantity',
  `create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet;

User table:

drop table if exists dwd_user_info;
create external table dwd_user_info (
  `id` string COMMENT 'user ID',
  `name` string COMMENT 'name',
  `birthday` string COMMENT 'birthday',
  `gender` string COMMENT 'gender',
  `email` string COMMENT 'email',
  `user_level` string COMMENT 'user level',
  `create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet;

Payment flow table:

drop table if exists `dwd_payment_info`;
create external table `dwd_payment_info` (
  `id` bigint COMMENT 'ID',
  `out_trade_no` string COMMENT 'external business number',
  `order_id` string COMMENT 'order ID',
  `user_id` string COMMENT 'user ID',
  `alipay_trade_no` string COMMENT 'Alipay transaction number',
  `total_amount` decimal(16,2) COMMENT 'payment amount',
  `subject` string COMMENT 'transaction subject',
  `payment_type` string COMMENT 'payment type',
  `payment_time` string COMMENT 'payment time'
)
PARTITIONED BY (`dt` string)
stored as parquet;

Create the SKU table (with categories folded in):

drop table if exists dwd_sku_info;
create external table dwd_sku_info (
  `id` string COMMENT 'SKU ID',
  `spu_id` string COMMENT 'SPU ID',
  `price` decimal(10,2) COMMENT 'price',
  `sku_name` string COMMENT 'SKU name',
  `sku_desc` string COMMENT 'SKU description',
  `weight` string COMMENT 'weight',
  `tm_id` string COMMENT 'brand ID',
  `category3_id` string COMMENT 'level-3 category ID',
  `category2_id` string COMMENT 'level-2 category ID',
  `category1_id` string COMMENT 'level-1 category ID',
  `category3_name` string COMMENT 'level-3 category name',
  `category2_name` string COMMENT 'level-2 category name',
  `category1_name` string COMMENT 'level-1 category name',
  `create_time` string COMMENT 'creation time'
)
PARTITIONED BY (`dt` string)
stored as parquet;

DWD Layer Data Load Script

Create a script, dwd_db.sh:

#!/bin/bash

# Define a variable to make changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 

sql="

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table   "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info 
where dt='$do_date'  and id is not null;

insert overwrite table   "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail 
where dt='$do_date'   and id is not null;

insert overwrite table   "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date'   and id is not null;

insert overwrite table   "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date'  and id is not null;

insert overwrite table   "$APP".dwd_sku_info partition(dt)
select  
    sku.id,
    sku.spu_id, 
    sku.price,
    sku.sku_name,  
    sku.sku_desc,  
    sku.weight,  
    sku.tm_id,  
    sku.category3_id,  
    c2.id category2_id ,  
    c1.id category1_id,  
    c3.name category3_name,  
    c2.name category2_name,  
    c1.name category1_name,  
    sku.create_time,
    sku.dt
from
    "$APP".ods_sku_info sku 
join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id 
    join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id 
    join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id 
where sku.dt='$do_date'  and c2.dt='$do_date'  
and  c3.dt='$do_date' and  c1.dt='$do_date' 
and sku.id is not null;
"
hive -e "$sql"

Run the script: dwd_db.sh 2020-11-24. The inserts rely on Hive dynamic partitioning, which is why the script first sets hive.exec.dynamic.partition.mode=nonstrict; the last column selected (dt) determines the target partition.

INFO  : Status: DAG finished successfully in 9.01 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           1.16s
INFO  : Prepare Plan                            0.16s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             0.10s
INFO  : Start DAG                               0.70s
INFO  : Run DAG                                 9.01s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1            506.00            380              0             143              125
INFO  :      Map 2              0.00            110              0              18               18
INFO  :      Map 3           4955.00         10,340             90           1,574                1
INFO  :      Map 5              0.00            510              0             300              300
INFO  :  Reducer 4              0.00            610              0               1                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 6
INFO  :    TOTAL_LAUNCHED_TASKS: 6
INFO  :    DATA_LOCAL_TASKS: 4
INFO  :    AM_CPU_MILLISECONDS: 2930
INFO  :    AM_GC_TIME_MILLIS: 0
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 4550
INFO  :    FILE_BYTES_WRITTEN: 36200
INFO  :    HDFS_BYTES_READ: 53605
INFO  :    HDFS_BYTES_WRITTEN: 54917
INFO  :    HDFS_READ_OPS: 13
INFO  :    HDFS_WRITE_OPS: 7
INFO  :    HDFS_OP_CREATE: 3
INFO  :    HDFS_OP_GET_FILE_STATUS: 9
INFO  :    HDFS_OP_MKDIRS: 2
INFO  :    HDFS_OP_OPEN: 4
INFO  :    HDFS_OP_RENAME: 2
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 1
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 2
INFO  :    NUM_SHUFFLED_INPUTS: 4
INFO  :    NUM_SKIPPED_INPUTS: 1
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 1
INFO  :    GC_TIME_MILLIS: 90
INFO  :    TASK_DURATION_MILLIS: 5711
INFO  :    CPU_MILLISECONDS: 11950
INFO  :    PHYSICAL_MEMORY_BYTES: 4435476480
INFO  :    VIRTUAL_MEMORY_BYTES: 32940134400
INFO  :    COMMITTED_HEAP_BYTES: 4435476480
INFO  :    INPUT_RECORDS_PROCESSED: 2035
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 53605
INFO  :    OUTPUT_RECORDS: 444
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 41244
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 42158
INFO  :    OUTPUT_BYTES_PHYSICAL: 32070
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 4050
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 4050
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 1
INFO  :    SHUFFLE_BYTES: 31998
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 42158
INFO  :    SHUFFLE_BYTES_TO_MEM: 31642
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 356
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    SHUFFLE_PHASE_TIME: 1149
INFO  :    MERGE_PHASE_TIME: 88
INFO  :    FIRST_EVENT_RECEIVED: 690
INFO  :    LAST_EVENT_RECEIVED: 690
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_DYNAMIC_PARTITIONS: 1
INFO  :    CREATED_FILES: 2
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 125
INFO  :    RECORDS_IN_Map_2: 18
INFO  :    RECORDS_IN_Map_3: 1149
INFO  :    RECORDS_IN_Map_5: 300
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_gmall.dwd_sku_info: 300
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 125
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_2: 18
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_3: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_5: 300
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_4: 0
INFO  :    RECORDS_OUT_OPERATOR_FIL_41: 1149
INFO  :    RECORDS_OUT_OPERATOR_FIL_49: 18
INFO  :    RECORDS_OUT_OPERATOR_FIL_52: 125
INFO  :    RECORDS_OUT_OPERATOR_FIL_56: 300
INFO  :    RECORDS_OUT_OPERATOR_FS_22: 300
INFO  :    RECORDS_OUT_OPERATOR_FS_29: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_25: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_27: 1
INFO  :    RECORDS_OUT_OPERATOR_MAPJOIN_44: 1149
INFO  :    RECORDS_OUT_OPERATOR_MAPJOIN_45: 300
INFO  :    RECORDS_OUT_OPERATOR_MAPJOIN_54: 125
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_26: 1
INFO  :    RECORDS_OUT_OPERATOR_RS_51: 18
INFO  :    RECORDS_OUT_OPERATOR_RS_55: 125
INFO  :    RECORDS_OUT_OPERATOR_RS_58: 300
INFO  :    RECORDS_OUT_OPERATOR_SEL_21: 300
INFO  :    RECORDS_OUT_OPERATOR_SEL_24: 300
INFO  :    RECORDS_OUT_OPERATOR_SEL_28: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_50: 18
INFO  :    RECORDS_OUT_OPERATOR_SEL_53: 125
INFO  :    RECORDS_OUT_OPERATOR_SEL_57: 300
INFO  :    RECORDS_OUT_OPERATOR_SEL_8: 1149
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 125
INFO  :    RECORDS_OUT_OPERATOR_TS_3: 18
INFO  :    RECORDS_OUT_OPERATOR_TS_6: 1149
INFO  :    RECORDS_OUT_OPERATOR_TS_9: 300
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 300
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_4_INPUT_Map_3:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_Map_2:
INFO  :    FIRST_EVENT_RECEIVED: 9
INFO  :    INPUT_RECORDS_PROCESSED: 18
INFO  :    LAST_EVENT_RECEIVED: 9
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    SHUFFLE_BYTES: 356
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 348
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 356
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_PHASE_TIME: 50
INFO  : TaskCounter_Map_1_INPUT_c2:
INFO  :    INPUT_RECORDS_PROCESSED: 125
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 2181
INFO  : TaskCounter_Map_1_OUTPUT_Map_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 4131
INFO  :    OUTPUT_BYTES_PHYSICAL: 2219
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 4387
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 125
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Map_2_INPUT_c1:
INFO  :    INPUT_RECORDS_PROCESSED: 18
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 270
INFO  : TaskCounter_Map_2_OUTPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 306
INFO  :    OUTPUT_BYTES_PHYSICAL: 380
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 348
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 18
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Map_3_INPUT_Map_1:
INFO  :    FIRST_EVENT_RECEIVED: 329
INFO  :    INPUT_RECORDS_PROCESSED: 125
INFO  :    LAST_EVENT_RECEIVED: 329
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    SHUFFLE_BYTES: 2195
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 4387
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 2195
INFO  :    SHUFFLE_PHASE_TIME: 511
INFO  : TaskCounter_Map_3_INPUT_Map_5:
INFO  :    FIRST_EVENT_RECEIVED: 329
INFO  :    INPUT_RECORDS_PROCESSED: 300
INFO  :    LAST_EVENT_RECEIVED: 329
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    SHUFFLE_BYTES: 25397
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 30468
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 25397
INFO  :    SHUFFLE_PHASE_TIME: 514
INFO  : TaskCounter_Map_3_INPUT_c3:
INFO  :    INPUT_RECORDS_PROCESSED: 1149
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 21652
INFO  : TaskCounter_Map_3_OUTPUT_Reducer_4:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 6945
INFO  :    OUTPUT_BYTES_PHYSICAL: 4050
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 6955
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1
INFO  :    SHUFFLE_CHUNK_COUNT: 1
INFO  :    SPILLED_RECORDS: 1
INFO  : TaskCounter_Map_5_INPUT_sku:
INFO  :    INPUT_RECORDS_PROCESSED: 300
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 29502
INFO  : TaskCounter_Map_5_OUTPUT_Map_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 29862
INFO  :    OUTPUT_BYTES_PHYSICAL: 25421
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 30468
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 300
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_4_INPUT_Map_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 4050
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 4050
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 23
INFO  :    LAST_EVENT_RECEIVED: 23
INFO  :    MERGED_MAP_OUTPUTS: 1
INFO  :    MERGE_PHASE_TIME: 88
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    NUM_SKIPPED_INPUTS: 1
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 1
INFO  :    SHUFFLE_BYTES: 4050
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 6955
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 4050
INFO  :    SHUFFLE_PHASE_TIME: 74
INFO  :    SPILLED_RECORDS: 1
INFO  : TaskCounter_Reducer_4_OUTPUT_out_Reducer_4:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 1
INFO  :    GROUPED_INPUT_SPLITS_Map_2: 1
INFO  :    GROUPED_INPUT_SPLITS_Map_3: 1
INFO  :    GROUPED_INPUT_SPLITS_Map_5: 1
INFO  :    INPUT_DIRECTORIES_Map_1: 1
INFO  :    INPUT_DIRECTORIES_Map_2: 1
INFO  :    INPUT_DIRECTORIES_Map_3: 1
INFO  :    INPUT_DIRECTORIES_Map_5: 1
INFO  :    INPUT_FILES_Map_1: 1
INFO  :    INPUT_FILES_Map_2: 1
INFO  :    INPUT_FILES_Map_3: 1
INFO  :    INPUT_FILES_Map_5: 1
INFO  :    RAW_INPUT_SPLITS_Map_1: 1
INFO  :    RAW_INPUT_SPLITS_Map_2: 1
INFO  :    RAW_INPUT_SPLITS_Map_3: 1
INFO  :    RAW_INPUT_SPLITS_Map_5: 1
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 2 .......... container     SUCCEEDED      1          1        0        0       0       0  
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Map 5 .......... container     SUCCEEDED      1          1        0        0       0       0  
Map 3 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 4 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 05/05  [==========================>>] 100%  ELAPSED TIME: 9.59 s     
----------------------------------------------------------------------------------------------
300 rows affected (11.721 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
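After this run, each row of dwd_sku_info carries its three category names inline, which is exactly the dimension degeneration described above. A quick look at a few rows (an illustrative query, assuming the 2020-11-24 partition):

# inspect the degenerated category dimensions on a few SKUs
hive -e "
select id, sku_name, category1_name, category2_name, category3_name
from gmall.dwd_sku_info
where dt='2020-11-24'
limit 5;
"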

DWS Layer: User-Action Wide Table

1) Why build a wide table?

The goal is to aggregate each user's behavior for a single day into one wide, multi-column table, so that after joining in user dimension information it can be analyzed statistically from different angles.

Create the user-action wide table:

drop table if exists dws_user_action;
create external table dws_user_action
(
  user_id         string         comment 'user ID',
  order_count     bigint         comment 'number of orders',
  order_amount    decimal(16,2)  comment 'order amount',
  payment_count   bigint         comment 'number of payments',
  payment_amount  decimal(16,2)  comment 'payment amount'
) COMMENT 'daily user-action wide table'
PARTITIONED BY (`dt` string)
stored as parquet
tblproperties ("parquet.compression"="snappy");

User-Action Wide Table Load Script

Create a script, dws_db_wide.sh:

#!/bin/bash

# Define a variable to make changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 

sql="
with  
tmp_order as
(
    select 
        user_id, 
        count(*)  order_count,
        sum(oi.total_amount) order_amount
    from "$APP".dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
) ,
tmp_payment as
(
    select
        user_id, 
        sum(pi.total_amount) payment_amount, 
        count(*) payment_count 
    from "$APP".dwd_payment_info pi 
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
)
insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount)
from 
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount
    from tmp_order

    union all
    select
        user_id,
        0 order_count,
        0 order_amount,
        payment_count,
        payment_amount
    from tmp_payment
 ) user_actions
group by user_id;
"

hive -e "$sql"

Run the script:

dws_db_wide.sh 2020-11-24
INFO  : Status: Running (Executing on YARN cluster with App id application_1606125802436_0017)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 4 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 5 ...... container     SUCCEEDED      2          2        0        0       0       0  
Map 6 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 7 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 06/06  [==========================>>] 100%  ELAPSED TIME: 11.15 s    
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 10.07 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.72s
INFO  : Prepare Plan                            0.18s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             4.30s
INFO  : Start DAG                               1.22s
INFO  : Run DAG                                10.07s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1           4047.00          7,860             82           1,000              200
INFO  :      Map 6           4551.00          9,360            115             518              188
INFO  :  Reducer 2           1011.00          1,730             26             200              307
INFO  :  Reducer 4           1453.00          3,040              0             388                2
INFO  :  Reducer 5            443.00            830              8               2                0
INFO  :  Reducer 7           1011.00          1,970             14             188              188
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 4 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 5 ...... container     SUCCEEDED      2          2        0        0       0       0  
Map 6 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 7 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 06/06  [==========================>>] 100%  ELAPSED TIME: 11.17 s    
----------------------------------------------------------------------------------------------
200 rows affected (17.604 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2

Check in Hue: the data has been generated.
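With the wide table in place, joining it to a user dimension gives the multi-angle analysis the layer was built for. For example, order amount by gender could be computed roughly like this (an illustrative query, not from the original article; it assumes dwd_user_info carries a full snapshot for the same dt):

# example analysis on the wide table: order amount by gender
hive -e "
select
    u.gender,
    sum(a.order_amount) order_amount
from gmall.dws_user_action a
join gmall.dwd_user_info u
  on a.user_id = u.id and u.dt = a.dt
where a.dt = '2020-11-24'
group by u.gender;
"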

ADS Layer (Requirement: Total GMV)

DDL statement:

drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
  `dt` string COMMENT 'statistics date',
  `gmv_count` bigint COMMENT 'number of GMV orders that day',
  `gmv_amount` decimal(16,2) COMMENT 'total GMV order amount that day',
  `gmv_payment` decimal(16,2) COMMENT 'payment amount that day'
) COMMENT 'GMV'
row format delimited fields terminated by '\t';

Data Load Script

Create a script, ads_db_gmv.sh:

#!/bin/bash

# Define a variable to make changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`
fi 

sql="
insert into table "$APP".ads_gmv_sum_day 
select 
    '$do_date' dt,
    sum(order_count)  gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount 
from "$APP".dws_user_action 
where dt ='$do_date'
group by dt;
"

hive -e "$sql"

Run the script:

ads_db_gmv.sh 2020-11-24
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
20/11/25 14:51:24 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 14:51:24 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 14:51:24 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949): insert into table gmall.ads_gmv_sum_day 
select 
    '2020-11-24' dt,
    sum(order_count)  gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount 
from gmall.dws_user_action 
where dt ='2020-11-24'
group by dt
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:string, comment:null), FieldSchema(name:_col1, type:bigint, comment:null), FieldSchema(name:_col2, type:decimal(16,2), comment:null), FieldSchema(name:_col3, type:decimal(16,2), comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949); Time taken: 0.217 seconds
INFO  : Executing command(queryId=hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949): insert into table gmall.ads_gmv_sum_day 
select 
    '2020-11-24' dt,
    sum(order_count)  gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount 
from gmall.dws_user_action 
where dt ='2020-11-24'
group by dt
INFO  : Query ID = hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20201125145125_efe11d96-baaa-451f-b5f8-d98403f2a949
INFO  : Session is already open
INFO  : Dag name: insert into table gmall.ads_gmv_sum_day...dt (Stage-1)
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1606125802436_0018)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 10.00 s    
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 9.45 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.22s
INFO  : Prepare Plan                            0.06s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             4.22s
INFO  : Start DAG                               1.31s
INFO  : Run DAG                                 9.45s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1           3527.00          7,280            102             200                1
INFO  :  Reducer 2           4397.00          7,540             98               1                2
INFO  :  Reducer 3           3389.00            630              0               2                0
INFO  : ----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 10.01 s    
----------------------------------------------------------------------------------------------
1 row affected (15.842 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
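Since the ADS row is simply the dt-level rollup of dws_user_action, a quick cross-check (an assumed sanity step, not in the original flow) is to compare it with a direct aggregation:

# the single ADS row should match a direct rollup of the DWS table
hive -e "
select sum(order_count), sum(order_amount), sum(payment_amount)
from gmall.dws_user_action where dt='2020-11-24';
select * from gmall.ads_gmv_sum_day where dt='2020-11-24';
"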

Data Export Script

Create the ads_gmv_sum_day table in MySQL:

DROP TABLE IF EXISTS ads_gmv_sum_day;
CREATE TABLE ads_gmv_sum_day(
  `dt` varchar(200) DEFAULT NULL COMMENT 'statistics date',
  `gmv_count` bigint(20) DEFAULT NULL COMMENT 'number of GMV orders that day',
  `gmv_amount` decimal(16, 2) DEFAULT NULL COMMENT 'total GMV order amount that day',
  `gmv_payment` decimal(16, 2) DEFAULT NULL COMMENT 'payment amount that day'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily GMV summary' ROW_FORMAT = Dynamic;
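One caveat: the Sqoop export below runs with --update-mode allowinsert and --update-key dt, and Sqoop's own log (see the MySQLManager warnings further down) notes that the MySQL upsert actually relies on the table's unique key rather than on --update-key. The DDL above defines no unique key on dt, so re-exporting the same day would append a duplicate row. Assuming dt should uniquely identify a row, one possible fix is:

-- let INSERT ... ON DUPLICATE KEY UPDATE match existing rows by dt
ALTER TABLE ads_gmv_sum_day ADD UNIQUE KEY uk_dt (dt);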

Create a script, sqoop_export.sh:

vim sqoop_export.sh

#!/bin/bash

export HADOOP_USER_NAME=hive

db_name=gmall

export_data() {
sqoop export \
--connect "jdbc:mysql://192.168.0.208:3306/${db_name}?useUnicode=true&characterEncoding=utf-8"  \
--username root \
--password password \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/tablespace/external/hive/gmall.db/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key $2 \
--input-null-string '\\N'    \
--input-null-non-string '\\N'
}

case $1 in
  "ads_gmv_sum_day")
     export_data "ads_gmv_sum_day" "dt"
;;
   "all")
     export_data "ads_gmv_sum_day" "dt"
;;
esac

Run the script to export the data:

sqoop_export.sh all
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 15:49:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 15:49:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 15:49:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 15:49:35 INFO tool.CodeGenTool: Beginning code generation
20/11/25 15:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ads_gmv_sum_day` AS t LIMIT 1
20/11/25 15:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `ads_gmv_sum_day` AS t LIMIT 1
20/11/25 15:49:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 15:49:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f6653e4db17d3a1223b93c16acc9d822/ads_gmv_sum_day.jar
20/11/25 15:49:38 WARN manager.MySQLManager: MySQL Connector upsert functionality is using INSERT ON
20/11/25 15:49:38 WARN manager.MySQLManager: DUPLICATE KEY UPDATE clause that relies on table's unique key.
20/11/25 15:49:38 WARN manager.MySQLManager: Insert/update distinction is therefore independent on column
20/11/25 15:49:38 WARN manager.MySQLManager: names specified in --update-key parameter. Please see MySQL
20/11/25 15:49:38 WARN manager.MySQLManager: documentation for additional limitations.
20/11/25 15:49:38 INFO mapreduce.ExportJobBase: Beginning export of ads_gmv_sum_day
20/11/25 15:49:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
20/11/25 15:49:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 15:49:40 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 15:49:41 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0019
20/11/25 15:49:45 INFO input.FileInputFormat: Total input files to process : 1
20/11/25 15:49:45 INFO input.FileInputFormat: Total input files to process : 1
20/11/25 15:49:45 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
20/11/25 15:49:45 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 19cb887b751ca6813bec478cf3bf20459deed68b]
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0019
20/11/25 15:49:45 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 15:49:45 INFO conf.Configuration: resource-types.xml not found
20/11/25 15:49:45 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 15:49:46 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0019
20/11/25 15:49:46 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0019/
20/11/25 15:49:46 INFO mapreduce.Job: Running job: job_1606125802436_0019
20/11/25 15:49:54 INFO mapreduce.Job: Job job_1606125802436_0019 running in uber mode : false
20/11/25 15:49:54 INFO mapreduce.Job:  map 0% reduce 0%
20/11/25 15:50:02 INFO mapreduce.Job:  map 100% reduce 0%
20/11/25 15:50:02 INFO mapreduce.Job: Job job_1606125802436_0019 completed successfully
20/11/25 15:50:02 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=253343
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=215
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        HDFS: Number of bytes read erasure-coded=0
    Job Counters 
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5709
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5709
        Total vcore-milliseconds taken by all map tasks=5709
        Total megabyte-milliseconds taken by all map tasks=11692032
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=176
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=74
        CPU time spent (ms)=1320
        Physical memory (bytes) snapshot=285306880
        Virtual memory (bytes) snapshot=2846998528
        Total committed heap usage (bytes)=320864256
        Peak Map Physical memory (bytes)=285306880
        Peak Map Virtual memory (bytes)=2846998528
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
20/11/25 15:50:02 INFO mapreduce.ExportJobBase: Transferred 215 bytes in 22.3977 seconds (9.5992 bytes/sec)
20/11/25 15:50:02 INFO mapreduce.ExportJobBase: Exported 1 records.

Check the data in MySQL:

MariaDB [gmall]> select * from ads_gmv_sum_day;
+------------+-----------+------------+-------------+
| dt         | gmv_count | gmv_amount | gmv_payment |
+------------+-----------+------------+-------------+
| 2020-11-24 |      1000 |  527431.00 |   268152.00 |
+------------+-----------+------------+-------------+
1 row in set (0.00 sec)