Data Warehouse in Practice 3

Author: soundhearer · Published 2020-12-18 · From the 数据湖 (Data Lake) column

After all the groundwork in the previous installments, we can finally get to the main topic: building a data warehouse in Hive following the layered-warehouse methodology, moving the data through the ODS, DWD, DWS, and ADS layers.

ODS Layer

Start the Hive client and create the gmall database

[root@cdh2 ~]# hive
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> create database gmall;
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> use gmall;

Create the start-up log table

The ODS start-up log table keeps each raw log line intact in a single string column, partitioned by date:

drop table if exists ods_start_log;
CREATE EXTERNAL TABLE ods_start_log (`line` string)
PARTITIONED BY (`dt` string)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

Note that because the raw logs are LZO-compressed, Hive must be configured for LZO, and the table must be declared with INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' so Hive can read the files.
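Before creating the table, it is worth confirming that the LZO codec is actually registered on the cluster. A minimal sanity check (a sketch; io.compression.codecs is the standard Hadoop property that hadoop-lzo registers its codecs under, so verify the exact value against your distribution):

-- In beeline, `set <property>;` prints the property's current value.
set io.compression.codecs;
-- The output should include com.hadoop.compression.lzo.LzopCodec; if it does
-- not, install hadoop-lzo and register the codec in the cluster configuration
-- before loading data.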

ODS-layer data-loading script

0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24');
INFO  : Compiling command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a): load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24')
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a); Time taken: 0.171 seconds
INFO  : Executing command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a): load data inpath '/origin_data/gmall/log/topic_start/2020-11-24' into table gmall.ods_start_log partition(dt='2020-11-24')
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table gmall.ods_start_log partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/log/topic_start/2020-11-24
INFO  : Starting task [Stage-1:STATS] in serial mode
INFO  : Completed executing command(queryId=hive_20201124101935_584d70b8-56bb-4102-aa82-f14e34180f6a); Time taken: 1.229 seconds
INFO  : OK
No rows affected (1.443 seconds)

Querying through Hue, we can see that the ods_start_log table now contains data.
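Like the DWD and DWS layers below, this load would normally be wrapped in a small daily script. A minimal sketch (the script name ods_log.sh and the path layout are assumptions inferred from the load command above):

#!/bin/bash
# Hypothetical ods_log.sh: load one day of start-up logs into the ODS table.
APP=gmall

# Use the date passed as $1; otherwise default to yesterday.
if [ -n "$1" ] ; then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

hive -e "load data inpath '/origin_data/$APP/log/topic_start/$do_date' into table $APP.ods_start_log partition(dt='$do_date');"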

DWD Layer

Create the start-up table

0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> drop table if exists dwd_start_log;
0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macr> CREATE EXTERNAL TABLE dwd_start_log(
. . . . . . . . . . . . . . . . . . . . . . .> `mid_id` string,
. . . . . . . . . . . . . . . . . . . . . . .> `user_id` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `version_code` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `version_name` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `lang` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `source` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `os` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `area` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `model` string,
. . . . . . . . . . . . . . . . . . . . . . .> `brand` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `sdk_version` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `gmail` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `height_width` string,  
. . . . . . . . . . . . . . . . . . . . . . .> `app_time` string,
. . . . . . . . . . . . . . . . . . . . . . .> `network` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `lng` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `lat` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `entry` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `open_ad_type` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `action` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `loading_time` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `detail` string, 
. . . . . . . . . . . . . . . . . . . . . . .> `extend1` string
. . . . . . . . . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . . . . . . . . .> PARTITIONED BY (dt string);

DWD-layer start-up table data-loading script

Create a new script, dwd_start_log.sh

#!/bin/bash

# Define a variable to make later changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select 
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log 
where dt='$do_date';
"

hive -e "$sql"
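Each column in the script above comes from one get_json_object call, which extracts a single attribute from the raw JSON line by path. A tiny illustration with a made-up JSON literal:

-- get_json_object(json_string, '$.key') returns the value at that JSON path,
-- or NULL when the key is absent.
select get_json_object('{"mid":"m1","uid":"u42"}', '$.mid');  -- returns m1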

Make the script executable

chmod 777 dwd_start_log.sh

Run the script

[root@cdh2]# dwd_start_log.sh 2020-11-24
INFO  : Compiling command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b): insert overwrite table gmall.dwd_start_log
PARTITION (dt='2020-11-24')
select 
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from gmall.ods_start_log 
where dt='2020-11-24'
INFO  : No Stats for gmall@ods_start_log, Columns: line
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:mid_id, type:string, comment:null), FieldSchema(name:user_id, type:string, comment:null), FieldSchema(name:version_code, type:string, comment:null), FieldSchema(name:version_name, type:string, comment:null), FieldSchema(name:lang, type:string, comment:null), FieldSchema(name:source, type:string, comment:null), FieldSchema(name:os, type:string, comment:null), FieldSchema(name:area, type:string, comment:null), FieldSchema(name:model, type:string, comment:null), FieldSchema(name:brand, type:string, comment:null), FieldSchema(name:sdk_version, type:string, comment:null), FieldSchema(name:gmail, type:string, comment:null), FieldSchema(name:height_width, type:string, comment:null), FieldSchema(name:app_time, type:string, comment:null), FieldSchema(name:network, type:string, comment:null), FieldSchema(name:lng, type:string, comment:null), FieldSchema(name:lat, type:string, comment:null), FieldSchema(name:entry, type:string, comment:null), FieldSchema(name:open_ad_type, type:string, comment:null), FieldSchema(name:action, type:string, comment:null), FieldSchema(name:loading_time, type:string, comment:null), FieldSchema(name:detail, type:string, comment:null), FieldSchema(name:extend1, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b); Time taken: 1.615 seconds
INFO  : Executing command(queryId=hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b): insert overwrite table gmall.dwd_start_log
PARTITION (dt='2020-11-24')
select 
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from gmall.ods_start_log 
where dt='2020-11-24'
INFO  : Query ID = hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20201124203254_3e88639f-e439-40d3-bd7b-bbbb2159fb3b
INFO  : Session is already open
INFO  : Dag name: insert overwrite table gma...dt='2020-11-24' (Stage-1)
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1606125802436_0004)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 10.58 s    
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 9.26 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           1.61s
INFO  : Prepare Plan                            0.16s
INFO  : Get Query Coordinator (AM)              0.03s
INFO  : Submit Plan                             4.28s
INFO  : Start DAG                               1.36s
INFO  : Run DAG                                 9.26s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1           5540.00         11,310            128             502                1
INFO  :  Reducer 2            177.00          1,490             12               1                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 3
INFO  :    TOTAL_LAUNCHED_TASKS: 3
INFO  :    DATA_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 3270
INFO  :    AM_GC_TIME_MILLIS: 19
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 3722
INFO  :    FILE_BYTES_WRITTEN: 3610
INFO  :    HDFS_BYTES_READ: 48652
INFO  :    HDFS_BYTES_WRITTEN: 73761
INFO  :    HDFS_READ_OPS: 8
INFO  :    HDFS_WRITE_OPS: 5
INFO  :    HDFS_OP_CREATE: 3
INFO  :    HDFS_OP_GET_FILE_STATUS: 7
INFO  :    HDFS_OP_OPEN: 1
INFO  :    HDFS_OP_RENAME: 2
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 1
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 2
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    NUM_SKIPPED_INPUTS: 1
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 1
INFO  :    GC_TIME_MILLIS: 140
INFO  :    TASK_DURATION_MILLIS: 5902
INFO  :    CPU_MILLISECONDS: 12800
INFO  :    PHYSICAL_MEMORY_BYTES: 1645215744
INFO  :    VIRTUAL_MEMORY_BYTES: 16442019840
INFO  :    COMMITTED_HEAP_BYTES: 1645215744
INFO  :    INPUT_RECORDS_PROCESSED: 502
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 48652
INFO  :    OUTPUT_RECORDS: 1
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 9238
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 9248
INFO  :    OUTPUT_BYTES_PHYSICAL: 3554
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 3554
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 1
INFO  :    SHUFFLE_BYTES: 3554
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 9248
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 3554
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    SHUFFLE_PHASE_TIME: 72
INFO  :    MERGE_PHASE_TIME: 89
INFO  :    FIRST_EVENT_RECEIVED: 49
INFO  :    LAST_EVENT_RECEIVED: 49
INFO  : HIVE:
INFO  :    CREATED_FILES: 2
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 502
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_gmall.dwd_start_log: 502
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_10: 1
INFO  :    RECORDS_OUT_OPERATOR_FS_3: 502
INFO  :    RECORDS_OUT_OPERATOR_GBY_6: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_8: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_7: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_2: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_5: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 1
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 502
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 502
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_ods_start_log:
INFO  :    INPUT_RECORDS_PROCESSED: 502
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 48652
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 9238
INFO  :    OUTPUT_BYTES_PHYSICAL: 3554
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 9248
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1
INFO  :    SHUFFLE_CHUNK_COUNT: 1
INFO  :    SPILLED_RECORDS: 1
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 3554
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 49
INFO  :    LAST_EVENT_RECEIVED: 49
INFO  :    MERGED_MAP_OUTPUTS: 1
INFO  :    MERGE_PHASE_TIME: 89
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    NUM_SKIPPED_INPUTS: 1
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 1
INFO  :    SHUFFLE_BYTES: 3554
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 9248
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 3554
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_PHASE_TIME: 72
INFO  :    SPILLED_RECORDS: 1
INFO  : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 1
INFO  :    INPUT_DIRECTORIES_Map_1: 1
INFO  :    INPUT_FILES_Map_1: 1
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 10.61 s    
----------------------------------------------------------------------------------------------
502 rows affected (18.059 seconds)

Checking dwd_start_log in Hue, we can see the data has been generated.
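A quick spot check of the parsed columns (a hypothetical query; any subset of the fields would do):

-- Confirm the JSON attributes landed in their DWD columns.
select mid_id, user_id, version_code, app_time
from gmall.dwd_start_log
where dt='2020-11-24'
limit 5;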

DWS Layer (Daily Active Users)

Daily active device analysis

Goal: produce a per-device detail record for devices active in the current day, week, and month; this installment builds the daily table.

Create the dws_uv_detail_day table

drop table if exists dws_uv_detail_day;
create external table dws_uv_detail_day
(
    `mid_id` string COMMENT 'unique device ID',
    `user_id` string COMMENT 'user ID', 
    `version_code` string COMMENT 'app version code', 
    `version_name` string COMMENT 'app version name', 
    `lang` string COMMENT 'system language', 
    `source` string COMMENT 'channel', 
    `os` string COMMENT 'Android OS version', 
    `area` string COMMENT 'region', 
    `model` string COMMENT 'phone model', 
    `brand` string COMMENT 'phone brand', 
    `sdk_version` string COMMENT 'sdkVersion', 
    `gmail` string COMMENT 'gmail', 
    `height_width` string COMMENT 'screen height and width',
    `app_time` string COMMENT 'time the client log was generated',
    `network` string COMMENT 'network mode',
    `lng` string COMMENT 'longitude',
    `lat` string COMMENT 'latitude'
)
partitioned by(dt string)
stored as parquet;

Create a new script, dws_log.sh

[root@cdh2 25k]# cat dws_log.sh 
#!/bin/bash

# Define a variable to make later changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 


sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

  insert overwrite table "$APP".dws_uv_detail_day partition(dt='$do_date')
  select  
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area, 
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat
  from "$APP".dwd_start_log
  where dt='$do_date'  
  group by mid_id;
"

hive -e "$sql"
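The collect_set/concat_ws pair in the script is what collapses all of a device's start-up records into one row: collect_set gathers the distinct values within each mid_id group, and concat_ws joins them with '|'. A small illustration using just the network field (values made up):

-- A device seen on both '4G' and 'WIFI' during the day comes out as
-- network = '4G|WIFI'.
select mid_id, concat_ws('|', collect_set(network)) network
from gmall.dwd_start_log
where dt='2020-11-24'
group by mid_id
limit 5;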

Make the script executable

chmod 777 dws_log.sh

Run the script

[root@cdh2 25k]# dws_log.sh 2020-11-24
20/11/24 21:43:12 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
No rows affected (0.068 seconds)
INFO  : Compiling command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435): insert overwrite table gmall.dws_uv_detail_day partition(dt='2020-11-24')
  select  
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area, 
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat
  from gmall.dwd_start_log
  where dt='2020-11-24'  
  group by mid_id
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:mid_id, type:string, comment:null), FieldSchema(name:user_id, type:string, comment:null), FieldSchema(name:version_code, type:string, comment:null), FieldSchema(name:version_name, type:string, comment:null), FieldSchema(name:lang, type:string, comment:null), FieldSchema(name:source, type:string, comment:null), FieldSchema(name:os, type:string, comment:null), FieldSchema(name:area, type:string, comment:null), FieldSchema(name:model, type:string, comment:null), FieldSchema(name:brand, type:string, comment:null), FieldSchema(name:sdk_version, type:string, comment:null), FieldSchema(name:gmail, type:string, comment:null), FieldSchema(name:height_width, type:string, comment:null), FieldSchema(name:app_time, type:string, comment:null), FieldSchema(name:network, type:string, comment:null), FieldSchema(name:lng, type:string, comment:null), FieldSchema(name:lat, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435); Time taken: 0.669 seconds
INFO  : Executing command(queryId=hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435): insert overwrite table gmall.dws_uv_detail_day partition(dt='2020-11-24')
  select  
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area, 
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat
  from gmall.dwd_start_log
  where dt='2020-11-24'  
  group by mid_id
INFO  : Query ID = hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20201124214313_9cbb09a0-699d-4747-a5e9-a126602a5435
INFO  : Session is already open
INFO  : Dag name: insert overwrite table gmall.dws_uv...mid_id (Stage-1)
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1606125802436_0005)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12.60 s    
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 12.05 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.67s
INFO  : Prepare Plan                            0.12s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             4.36s
INFO  : Start DAG                               1.28s
INFO  : Run DAG                                12.05s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1           3531.00          7,710             86             502              502
INFO  :  Reducer 2           6983.00         13,430            128             502                2
INFO  :  Reducer 3           4951.00          1,270              0               2                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 5
INFO  :    TOTAL_LAUNCHED_TASKS: 5
INFO  :    DATA_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 3690
INFO  :    AM_GC_TIME_MILLIS: 42
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 51126
INFO  :    FILE_BYTES_WRITTEN: 69324
INFO  :    HDFS_BYTES_READ: 61338
INFO  :    HDFS_BYTES_WRITTEN: 60185
INFO  :    HDFS_READ_OPS: 12
INFO  :    HDFS_WRITE_OPS: 8
INFO  :    HDFS_OP_CREATE: 5
INFO  :    HDFS_OP_GET_FILE_STATUS: 11
INFO  :    HDFS_OP_OPEN: 1
INFO  :    HDFS_OP_RENAME: 3
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    REDUCE_INPUT_GROUPS: 503
INFO  :    REDUCE_INPUT_RECORDS: 504
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    SPILLED_RECORDS: 1008
INFO  :    NUM_SHUFFLED_INPUTS: 4
INFO  :    NUM_SKIPPED_INPUTS: 2
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    MERGED_MAP_OUTPUTS: 4
INFO  :    GC_TIME_MILLIS: 214
INFO  :    TASK_DURATION_MILLIS: 15451
INFO  :    CPU_MILLISECONDS: 22410
INFO  :    PHYSICAL_MEMORY_BYTES: 2729443328
INFO  :    VIRTUAL_MEMORY_BYTES: 27425902592
INFO  :    COMMITTED_HEAP_BYTES: 2729443328
INFO  :    INPUT_RECORDS_PROCESSED: 502
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 61338
INFO  :    OUTPUT_RECORDS: 504
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 120214
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 121752
INFO  :    OUTPUT_BYTES_PHYSICAL: 46586
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 22570
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 46586
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_CHUNK_COUNT: 3
INFO  :    SHUFFLE_BYTES: 46586
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 121752
INFO  :    SHUFFLE_BYTES_TO_MEM: 22570
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 24016
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    SHUFFLE_PHASE_TIME: 5484
INFO  :    MERGE_PHASE_TIME: 5613
INFO  :    FIRST_EVENT_RECEIVED: 127
INFO  :    LAST_EVENT_RECEIVED: 5243
INFO  : HIVE:
INFO  :    CREATED_FILES: 3
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 502
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_gmall.dws_uv_detail_day: 502
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 502
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 2
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_3: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_14: 1
INFO  :    RECORDS_OUT_OPERATOR_FS_7: 502
INFO  :    RECORDS_OUT_OPERATOR_GBY_10: 2
INFO  :    RECORDS_OUT_OPERATOR_GBY_12: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_3: 502
INFO  :    RECORDS_OUT_OPERATOR_GBY_5: 502
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_11: 2
INFO  :    RECORDS_OUT_OPERATOR_RS_4: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_13: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_2: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_6: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 502
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 502
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 502
INFO  : Shuffle Errors:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_2_INPUT_Map_1:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : Shuffle Errors_Reducer_3_INPUT_Reducer_2:
INFO  :    BAD_ID: 0
INFO  :    CONNECTION: 0
INFO  :    IO_ERROR: 0
INFO  :    WRONG_LENGTH: 0
INFO  :    WRONG_MAP: 0
INFO  :    WRONG_REDUCE: 0
INFO  : TaskCounter_Map_1_INPUT_dwd_start_log:
INFO  :    INPUT_RECORDS_PROCESSED: 502
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 61338
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 105540
INFO  :    OUTPUT_BYTES_PHYSICAL: 40988
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 107058
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 502
INFO  :    SHUFFLE_CHUNK_COUNT: 1
INFO  :    SPILLED_RECORDS: 502
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 40988
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 19779
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 102
INFO  :    LAST_EVENT_RECEIVED: 102
INFO  :    MERGED_MAP_OUTPUTS: 2
INFO  :    MERGE_PHASE_TIME: 393
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 2
INFO  :    NUM_SKIPPED_INPUTS: 0
INFO  :    REDUCE_INPUT_GROUPS: 502
INFO  :    REDUCE_INPUT_RECORDS: 502
INFO  :    SHUFFLE_BYTES: 40988
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 107058
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 21209
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 19779
INFO  :    SHUFFLE_PHASE_TIME: 272
INFO  :    SPILLED_RECORDS: 502
INFO  : TaskCounter_Reducer_2_OUTPUT_Reducer_3:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    OUTPUT_BYTES: 14674
INFO  :    OUTPUT_BYTES_PHYSICAL: 5598
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 14694
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 2
INFO  :    SHUFFLE_CHUNK_COUNT: 2
INFO  :    SPILLED_RECORDS: 2
INFO  : TaskCounter_Reducer_3_INPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 5598
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 2791
INFO  :    COMBINE_INPUT_RECORDS: 0
INFO  :    FIRST_EVENT_RECEIVED: 25
INFO  :    LAST_EVENT_RECEIVED: 5141
INFO  :    MERGED_MAP_OUTPUTS: 2
INFO  :    MERGE_PHASE_TIME: 5220
INFO  :    NUM_DISK_TO_DISK_MERGES: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_MEM_TO_DISK_MERGES: 0
INFO  :    NUM_SHUFFLED_INPUTS: 2
INFO  :    NUM_SKIPPED_INPUTS: 2
INFO  :    REDUCE_INPUT_GROUPS: 1
INFO  :    REDUCE_INPUT_RECORDS: 2
INFO  :    SHUFFLE_BYTES: 5598
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 14694
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 2807
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 2791
INFO  :    SHUFFLE_PHASE_TIME: 5212
INFO  :    SPILLED_RECORDS: 2
INFO  : TaskCounter_Reducer_3_OUTPUT_out_Reducer_3:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 1
INFO  :    INPUT_DIRECTORIES_Map_1: 1
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      2          2        0        0       0       0  
Reducer 3 ...... container     SUCCEEDED      2          2        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12.62 s    
----------------------------------------------------------------------------------------------
502 rows affected (19.065 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2

Looking at the Hue page, the dws_uv_detail_day table now has data.
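Since dws_uv_detail_day should hold exactly one row per active device per day, a simple consistency check (hypothetical query) is to compare the row count with the distinct device count:

-- Both numbers should match (502 for this day's data, per the log above).
select count(*) total_rows, count(distinct mid_id) distinct_devices
from gmall.dws_uv_detail_day
where dt='2020-11-24';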

ADS Layer (Requirement: Daily Active Users)

Goal: the number of devices active on the day

Active device count

Create the table

drop table if exists ads_uv_count;
create external table ads_uv_count( 
`dt` string COMMENT 'statistics date',
`day_count` bigint COMMENT 'number of active users for the day'
) COMMENT 'active device count' 
row format delimited fields terminated by '\t';

ADS-layer data-loading script (ads_uv_log.sh)

#!/bin/bash

# Define a variable to make later changes easy
APP=gmall

# If a date argument is given, use it; otherwise default to the day before today
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 

sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

insert into table "$APP".ads_uv_count 
select  
  '$do_date' dt,
   daycount.ct
from 
(
   select  
      '$do_date' dt,
       count(*) ct
   from "$APP".dws_uv_detail_day
   where dt='$do_date'  
)daycount;
"

hive -e "$sql"

Make the script executable

chmod 777 ads_uv_log.sh

Run the script

[root@cdh2 25k]# ads_uv_log.sh 2020-11-24
No rows affected (0.07 seconds)
INFO  : Compiling command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b): insert into table gmall.ads_uv_count 
select  
  '2020-11-24' dt,
   daycount.ct
from 
(
   select  
      '2020-11-24' dt,
       count(*) ct
   from gmall.dws_uv_detail_day
   where dt='2020-11-24'  
)daycount
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:daycount.ct, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b); Time taken: 0.178 seconds
INFO  : Executing command(queryId=hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b): insert into table gmall.ads_uv_count 
select  
  '2020-11-24' dt,
   daycount.ct
from 
(
   select  
      '2020-11-24' dt,
       count(*) ct
   from gmall.dws_uv_detail_day
   where dt='2020-11-24'  
)daycount
INFO  : Query ID = hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20201125002417_71fcb5d2-816d-4fce-9674-631a8b04cf3b
INFO  : Session is already open
INFO  : Dag name: insert into table gmall.ads_uv_...
)daycount (Stage-1)
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  : Session re-established.
INFO  : Status: Running (Executing on YARN cluster with App id application_1606125802436_0007)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.82 s     
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 5.32 seconds
INFO  : 
INFO  : Query Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  : OPERATION                            DURATION
INFO  : ----------------------------------------------------------------------------------------------
INFO  : Compile Query                           0.18s
INFO  : Prepare Plan                            0.09s
INFO  : Get Query Coordinator (AM)              0.01s
INFO  : Submit Plan                             4.27s
INFO  : Start DAG                               1.40s
INFO  : Run DAG                                 5.32s
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1           3025.00          6,120             85             502                1
INFO  :  Reducer 2            261.00          1,050              8               1                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 2
INFO  :    TOTAL_LAUNCHED_TASKS: 2
INFO  :    RACK_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 3350
INFO  :    AM_GC_TIME_MILLIS: 26
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 59
INFO  :    FILE_BYTES_WRITTEN: 59
INFO  :    HDFS_BYTES_READ: 52659
INFO  :    HDFS_BYTES_WRITTEN: 258
INFO  :    HDFS_READ_OPS: 19
INFO  :    HDFS_WRITE_OPS: 5
INFO  :    HDFS_OP_CREATE: 3
INFO  :    HDFS_OP_GET_FILE_STATUS: 13
INFO  :    HDFS_OP_OPEN: 6
INFO  :    HDFS_OP_RENAME: 2
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    SPILLED_RECORDS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    GC_TIME_MILLIS: 93
INFO  :    TASK_DURATION_MILLIS: 3255
INFO  :    CPU_MILLISECONDS: 7170
INFO  :    PHYSICAL_MEMORY_BYTES: 524288000
INFO  :    VIRTUAL_MEMORY_BYTES: 10938982400
INFO  :    COMMITTED_HEAP_BYTES: 524288000
INFO  :    INPUT_RECORDS_PROCESSED: 3
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 49896
INFO  :    OUTPUT_RECORDS: 1
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 5
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 13
INFO  :    OUTPUT_BYTES_PHYSICAL: 51
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_BYTES: 27
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 13
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 27
INFO  :    SHUFFLE_PHASE_TIME: 50
INFO  :    FIRST_EVENT_RECEIVED: 33
INFO  :    LAST_EVENT_RECEIVED: 33
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  : HIVE:
INFO  :    CREATED_FILES: 2
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 502
INFO  :    RECORDS_OUT_0: 1
INFO  :    RECORDS_OUT_1_gmall.ads_uv_count: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 1
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_14: 1
INFO  :    RECORDS_OUT_OPERATOR_FS_7: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_12: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_17: 1
INFO  :    RECORDS_OUT_OPERATOR_GBY_5: 1
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_18: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_13: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_16: 502
INFO  :    RECORDS_OUT_OPERATOR_SEL_6: 1
INFO  :    RECORDS_OUT_OPERATOR_SEL_9: 1
INFO  :    RECORDS_OUT_OPERATOR_TS_0: 502
INFO  :    TOTAL_TABLE_ROWS_WRITTEN: 1
INFO  : TaskCounter_Map_1_INPUT_dws_uv_detail_day:
INFO  :    INPUT_RECORDS_PROCESSED: 2
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 49896
INFO  : TaskCounter_Map_1_OUTPUT_Reducer_2:
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    DATA_BYTES_VIA_EVENT: 0
INFO  :    OUTPUT_BYTES: 5
INFO  :    OUTPUT_BYTES_PHYSICAL: 51
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 13
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_RECORDS: 1
INFO  :    SPILLED_RECORDS: 0
INFO  : TaskCounter_Reducer_2_INPUT_Map_1:
INFO  :    FIRST_EVENT_RECEIVED: 33
INFO  :    INPUT_RECORDS_PROCESSED: 1
INFO  :    LAST_EVENT_RECEIVED: 33
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 1
INFO  :    SHUFFLE_BYTES: 27
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 13
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 27
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_PHASE_TIME: 50
INFO  : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
INFO  :    OUTPUT_RECORDS: 0
INFO  : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
INFO  :    GROUPED_INPUT_SPLITS_Map_1: 1
INFO  :    INPUT_DIRECTORIES_Map_1: 1
INFO  :    INPUT_FILES_Map_1: 2
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.83 s     
----------------------------------------------------------------------------------------------
1 row affected (11.783 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2

Checking Hue, we can see the data in the ads_uv_count table.
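Because the loading script uses insert into rather than insert overwrite, ads_uv_count accumulates one row per daily run, so the daily-active trend can be read straight off the table:

-- Trend of daily active devices across all loaded dates.
select dt, day_count
from gmall.ads_uv_count
order by dt;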
