专栏首页Hadoop数据仓库基于Hadoop生态圈的数据仓库实践 —— 进阶技术(九)

基于Hadoop生态圈的数据仓库实践 —— 进阶技术(九)

九、退化维度 本节讨论一种称为退化维度的技术。该技术减少维度的数量,简化维度数据仓库模式。简单的模式比复杂的更容易理解,也有更好的查询性能。当一个维度没有数据仓库需要的任何数据时就可以退化此维度,此时需要把退化维度的相关数据迁移到事实表中,然后删除退化的维度。 1. 退化订单维度 本小节说明如何退化订单维度,包括对数据仓库模式和定期装载脚本的修改。使用维度退化技术时你首先要识别数据,分析从来不用的数据列。例如,订单维度的order_number列就可能是这样的一列。但如果用户想看事务的细节,还需要订单号。因此,在退化订单维度前,要把订单号迁移到sales_order_fact表。下图显示了迁移后的模式。

按顺序执行下面的四步退化order_dim维度表: (1)给sales_order_fact表添加order_number列 (2)把order_dim表里的订单号迁移到sales_order_fact表 (3)删除sales_order_fact表里的order_sk列 (4)删除order_dim表 下面的脚本完成所有退化订单维度所需的步骤。

use dw;  
alter table sales_order_fact rename to sales_order_fact_old;
create table sales_order_fact(                                              
   order_number int COMMENT 'order number',                                   
   customer_sk int COMMENT 'customer surrogate key',                             
   product_sk int COMMENT 'product surrogate key',                               
   order_date_sk int COMMENT 'order date surrogate key',                         
   allocate_date_sk int COMMENT 'allocate date surrogate key',                   
   allocate_quantity int COMMENT 'allocate quantity',                            
   packing_date_sk int COMMENT 'packing date surrogate key',                     
   packing_quantity int COMMENT 'packing quantity',                              
   ship_date_sk int COMMENT 'ship date surrogate key',                           
   ship_quantity int COMMENT 'ship quantity',                                    
   receive_date_sk int COMMENT 'receive date surrogate key',                     
   receive_quantity int COMMENT 'receive quantity',                              
   request_delivery_date_sk int COMMENT 'request delivery date surrogate key',   
   order_amount decimal(10,2) COMMENT 'order amount',                            
   order_quantity int COMMENT 'order quantity')
clustered by (order_number) into 8 buckets  
stored as orc tblproperties ('transactional'='true'); 

insert into table sales_order_fact
select t2.order_number,
       t1.customer_sk,                             
       t1.product_sk,                               
       t1.order_date_sk,                         
       t1.allocate_date_sk,                   
       t1.allocate_quantity,                            
       t1.packing_date_sk,                     
       t1.packing_quantity,                              
       t1.ship_date_sk,                           
       t1.ship_quantity,                                    
       t1.receive_date_sk,                     
       t1.receive_quantity,                              
       t1.request_delivery_date_sk,   
       t1.order_amount,                            
       t1.order_quantity
  from sales_order_fact_old t1 
 inner join order_dim t2 on t1.order_sk = t2.order_sk;

drop table sales_order_fact_old;
drop table order_dim;

2. 修改定期装载脚本 退化一个维度后需要做的另一件事就是修改定期装载脚本。修改后的脚本需要把订单号加入到销售订单事实表,而不再需要导入订单维度。下面显示了修改后的regular_etl.sql脚本文件内容。

-- 设置变量以支持事务    
set hive.support.concurrency=true;    
set hive.exec.dynamic.partition.mode=nonstrict;    
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;    
set hive.compactor.initiator.on=true;    
set hive.compactor.worker.threads=1;    
    
USE dw;    
      
-- 设置SCD的生效时间和过期时间    
SET hivevar:cur_date = CURRENT_DATE();
SET hivevar:pre_date = DATE_ADD(${hivevar:cur_date},-1);    
SET hivevar:max_date = CAST('2200-01-01' AS DATE);    
      
-- 设置CDC的上限时间    
INSERT OVERWRITE TABLE rds.cdc_time SELECT last_load, ${hivevar:cur_date} FROM rds.cdc_time;    
    
-- 装载customer维度    
-- 设置已删除记录和地址相关列上SCD2的过期,用<=>运算符处理NULL值。    
UPDATE customer_dim     
   SET expiry_date = ${hivevar:pre_date}      
 WHERE customer_dim.customer_sk IN      
(SELECT a.customer_sk     
   FROM (SELECT customer_sk,    
                customer_number,    
                customer_street_address,    
                customer_zip_code,    
                customer_city,    
                customer_state,    
                shipping_address,    
                shipping_zip_code,    
                shipping_city,    
                shipping_state    
           FROM customer_dim WHERE expiry_date = ${hivevar:max_date}) a LEFT JOIN     
                rds.customer b ON a.customer_number = b.customer_number     
          WHERE b.customer_number IS NULL OR     
          (  !(a.customer_street_address <=> b.customer_street_address)    
          OR !(a.customer_zip_code <=> b.customer_zip_code)    
          OR !(a.customer_city <=> b.customer_city)    
          OR !(a.customer_state <=> b.customer_state)    
          OR !(a.shipping_address <=> b.shipping_address)    
          OR !(a.shipping_zip_code <=> b.shipping_zip_code)    
          OR !(a.shipping_city <=> b.shipping_city)    
          OR !(a.shipping_state <=> b.shipping_state)    
          ));     
    
-- 处理customer_street_addresses列上SCD2的新增行      
INSERT INTO customer_dim    
SELECT    
    ROW_NUMBER() OVER (ORDER BY t1.customer_number) + t2.sk_max,    
    t1.customer_number,    
    t1.customer_name,    
    t1.customer_street_address,    
    t1.customer_zip_code,    
    t1.customer_city,    
    t1.customer_state,    
    t1.shipping_address,    
    t1.shipping_zip_code,    
    t1.shipping_city,    
    t1.shipping_state,    
    t1.version,    
    t1.effective_date,    
    t1.expiry_date    
FROM      
(      
SELECT      
    t2.customer_number customer_number,    
    t2.customer_name customer_name,    
    t2.customer_street_address customer_street_address,    
    t2.customer_zip_code customer_zip_code,    
    t2.customer_city customer_city,    
    t2.customer_state customer_state,    
    t2.shipping_address shipping_address,    
    t2.shipping_zip_code shipping_zip_code,    
    t2.shipping_city shipping_city,    
    t2.shipping_state shipping_state,    
    t1.version + 1 version,    
    ${hivevar:pre_date} effective_date,      
    ${hivevar:max_date} expiry_date      
 FROM customer_dim t1     
INNER JOIN rds.customer t2      
   ON t1.customer_number = t2.customer_number       
  AND t1.expiry_date = ${hivevar:pre_date}      
 LEFT JOIN customer_dim t3     
   ON t1.customer_number = t3.customer_number     
  AND t3.expiry_date = ${hivevar:max_date}      
WHERE (!(t1.customer_street_address <=> t2.customer_street_address)    
   OR  !(t1.customer_zip_code <=> t2.customer_zip_code)    
   OR  !(t1.customer_city <=> t2.customer_city)    
   OR  !(t1.customer_state <=> t2.customer_state)    
   OR  !(t1.shipping_address <=> t2.shipping_address)    
   OR  !(t1.shipping_zip_code <=> t2.shipping_zip_code)    
   OR  !(t1.shipping_city <=> t2.shipping_city)    
   OR  !(t1.shipping_state <=> t2.shipping_state)    
   )    
  AND t3.customer_sk IS NULL) t1      
CROSS JOIN      
(SELECT COALESCE(MAX(customer_sk),0) sk_max FROM customer_dim) t2;    
    
-- 处理customer_name列上的SCD1    
-- 因为hive的update的set子句还不支持子查询,所以这里使用了一个临时表存储需要更新的记录,用先delete再insert代替update    
-- 因为SCD1本身就不保存历史数据,所以这里更新维度表里的所有customer_name改变的记录,而不是仅仅更新当前版本的记录    
DROP TABLE IF EXISTS tmp;    
CREATE TABLE tmp AS    
SELECT    
    a.customer_sk,    
    a.customer_number,    
    b.customer_name,    
    a.customer_street_address,    
    a.customer_zip_code,    
    a.customer_city,    
    a.customer_state,    
    a.shipping_address,    
    a.shipping_zip_code,    
    a.shipping_city,    
    a.shipping_state,    
    a.version,    
    a.effective_date,    
    a.expiry_date    
  FROM customer_dim a, rds.customer b      
 WHERE a.customer_number = b.customer_number AND !(a.customer_name <=> b.customer_name);      
DELETE FROM customer_dim WHERE customer_dim.customer_sk IN (SELECT customer_sk FROM tmp);      
INSERT INTO customer_dim SELECT * FROM tmp;    
    
-- 处理新增的customer记录     
INSERT INTO customer_dim    
SELECT    
    ROW_NUMBER() OVER (ORDER BY t1.customer_number) + t2.sk_max,    
    t1.customer_number,    
    t1.customer_name,    
    t1.customer_street_address,    
    t1.customer_zip_code,    
    t1.customer_city,    
    t1.customer_state,    
    t1.shipping_address,    
    t1.shipping_zip_code,    
    t1.shipping_city,    
    t1.shipping_state,    
    1,    
    ${hivevar:pre_date},    
    ${hivevar:max_date}    
FROM      
(      
SELECT t1.* FROM rds.customer t1 LEFT JOIN customer_dim t2 ON t1.customer_number = t2.customer_number      
 WHERE t2.customer_sk IS NULL) t1      
CROSS JOIN      
(SELECT COALESCE(MAX(customer_sk),0) sk_max FROM customer_dim) t2;    
    
-- 重载PA客户维度    
TRUNCATE TABLE pa_customer_dim;      
INSERT INTO pa_customer_dim      
SELECT      
  customer_sk      
, customer_number      
, customer_name      
, customer_street_address      
, customer_zip_code      
, customer_city      
, customer_state      
, shipping_address      
, shipping_zip_code      
, shipping_city      
, shipping_state      
, version      
, effective_date      
, expiry_date      
FROM customer_dim      
WHERE customer_state = 'PA' ;     
    
-- 装载product维度    
-- 设置已删除记录和product_name、product_category列上SCD2的过期    
UPDATE product_dim    
   SET expiry_date = ${hivevar:pre_date}      
 WHERE product_dim.product_sk IN      
(SELECT a.product_sk     
   FROM (SELECT product_sk,product_code,product_name,product_category     
           FROM product_dim WHERE expiry_date = ${hivevar:max_date}) a LEFT JOIN     
                rds.product b ON a.product_code = b.product_code     
          WHERE b.product_code IS NULL OR (a.product_name <> b.product_name OR a.product_category <> b.product_category));    
    
-- 处理product_name、product_category列上SCD2的新增行      
INSERT INTO product_dim    
SELECT    
    ROW_NUMBER() OVER (ORDER BY t1.product_code) + t2.sk_max,    
    t1.product_code,    
    t1.product_name,    
    t1.product_category,    
    t1.version,    
    t1.effective_date,    
    t1.expiry_date    
FROM      
(      
SELECT      
    t2.product_code product_code,    
    t2.product_name product_name,    
    t2.product_category product_category,        
    t1.version + 1 version,    
    ${hivevar:pre_date} effective_date,      
    ${hivevar:max_date} expiry_date      
 FROM product_dim t1     
INNER JOIN rds.product t2      
   ON t1.product_code = t2.product_code      
  AND t1.expiry_date = ${hivevar:pre_date}      
 LEFT JOIN product_dim t3     
   ON t1.product_code = t3.product_code     
  AND t3.expiry_date = ${hivevar:max_date}      
WHERE (t1.product_name <> t2.product_name OR t1.product_category <> t2.product_category) AND t3.product_sk IS NULL) t1      
CROSS JOIN      
(SELECT COALESCE(MAX(product_sk),0) sk_max FROM product_dim) t2;    
    
-- 处理新增的product记录    
INSERT INTO product_dim    
SELECT    
    ROW_NUMBER() OVER (ORDER BY t1.product_code) + t2.sk_max,    
    t1.product_code,    
    t1.product_name,    
    t1.product_category,    
    1,    
    ${hivevar:pre_date},    
    ${hivevar:max_date}    
FROM      
(      
SELECT t1.* FROM rds.product t1 LEFT JOIN product_dim t2 ON t1.product_code = t2.product_code      
 WHERE t2.product_sk IS NULL) t1      
CROSS JOIN      
(SELECT COALESCE(MAX(product_sk),0) sk_max FROM product_dim) t2;    
    
-- 装载销售订单事实表 
-- 前一天新增的销售订单   
INSERT INTO sales_order_fact    
SELECT    
    a.order_number,    
    customer_sk,    
    product_sk,    
    e.order_date_sk,
    null,
    null,
    null,
    null,
    null,
    null,
    null,
    null,
    f.request_delivery_date_sk,
    order_amount,    
    quantity    
  FROM    
    rds.sales_order a,     
    customer_dim c,    
    product_dim d,    
    order_date_dim e,  
    request_delivery_date_dim f,  
    rds.cdc_time g    
 WHERE 
    a.order_status = 'N'
AND a.customer_number = c.customer_number    
AND a.status_date >= c.effective_date    
AND a.status_date < c.expiry_date    
AND a.product_code = d.product_code    
AND a.status_date >= d.effective_date    
AND a.status_date < d.expiry_date    
AND to_date(a.status_date) = e.order_date   
AND to_date(a.request_delivery_date) = f.request_delivery_date   
AND a.entry_date >= g.last_load AND a.entry_date < g.current_load ;    

-- 处理分配库房、打包、配送和收货四个状态
DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp AS
select t0.order_number order_number,
       t0.customer_sk customer_sk,
       t0.product_sk product_sk,
       t0.order_date_sk order_date_sk,
       t2.allocate_date_sk allocate_date_sk,
       t1.quantity allocate_quantity,
       t0.packing_date_sk packing_date_sk,
       t0.packing_quantity packing_quantity,
       t0.ship_date_sk ship_date_sk,
       t0.ship_quantity ship_quantity,
       t0.receive_date_sk receive_date_sk,
       t0.receive_quantity receive_quantity,
       t0.request_delivery_date_sk request_delivery_date_sk,
       t0.order_amount order_amount,
       t0.order_quantity order_quantity
  from sales_order_fact t0,
       rds.sales_order t1,
       allocate_date_dim t2,
       rds.cdc_time t4
 where t0.order_number = t1.order_number and t1.order_status = 'A' 
   and to_date(t1.status_date) = t2.allocate_date
   and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;

DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp); 
INSERT INTO sales_order_fact SELECT * FROM tmp;

DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp AS
select t0.order_number order_number,
       t0.customer_sk customer_sk,
       t0.product_sk product_sk,
       t0.order_date_sk order_date_sk,
       t0.allocate_date_sk allocate_date_sk,
       t0.allocate_quantity allocate_quantity,
       t2.packing_date_sk packing_date_sk,
       t1.quantity packing_quantity,
       t0.ship_date_sk ship_date_sk,
       t0.ship_quantity ship_quantity,
       t0.receive_date_sk receive_date_sk,
       t0.receive_quantity receive_quantity,
       t0.request_delivery_date_sk request_delivery_date_sk,
       t0.order_amount order_amount,
       t0.order_quantity order_quantity
  from sales_order_fact t0,
       rds.sales_order t1,
       packing_date_dim t2,
       rds.cdc_time t4
 where t0.order_number = t1.order_number and t1.order_status = 'P' 
   and to_date(t1.status_date) = t2.packing_date
   and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load; 
   
DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp); 
INSERT INTO sales_order_fact SELECT * FROM tmp;

DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp AS
select t0.order_number order_number,
       t0.customer_sk customer_sk,
       t0.product_sk product_sk,
       t0.order_date_sk order_date_sk,
       t0.allocate_date_sk allocate_date_sk,
       t0.allocate_quantity allocate_quantity,
       t0.packing_date_sk packing_date_sk,
       t0.packing_quantity packing_quantity,
       t2.ship_date_sk ship_date_sk,
       t1.quantity ship_quantity,
       t0.receive_date_sk receive_date_sk,
       t0.receive_quantity receive_quantity,
       t0.request_delivery_date_sk request_delivery_date_sk,
       t0.order_amount order_amount,
       t0.order_quantity order_quantity
  from sales_order_fact t0,
       rds.sales_order t1,
       ship_date_dim t2,
       rds.cdc_time t4
 where t0.order_number = t1.order_number and t1.order_status = 'S' 
   and to_date(t1.status_date) = t2.ship_date
   and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
   
DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp); 
INSERT INTO sales_order_fact SELECT * FROM tmp;

DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp AS
select t0.order_number order_number,
       t0.customer_sk customer_sk,
       t0.product_sk product_sk,
       t0.order_date_sk order_date_sk,
       t0.allocate_date_sk allocate_date_sk,
       t0.allocate_quantity allocate_quantity,
       t0.packing_date_sk packing_date_sk,
       t0.packing_quantity packing_quantity,
       t0.ship_date_sk ship_date_sk,
       t0.ship_quantity ship_quantity,
       t2.receive_date_sk receive_date_sk,
       t1.quantity receive_quantity,
       t0.request_delivery_date_sk request_delivery_date_sk,
       t0.order_amount order_amount,
       t0.order_quantity order_quantity
  from sales_order_fact t0,
       rds.sales_order t1,
       receive_date_dim t2,
       rds.cdc_time t4
 where t0.order_number = t1.order_number and t1.order_status = 'R' 
   and to_date(t1.status_date) = t2.receive_date
   and t1.entry_date >= t4.last_load and t1.entry_date < t4.current_load;
   
DELETE FROM sales_order_fact WHERE sales_order_fact.order_number IN (SELECT order_number FROM tmp); 
INSERT INTO sales_order_fact SELECT * FROM tmp;

-- 更新时间戳表的last_load字段    
INSERT OVERWRITE TABLE rds.cdc_time SELECT current_load, current_load FROM rds.cdc_time;

3. 测试修改后的定期装载 (1)准备测试数据 测试使用具有分配库房、打包、配送和收货里程碑的两个新订单。所以每个订单需要添加五行。下面的脚本向源数据库里的sales_order表新增十行。

USE source; 
DROP TABLE IF EXISTS temp_sales_order_data;  
CREATE TABLE temp_sales_order_data AS SELECT * FROM sales_order WHERE 1=0;  

SET @start_date := unix_timestamp('2016-07-25');  
SET @end_date := unix_timestamp('2016-07-26');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (1, 131, 1, 1, @order_date, 'N', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-25');  
SET @end_date := unix_timestamp('2016-07-26');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (2, 132, 2, 2, @order_date, 'N', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-26');  
SET @end_date := unix_timestamp('2016-07-27');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (3, 131, 1, 1, @order_date, 'A', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-26');  
SET @end_date := unix_timestamp('2016-07-27');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (4, 132, 2, 2, @order_date, 'A', '2016-08-01', @order_date, @amount, @quantity);

SET @start_date := unix_timestamp('2016-07-27');  
SET @end_date := unix_timestamp('2016-07-28');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (5, 131, 1, 1, @order_date, 'P', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-27');  
SET @end_date := unix_timestamp('2016-07-28');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (6, 132, 2, 2, @order_date, 'P', '2016-08-01', @order_date, @amount, @quantity);

SET @start_date := unix_timestamp('2016-07-28');  
SET @end_date := unix_timestamp('2016-07-29');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (7, 131, 1, 1, @order_date, 'S', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-28');  
SET @end_date := unix_timestamp('2016-07-29');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (8, 132, 2, 2, @order_date, 'S', '2016-08-01', @order_date, @amount, @quantity);

SET @start_date := unix_timestamp('2016-07-29');  
SET @end_date := unix_timestamp('2016-07-30');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (9, 131, 1, 1, @order_date, 'R', '2016-08-01', @order_date, @amount, @quantity); 

SET @start_date := unix_timestamp('2016-07-29');  
SET @end_date := unix_timestamp('2016-07-30');  
SET @order_date := from_unixtime(@start_date + rand() * (@end_date - @start_date));  
SET @amount := floor(1000 + rand() * 9000); 
SET @quantity := floor(10 + rand() * 90);    
INSERT INTO temp_sales_order_data VALUES (10, 132, 2, 2, @order_date, 'R', '2016-08-01', @order_date, @amount, @quantity);

INSERT INTO sales_order        
select null,  
       order_number,  
       customer_number,  
       product_code,  
       status_date,  
       order_status,  
       request_delivery_date,  
       entry_date,  
       order_amount,  
       quantity  
  from temp_sales_order_data t1
 order by t1.status_date;   

COMMIT ;

(2)执行五次定期装载

use rds;  
INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-25', '2016-07-26' FROM rds.cdc_time;

将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-26';

./regular_etl.sh
use rds;  
INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-26', '2016-07-27' FROM rds.cdc_time;

将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-27';

./regular_etl.sh
use rds;  
INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-27', '2016-07-28' FROM rds.cdc_time;

将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-28';

./regular_etl.sh
use rds;  
INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-28', '2016-07-29' FROM rds.cdc_time;

将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-29';

./regular_etl.sh
use rds;  
INSERT OVERWRITE TABLE rds.cdc_time SELECT '2016-07-29', '2016-07-30' FROM rds.cdc_time;

将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行改为SET hivevar:cur_date = '2016-07-30';

./regular_etl.sh

查询sales_order_fact表的两条订单。

use dw;
select t1.order_number orn,
       t2.order_date od,
       t1.order_quantity oq,
       t3.allocate_date ad,
       t1.allocate_quantity aq,
       t4.packing_date pd,
       t1.packing_quantity pq,
       t5.ship_date sd,
       t1.ship_quantity sq,
       t6.receive_date rd,
       t1.receive_quantity rq
  from sales_order_fact t1
       inner join order_date_dim t2 on t1.order_date_sk = t2.order_date_sk
        left join allocate_date_dim t3 on t1.allocate_date_sk = t3.allocate_date_sk
        left join packing_date_dim t4 on t1.packing_date_sk = t4.packing_date_sk
        left join ship_date_dim t5 on t1.ship_date_sk = t5.ship_date_sk
        left join receive_date_dim t6 on t1.receive_date_sk = t6.receive_date_sk
 where t1.order_number IN (131 , 132);

查询结果如下图所示。

测试完将regular_etl.sql文件中的SET hivevar:cur_date = CURRENT_DATE();行恢复。

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

推荐阅读

  • 「网安夜校」开课啦!多门网络安全课程开启限时优惠报名

    众志成城,共抗疫情。腾讯安全联合腾讯云大学、腾讯课堂启动「网安夜校」,为大家提供限时优惠的网络安全课程。欢迎网络安全从业者和信息安全专业学生报名参加学习,快速充电提升自我。

    腾讯安全
    安全培训腾讯云大学
  • Flink源码走读(一):Flink工程目录

    导语 | Flink已经成为未来流计算趋势,目前在很多大厂已经有了大规模的使用。最近在学习Flink源码,就想把自己学习的过程分享出来,希望能帮助到志同道合的朋友。开始阅读源码,说明读者已经对flink的基本概念有一些了解,这里就不再重复介绍Flink了。本文作为学习过程的第一章,首先对Flink的工程目录做一个解读,了解了工程下各个模块的作用,才能在遇到问题时准确定位到代码,进一步学习。

    2011aad
    大数据解决方案
  • Flink源码走读(二):Flink+Kafka实现端到端Exactly Once语义

    Flink通过Checkpoint机制实现了消息对状态影响的Exactly Once语义,即每条消息只会影响Flink内部状态有且只有一次。但无法保证输出到Sink中的数据不重复。以图一所示为例,Flink APP收到Source中的A消息,将其转化为B消息输出到Sink,APP在处理完A1后做了一次Checkpoint,假设APP在处理到A4时发生错误重启,APP将会重新从A2开始消费并处理数据,就会导致B2和B3重复输出到Sink中两次。

    2011aad
    大数据解决方案Kafka
  • kubernetes系列教程(十九)使用metric-server让HPA弹性伸缩愉快运行

    kubernetes监控指标大体可以分为两类:核心监控指标和自定义指标,核心监控指标是kubernetes内置稳定可靠监控指标,早期由heapster完成,现由metric-server实现;自定义指标用于实现核心指标的扩展,能够提供更丰富的指标支持,如应用状态指标,自定义指标需要通过Aggregator和k8s api集成,当前主流通过promethues实现。

    HappyLau谈云计算
    Kubernetes容器微服务微服务架构腾讯微服务平台 TFS
  • 三分钟入坑指北 🔜 Docsify + Serverless Framework 快速创建个人博客系统

    之前由于学摄影的关系,为了提高自己的审美,顺便锻炼下自己的英文能力,翻译了不少国外艺术类的 文章。最近一直想搭一个个人博客来存放这些内容,又懒得折腾建站,遂一直搁置。

    Aceyclee
    ServerlessHTML网站GitGitHub
  • NVM作为主存上对数据库管理系统的影响

    implications of non-volatile memory as primary storage for database management systems

    yzsDBA
    存储缓存数据库数据结构SQL
  • DevOps平台架构演进

    附最新架构图https://www.processon.com/view/5cbd897de4b0bab90962c435

    我思故我在
    DevOps 解决方案微服务架构架构设计
  • 【腾讯云AI小程序大赛】中山大学作品《小耳朵天使》

    ----------------------------------------------------------------------------------

    陈华山
    小程序 · 云开发小程序语音识别文字识别对话机器人
  • Kona JDK 在腾讯大数据领域内的实践与发展

    经常听人谈到 OpenJDK,那它到底是什么呢?相信大家都听说过 Java SE、ME、EE等规范, 通常意义上对 Open JDK 的定义指:Java SE规范的一个免费和开源参考实现。

    腾小云
    JDKJavaJVM大数据Oracle
  • 公告丨腾讯安全产品更名通知

    为了更好地为政企客户的安全保驾护航,腾讯安全即日起更新旗下身份安全、网络安全、终端安全、应用安全、数据安全、业务安全、安全管理、安全服务等八类安全产品的命名,致力于打造全栈安全产品“货架”,让客户选购安全产品/服务更加便捷,更快地找到合适的安全产品,从而对自身的安全建设“对症下药”。

    腾讯安全
    DDoS 防护应用安全 MS验证码(业务安全)应用安全(移动安全)漏洞扫描服务

扫码关注云+社区

领取腾讯云代金券