本节讨论多路径层次,它是对单路径层次的扩展。上一节里数据仓库的月维度只有一条层次路径,即年-季度-月这条路径。在本节中加一个新的级别——促销期,并且加一个新的年-促销期-月的层次路径。这时月维度将有两条层次路径,因此具有多路径层次。本节讨论的另一个主题是不完全层次,这种层次在它的一个或多个级别上没有数据。
1. 增加一个层次
下面的脚本给month_dim表添加一个叫做campaign_session的新列,并建立rds.campaign_session过渡表。
use dw;
-- 增加促销期列
alter table month_dim rename to month_dim_old;
create table month_dim (
month_sk int comment 'surrogate key',
month tinyint comment 'month',
month_name varchar(9) comment 'month name',
campaign_session varchar(30) comment 'campaign session',
quarter tinyint comment 'quarter',
year smallint comment 'year'
)
comment 'month dimension table'
clustered by (month_sk) into 8 buckets
stored as orc tblproperties ('transactional'='true') ;
insert into month_dim select month_sk,month,month_name,null,quarter,year from month_dim_old;
drop table month_dim_old;
-- 建立促销期过渡表
use rds;
create table campaign_session (
campaign_session varchar(30),
month tinyint,
year smallint
)
row format delimited fields terminated by ',' stored as textfile;
修改后的模式如下图所示。
假设所有促销期都不跨年,并且一个促销期可以包含一个或多个年月,但一个年月只能属于一个促销期。为了理解促销期如何工作,看下表的示例。
Campaign Session |
Month |
2016 First Campaign |
January-April |
2016 Second Campaign |
May-July |
2016 Third Campaign |
August-August |
2016 Last Campaign |
September-December |
2016 First Campaign,1,20162016 First Campaign,2,20162016 First Campaign,3,20162016 First Campaign,4,20162016 Second Campaign,5,20162016 Second Campaign,6,20162016 Second Campaign,7,20162016 Third Campaign,8,20162016 Last Campaign,9,20162016 Last Campaign,10,20162016 Last Campaign,11,20162016 Last Campaign,12,2016现在可以执行下面的脚本把2016年的促销期数据装载进月维度。
use rds;load data local inpath '/root/campaign_session.csv' overwrite into table campaign_session;use dw;drop table if exists tmp;create table tmp as select t1.month_sk month_sk, t1.month month, t1.month_name month_name, t2.campaign_session campaign_session, t1.quarter quarter, t1.year year from month_dim t1 inner join rds.campaign_session t2 on t1.year = t2.year and t1.month = t2.month;delete from month_dim where month_dim.month_sk in (select month_sk from tmp);insert into month_dim select * from tmp;select year,month,campaign_session from dw.month_dim;查询结果如下图所示,2016年的促销期已经有数据,其它年份的campaign_session字段值为null。
2. 层次查询
下面的语句查询年-促销期-月层次。
USE dw; SELECT product_category, time, order_amount, order_quantity FROM (SELECT * FROM (SELECT product_category, year, 1 month, year time, 1 sequence, SUM(month_order_amount) order_amount, SUM(month_order_quantity) order_quantity FROM month_end_sales_order_fact a, product_dim b, month_dim c WHERE a.product_sk = b.product_sk AND a.order_month_sk = c.month_sk AND year = 2016 GROUP BY product_category, year UNION ALL SELECT product_category, year, month, campaign_session time, 2 sequence, SUM(month_order_amount) order_amount, SUM(month_order_quantity) order_quantity FROM month_end_sales_order_fact a, product_dim b, month_dim c WHERE a.product_sk = b.product_sk AND a.order_month_sk = c.month_sk AND year = 2016 GROUP BY product_category, year, month, campaign_session UNION ALL SELECT product_category, year, month, month_name time, 3 sequence, SUM(month_order_amount) order_amount, SUM(month_order_quantity) order_quantity FROM month_end_sales_order_fact a, product_dim b, month_dim c WHERE a.product_sk = b.product_sk AND a.order_month_sk = c.month_sk AND year = 2016 GROUP BY product_category, year, quarter, month, month_name) tCLUSTER BY product_category, year, month, sequence) t;查询结果如下图所示。
3. 不完全层次
在一个或多个级别上没有数据的层次称为不完全层次。例如在特定月份没有促销期,那么月维度就具有不完全推广期层次。本小节说明不完全层次,还有在促销期上如何应用它。
下面是一个不完全促销期(在ragged_campaign.csv文件里)的例子,2016年1月、4月、6月、9月、10月、11月和12月没有促销期。
,1,20162016 Early Spring Campaign,2,20162016 Early Spring Campaign,3,2016,4,20162016 Spring Campaign,5,2016,6,20162016 Last Campaign,7,20162016 Last Campaign,8,2016,9,2016,10,2016,11,2016,12,2016下面的命令先把campaign_session字段置空,然后向month_dim表装载促销期数据。
use rds;load data local inpath '/root/ragged_campaign.csv' overwrite into table campaign_session;use dw;update month_dim set campaign_session = null;drop table if exists tmp;create table tmp as select t1.month_sk month_sk, t1.month month, t1.month_name month_name, case when t2.campaign_session != '' then t2.campaign_session else t1.month_name end campaign_session, t1.quarter quarter, t1.year year from month_dim t1 inner join rds.campaign_session t2 on t1.year = t2.year and t1.month = t2.month;delete from month_dim where month_dim.month_sk in (select month_sk from tmp);insert into month_dim select * from tmp;select year,month,campaign_session from dw.month_dim;查询结果如下图所示。
再次执行上面的层次查询语句,结果如下图所示。
在有促销期月份的路径,月级别行的汇总与促销期级别的行相同。而对于没有促销期的月份,其促销期级别的行与月级别的行相同。也就是说,在没有促销期级别的月份,月上卷了它们自己。例如,6月没有促销期,所以在输出看到了两个6月的行(第2行和第3行)。第3行是月份级别的行,第2行表示是没有促销期的行。对于没有促销期的月份,促销期行的销售订单金额(输出里的order_amount列)与月分行的相同。