Oracle分析函数总结(3)- 数值分布 - cume_dist,percent_rank,ntile,percentile_disc,percentile_cont,ratio_to_report

分析函数的基本概念和语法 -> /fw0124/article/details/7842039

下面这些函数计算某个值在一组有序数据中的累计分布(cumulative distribution)

1) cume_dist()
计算结果为相对位置/总行数。返回值(0,1]。
例如在一个5行的组中，返回的累计分布值为0.2,0.4,0.6,0.8,1.0;
注意对于重复行，计算时取重复行中的最后一行的位置。

a)作为聚合函数的用法
语法：cume_dist(expr) with group (order by exp)
cume_dist()用参数中的指定的数据构造一条假定的数据并插入到现存行中，
然后计算这条假定数据在所有行中的相对位置。
例如下面的查询中,emp中总共有14行数据，
假定的数据(deptno=20,sal=4000)会插入到第9行，
因此相对位置 9/(14+1)=0.6。




tony@ORA11GR2> select cume_dist(20,4000)



  2      within group (order by deptno, sal) cume_dist



  3  from emp;



 



 CUME_DIST



----------



        .6

b)作为分析函数的用法
语法：cume_dist() over([partition_clause] order_by_clause)
例如计算每个人在本部门按照薪水排列中的相对位置。




tony@ORA11GR2> select ename, sal, deptno,



  2      cume_dist() over(partition by deptno order by sal) cume_dist



  3   from emp;



 



ENAME                       SAL     DEPTNO  CUME_DIST



-------------------- ---------- ---------- ----------



MILLER                     1300         10 .333333333




CLARK                      2450         10 .666666667




KING                       5002         10          1




SMITH                       800         20         .2




ADAMS                      1100         20         .4




JONES                      2975         20         .6




SCOTT                      3000         20          1




FORD                       3000         20          1




JAMES                       950         30 .166666667




MARTIN                     1250         30         .5




WARD                       1250         30         .5




TURNER                     1500         30 .666666667




ALLEN                      1600         30 .833333333




BLAKE                      2850         30          1




 



已选择14行。

2) percent_rank()
和cume_dist的不同点在于计算分布结果的方法。
计算方法为 (相对位置-1)/(总行数-1),
因此第一行的结果为0。返回值[0,1]。
例如在一个5行的组中，返回的累计分布值为0,0.25,0.5,0.75,1.0;
注意对于重复行，计算时取重复行中的第一行的位置。

a)作为聚合函数的用法
语法：percent_rank(expr) with group (order by exp)
下面的例子中,类似cume_dist，假定数据(deptno=20,sal=4000)会插入到第9行，
计算相对位置 (9-1)/((14+1)-1)=0.57




tony@ORA11GR2> select percent_rank(20,4000)



  2      within group (order by deptno, sal) percent_rank



  3  from emp;



 



PERCENT_RANK



------------



  .571428571

b)作为分析函数的用法
语法： percent_rank() over([partition_clause] order_by_clause)
例如计算每个人在本部门按照薪水排列中的相对位置。




tony@ORA11GR2> select ename, sal, deptno,



  2      percent_rank() over(partition by deptno order by sal) percent_rank



  3  from emp;



 



ENAME                       SAL     DEPTNO PERCENT_RANK



-------------------- ---------- ---------- ------------



MILLER                     1300         10            0




CLARK                      2450         10           .5




KING                       5002         10            1




SMITH                       800         20            0




ADAMS                      1100         20          .25




JONES                      2975         20           .5




SCOTT                      3000         20          .75




FORD                       3000         20          .75




JAMES                       950         30            0




MARTIN                     1250         30           .2




WARD                       1250         30           .2




TURNER                     1500         30           .6




ALLEN                      1600         30           .8




BLAKE                      2850         30            1




 



已选择14行。

3) percentile_disc()
返回一个与输入的分布百分比值相对应的数据值，
分布百分比的计算方法同函数CUME_DIST()，
如果没有正好对应的数据值，就取大于该分布值的下一个值。

a)作为聚合函数的用法
语法： percentile_disc(expr) with group (order by expr)
例如，查询各部门中薪水分布处于40%位置的人的薪水。




tony@ORA11GR2> select deptno,



  2       percentile_disc(0.4) within group (order by sal) percentile_disc



  3  from emp group by deptno;



 



    DEPTNO PERCENTILE_DISC



---------- ---------------



        10            2450




        20            1100




        30            1250

2)作为分析函数的用法
语法：percentile_disc(expr) with group (order by expr) over([partition_clause])
例如，查询各部门中薪水分布处于40%位置的人的薪水。注意和cume_dist()的输出结果作对照。




tony@ORA11GR2> select ename, sal, deptno,



  2  cume_dist() over(partition by deptno order by sal) cume_dist,



  3  percentile_disc(0.4) within group (order by sal)



  4      over(partition by deptno) percentile_disc



  5  from emp;



 



ENAME                       SAL     DEPTNO  CUME_DIST PERCENTILE_DISC



-------------------- ---------- ---------- ---------- ---------------



MILLER                     1300         10 .333333333            2450




CLARK                      2450         10 .666666667            2450




KING                       5002         10          1            2450




SMITH                       800         20         .2            1100




ADAMS                      1100         20         .4            1100




JONES                      2975         20         .6            1100




SCOTT                      3000         20          1            1100




FORD                       3000         20          1            1100




JAMES                       950         30 .166666667            1250




MARTIN                     1250         30         .5            1250




WARD                       1250         30         .5            1250




TURNER                     1500         30 .666666667            1250




ALLEN                      1600         30 .833333333            1250




BLAKE                      2850         30          1            1250




 



已选择14行。

4) percentile_cont()
返回一个与输入的分布百分比值相对应的数据值，分布百分比的计算方法同函数PERCENT_RANK()，
如果没有正好对应的数据值，就通过下面算法来得到值：
RN = 1+ (P*(N-1)) 其中P是输入的分布百分比值，N是组内的行数
CRN = CEIL(RN) FRN = FLOOR(RN)
If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)
上面的算法比较难理解，实际上可以看作X轴上最小坐标0，最大坐标(行数-1),
先根据输入百分比找到对应的X轴坐标，然后按比例计算对应的数值。

a)作为聚合函数的用法
语法percentile_cont(expr) with group (order by expr)
下面的例子中,部门号30中薪水处于50%位置的数额计算方法：
RN=1+(0.5*(6-1))=3.5;CRN=4;FRN=3;
结果=(4-3.5)*1250+(3.5-3)*1500=1375




tony@ORA11GR2> select deptno,



  2  percentile_cont(0.5) within group (order by sal) percentile_cont



  3  from emp group by deptno;



 



    DEPTNO PERCENTILE_CONT



---------- ---------------



        10            2450




        20            2975




        30            1375

b)作为分析函数的用法
语法：percentile_cont(expr) with group (order by expr) over([partition_clause])
例如，计算各部门中薪水处于50%位置的薪水数额，注意和percent_rank()的输出结果作对照。




tony@ORA11GR2> select ename, sal, deptno,



  2  percent_rank() over(partition by deptno order by sal) percent_rank,



  3  percentile_cont(0.5) within group (order by sal)



  4      over(partition by deptno) percentile_cont



  5  from emp;



 



ENAME                       SAL     DEPTNO PERCENT_RANK PERCENTILE_CONT



-------------------- ---------- ---------- ------------ ---------------



MILLER                     1300         10            0            2450




CLARK                      2450         10           .5            2450




KING                       5002         10            1            2450




SMITH                       800         20            0            2975




ADAMS                      1100         20          .25            2975




JONES                      2975         20           .5            2975




SCOTT                      3000         20          .75            2975




FORD                       3000         20          .75            2975




JAMES                       950         30            0            1375




MARTIN                     1250         30           .2            1375




WARD                       1250         30           .2            1375




TURNER                     1500         30           .6            1375




ALLEN                      1600         30           .8            1375




BLAKE                      2850         30            1            1375




 



已选择14行。

5) ntile()
只能用作分析函数。
语法: ntile(expr) over([partition_clause] order_by_clause)
可以看作是把有序的数据集合平均分配到expr指定的数量的桶中,将桶号分配给每一行。
如果不能平均分配，则较小桶号的桶分配额外的行，并且各个桶中能放的行数最多相差1。
例如如果桶的数量为4，一共有6行，则：
1号桶:1行,2行;
2号桶:3行,4行;
3号桶:5行;
4号桶:6行;
例如，NTILE列返回每一行所应该被分配的桶号。




tony@ORA11GR2> select ename, sal, deptno,



  2      ntile(4) over(partition by deptno order by sal) ntile



  3   from emp;



 



ENAME                       SAL     DEPTNO      NTILE



-------------------- ---------- ---------- ----------



MILLER                     1300         10          1




CLARK                      2450         10          2




KING                       5002         10          3




SMITH                       800         20          1




ADAMS                      1100         20          1




JONES                      2975         20          2




SCOTT                      3000         20          3




FORD                       3000         20          4




JAMES                       950         30          1




MARTIN                     1250         30          1




WARD                       1250         30          2




TURNER                     1500         30          2




ALLEN                      1600         30          3




BLAKE                      2850         30          4




 



已选择14行。

6) ratio_to_report()
计算当前行的数值在分组中所有行的数值总和中所占的比例。
语法：ratio_to_report(expr) over([partition_clause])
例如，计算每个人的薪水在部门薪水总额中的比例。




tony@ORA11GR2> select ename, sal, deptno,



  2      sum(sal) over(partition by deptno) dept_sal_sum,



  3      ratio_to_report(sal) over(partition by deptno) ratio_to_report



  4   from emp;



 



ENAME                       SAL     DEPTNO DEPT_SAL_SUM RATIO_TO_REPORT



-------------------- ---------- ---------- ------------ ---------------



CLARK                      2450         10         8752      .279936015




KING                       5002         10         8752      .571526508




MILLER                     1300         10         8752      .148537477




JONES                      2975         20        10875      .273563218




FORD                       3000         20        10875      .275862069




ADAMS                      1100         20        10875      .101149425




SMITH                       800         20        10875      .073563218




SCOTT                      3000         20        10875      .275862069




WARD                       1250         30         9400      .132978723




TURNER                     1500         30         9400      .159574468




ALLEN                      1600         30         9400      .170212766




JAMES                       950         30         9400       .10106383




BLAKE                      2850         30         9400      .303191489




MARTIN                     1250         30         9400      .132978723




 



已选择14行。

秒客网

Oracle分析函数总结(3)- 数值分布 - cume_dist,percent_rank,ntile,percentile_disc,percentile_cont,ratio_to_report

相关文章