今天将一个比较繁琐的Excel手动报表操作通过SQL的方式给自动化了,今后每月至少节省半小时的苦力,哈哈。
需求描述:每个Ukey对应多个report ID,每个report ID都有report type和report date这两个属性。现要求查询出每个Ukey的最新的report的report type和report date。
当前的做法:查询出有几十万行之多的数据集,在此基础上手动添加一些辅助列,然后借助pivot table查询每组的最大值,并且手动查询出最大日期(report date)所对应的report type. 还是那句老话,这种手动操作耗时耗力不说,很容易出错(事实上,今天发现之前的操作流程是存在瑕疵的)
新方法(SQL替代手动操作):
WITH CQDATA AS ( SELECT PID AS PID, SITE AS SITE, PID||'_'||SITE AS UKEY, REPORTTYPE AS REPORT_TYPE, PUD AS REPORT_DATE, PN AS REPORT_ID FROM WORK_TABLE WHERE PUD IS NOT NULL AND SITE IS NOT NULL AND REPORTTYPE IS NOT NULL ) SELECT A.PID, A.SITE, A.UKEY, A.REPORT_TYPE, A.REPORT_DATE FROM (SELECT ROW_NUMBER () OVER (PARTITION BY UKEY ORDER BY CQDATA.REPORT_DATE DESC) RN, CQDATA.* FROM CQDATA) A WHERE A.RN = 1 ORDER BY UKEY
在此对以下文章的帮助表示我衷心的感谢:
https://blog.csdn.net/it_taojingzhan/article/details/49994159
以下内容摘自Oracle Reference Guide
--------------------------------------------------------------------------------------------------------------
1. Syntax
2. Purpose
ROW_NUMBER is an analytic function. It assigns a unique number to each row to which it
is applied (either each row in the partition or each row returned by the query), in theordered sequence of rows specified in the order_by_clause, beginning with 1.
By nesting a subquery using ROW_NUMBER inside a query that retrieves the ROW_NUMBER
values for a specified range, you can find a precise subset of rows from the results of
the inner query. This use of the function lets you implement top-N, bottom-N, and
inner-N reporting. For consistent results, the query must ensure a deterministic sort
order.
You cannot nest analytic functions by using ROW_NUMBER or any other analytic function
for expr. However, you can use other built-in function expressions for expr. Refer to
"About SQL Expressions" on page 6-1 for information on valid forms of expr.
3. Examples
The following example finds the three highest paid employees in each department in
the hr.employees table. Fewer than three rows are returned for departments with
fewer than three employees.
SELECT DEPARTMENT_ID, FIRST_NAME, LAST_NAME, SALARY FROM (SELECT DEPARTMENT_ID, FIRST_NAME, LAST_NAME, SALARY, ROW_NUMBER () OVER (PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC) RN FROM EMPLOYEES) WHERE RN <= 3 ORDER BY DEPARTMENT_ID, SALARY DESC, LAST_NAME;