SQL中子查询为聚合函数时的优化

时间:2022-02-11 23:13:05
测试数据:
create table test1 as select * from dba_objects where rownum<=10000;--10000条记录
create table test2 as select * from dba_objects;--13438条记录
分析执行计划:
SQL1:
SQL> select *
2 from test
3 where object_id =
4 (select max(object_id)
5 from test1
6 where test1.object_name = test.object_name);
已选择10行。
已用时间:  00: 00: 00.07
执行计划
----------------------------------------------------------
Plan hash value: 2637409915
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 961 | 194K| 43 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | TEST | 10 | 2070 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 79 | | |
|* 4 | TABLE ACCESS FULL| TEST1 | 96 | 7584 | 40 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_ID"= (SELECT MAX("OBJECT_ID") FROM "TEST1"
"TEST1" WHERE "TEST1"."OBJECT_NAME"=:B1))
4 - filter("TEST1"."OBJECT_NAME"=:B1)
Note
-----
- dynamic sampling used for this statement (level=4)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
1344 consistent gets
0 physical reads
0 redo size
1710 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed
SQL2:
SQL> select *
2 from test
3 where exists (select 1
4 from (select distinct object_name,
5 max(object_id) over(partition by test1.object_name) object_id
6 from test1) t
7 where t.object_name = test.object_name
8 and test.object_id = t.object_id);
已选择10行。
已用时间:  00: 00: 00.06
执行计划
----------------------------------------------------------
Plan hash value: 918945524
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 286 | | 405 (1)| 00:00:05 |
|* 1 | HASH JOIN SEMI | | 1 | 286 | | 405 (1)| 00:00:05 |
| 2 | TABLE ACCESS FULL | TEST | 10 | 2070 | | 3 (0)| 00:00:01 |
| 3 | VIEW | | 9606 | 741K| | 401 (1)| 00:00:05 |
| 4 | HASH UNIQUE | | 9606 | 741K| 848K| 401 (1)| 00:00:05 |
| 5 | WINDOW SORT | | 9606 | 741K| 848K| 401 (1)| 00:00:05 |
| 6 | TABLE ACCESS FULL| TEST1 | 9606 | 741K| | 40 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T"."OBJECT_NAME"="TEST"."OBJECT_NAME" AND
"TEST"."OBJECT_ID"="T"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement (level=4)
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
137 consistent gets
0 physical reads
0 redo size
1710 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10 rows processed 从上面执行计划可以看出:
SQL1:filter会根据test返回行数决定过滤表test1访问次数,类似于nested loop(注意,第二个表总是全表扫描的哦);逻辑读也比较大1344.
SQL2:相当于将子查询作为一个”表“与test进行hash join,当然每个表只会访问一次。逻辑读为137。
当然,如果test表返回数据量很大,那么SQL1的效率问题会更明显。
这个就属于SQL书写的问题,需要谨慎小心。 将子查询作为一个“表”与主查询表test做join连接,当然,需要先改写max聚合函数为分析函数,如下:
select *
from test
join (select object_name, max(object_id) object_id
from test1
group by object_name) t
on test.object_name = t.object_name
and test.object_id = t.object_id;
与上面改写是等效的。