HASH JOIN ,MERGE JOIN ,NESTED LOOP用法效率比较

时间:2022-11-20 16:48:39

概述:

NESTED LOOP:

对于被连接的数据子集较小的情况,嵌套循环连接是个较好的选择。在嵌套循环中,内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行,因此整个查询返回的结果集不能太大(大于1 万不适合),要把返回子集较小表的作为外表(CBO 默认外表是驱动表),而且在内表的连接字段上一定要有索引。当然也可以用ORDERED 提示来改变CBO默认的驱动表,使用USE_NL(table_name1 table_name2)可是强制CBO 执行嵌套循环连接。

 

HASH JOIN :

散列连接是CBO 做大数据集连接时的常用方式,优化器使用两个表中较小的表(或数据源)利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。

这种方式适用于较小的表完全可以放于内存中的情况,这样总成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器会将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段,此时要有较大的临时段从而尽量提高I/O 的性能。

也可以用USE_HASH(table_name1 table_name2)提示来强制使用散列连接。如果使用散列连接HASH_AREA_SIZE 初始化参数必须足够的大,如果是9i,Oracle建议使用SQL工作区自动管理,设置WORKAREA_SIZE_POLICY 为AUTO,然后调整PGA_AGGREGATE_TARGET 即可。

 

MERGE JOIN排序合并连接

通常情况下散列连接的效果都比排序合并连接要好,然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序合并连接的性能会优于散列连接。可以使用USE_MERGE(table_name1 table_name2)来强制使用排序合并连接

 

几种方式的操作方式

merge join

merge join的操作通常分三步:

1、对连接的每个表做table access full;

2、对table access full的结果进行排序。

3、进行merge join对排序结果进行合并。

在全表扫描比索引范围扫描再通过rowid进行表访问更可取的情况下,merge join会比nested loops性能更佳。当表特别小或特别巨大的时候,实行全表访问可能会比索引范围扫描更有效。mrege join的性能开销几乎都在前两步。

 

hash join

对两个表进行全表扫描,然后oracle读取涉及连接的其中一个表,并且在内存里创建来自表的连接列的唯一关键字的位图。当读取和处理第二个表的行时,创建值的位图被用做过滤器。如果一个行成功的通过位图过滤,则hash算法用于数据查找和后来的连接。(这里涉及数学问题,我也弄的不是很清楚)。

以下条件下hash join可能有优势:

两个巨大的表之间的连接。

在一个巨大的表和一个小表之间的连接。

 

Nested Loops

会循环外表(驱动表),逐个比对和内表的连接是否符合条件。在驱动表比较小,内表比较大,而且内外表的连接列有索引的时候比较好。当SORT_AREA空间不足的时候,Oracle也会选择使用NL。基于Cost的Oracle优化器(CBO)会自动选择较小的表做外表。

 

 

连接方式总结:

1))嵌套循环(nest loop):

          对于被连接的数据子集较小的情况,嵌套循环连接是较好的选择。在嵌套循环中,外表驱动内表,外表返回的每一行都要在内表中检索找到它匹配的行,因此整个查询返回的结果集不能太大(大于10000不合适),要把返回子集较小的表作为外表(驱动表),而且在内表的连接字段上一定要有索引。

2)哈希连接(hash join):

         哈希连接是大数据集连接时常用的方式,优化器使用两个表中较小的表,利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。

        这种方式适用于较小的表完全可以放入内存的情况,这样成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段。

        哈希连接只能应用于等值连接(如WHERE A.COL3 = B.COL4)、非等值连接(WHERE A.COL3 > B.COL4)、外连接(WHERE A.COL3 = B.COL4(+))。

3)排序合并连接(Sort Merge Join )

          通常情况下哈希连接的效果都比排序合并连接要好。然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序归并连接的性能会忧于哈希连接。

 

测试实验:

 

--创建测试表

SQL> Create Table t As Select * From dba_objects;
SQL> Insert /*+ append*/ t Select * From t;
SQL> Insert /*+ append*/ t Select * From t;
......
SQL> commit;

SQL> Create Table t1 As Select * From user_objects;

SQL> Create Table t2 As Select * From dba_objects;
SQL> Insert /*+ append*/ t2 Select * From t2;
......
SQL> commit;

SQL> select count(1) from t;

  COUNT(1)
----------
   3199936

SQL> select count(1) from t2;

  COUNT(1)
----------
    101220

SQL> select count(1) from t1;

  COUNT(1)
----------
        13
       
SQL> Create Index ind_id On t(object_id);
SQL> Create Index ind1_id On t1(object_id);
SQL> Create Index ind2_id On t2(object_id);


1.默认使用Nested loops的SQL语句的情况
--不加hints会默认使用Nested loops join
SQL> Select * From t,t1 Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1308275697

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   833 |   208K|   862   (1)| 00:00:11 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |    64 |  6144 |    66   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |        |   833 |   208K|   862   (1)| 00:00:11 |
|   3 |    TABLE ACCESS FULL        | T1     |    13 |  2080 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN         | IND_ID |    64 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
        687  consistent gets
          0  physical reads
       1484  redo size
      14877  bytes sent via SQL*Net to client
        910  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        576  rows processed

--强制使用Hash Jion
SQL> Select /*+ use_hash(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1444793974

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   833 |   208K|  9931   (3)| 00:02:00 |
|*  1 |  HASH JOIN         |      |   833 |   208K|  9931   (3)| 00:02:00 |
|   2 |   TABLE ACCESS FULL| T1   |    13 |  2080 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T    |  3204K|   293M|  9890   (2)| 00:01:59 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
      44124  consistent gets
          0  physical reads
        524  redo size
      71981  bytes sent via SQL*Net to client
        910  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        576  rows processed


--强制使用Merge join
SQL> Select /*+ use_merge(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2036048135

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   833 |   208K|       | 80548   (1)| 00:16:07 |
|   1 |  MERGE JOIN                  |         |   833 |   208K|       | 80548   (1)| 00:16:07 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1      |    13 |  2080 |       |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | IND1_ID |    13 |       |       |     1   (0)| 00:00:01 |
|*  4 |   SORT JOIN                  |         |  3204K|   293M|   794M| 80546   (1)| 00:16:07 |
|   5 |    TABLE ACCESS FULL         | T       |  3204K|   293M|       |  9890   (2)| 00:01:59 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")
       filter("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      44097  consistent gets
          0  physical reads
        524  redo size
      14899  bytes sent via SQL*Net to client
        910  bytes received via SQL*Net from client
         40  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        576  rows processed

结论:
逻辑读分别为:
Nested loops join 687
Hash Jion 44124
Merge join 44097
可以看到,大表t和小表t1关联使用Nested loops join关联性能会比较好。

2.默认使用Hash Jion的SQL语句的情况

--不加hints会默认使用Hash Jion
SQL> set timing on
SQL> Select  t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.58

Execution Plan
----------------------------------------------------------
Plan hash value: 3237705646

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |  5713K|    98M|       |  4442   (4)| 00:00:54 |
|*  1 |  HASH JOIN            |         |  5713K|    98M|  2224K|  4442   (4)| 00:00:54 |
|   2 |   INDEX FAST FULL SCAN| IND2_ID | 90908 |  1154K|       |    55   (4)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| IND_ID  |  3204K|    15M|       |  1596   (4)| 00:00:20 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
     430719  consistent gets
          0  physical reads
          0  redo size
  109700403  bytes sent via SQL*Net to client
    4667440  bytes received via SQL*Net from client
     424270  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    6364032  rows processed

--强制使用Nested loops join
SQL> Select  /*+ use_nl(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.45

Execution Plan
----------------------------------------------------------
Plan hash value: 3551542035

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |  5713K|    98M| 98323   (1)| 00:19:40 |
|   1 |  NESTED LOOPS         |         |  5713K|    98M| 98323   (1)| 00:19:40 |
|   2 |   INDEX FAST FULL SCAN| IND2_ID | 90908 |  1154K|    55   (4)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN    | IND_ID  |    63 |   315 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
     838760  consistent gets
          0  physical reads
          0  redo size
  109700403  bytes sent via SQL*Net to client
    4667440  bytes received via SQL*Net from client
     424270  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    6364032  rows processed
   
--强制使用merge join
SQL> Select  /*+ use_merge(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.78

Execution Plan
----------------------------------------------------------
Plan hash value: 1712472574

------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  5713K|    98M|       |  7661   (2)| 00:01:32 |
|   1 |  MERGE JOIN            |         |  5713K|    98M|       |  7661   (2)| 00:01:32 |
|   2 |   INDEX FULL SCAN      | IND_ID  |  3204K|    15M|       |  7137   (2)| 00:01:26 |
|*  3 |   SORT JOIN            |         | 90908 |  1154K|  3576K|   495   (3)| 00:00:06 |
|   4 |    INDEX FAST FULL SCAN| IND2_ID | 90908 |  1154K|       |    55   (4)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
     430689  consistent gets
          0  physical reads
          0  redo size
  109700403  bytes sent via SQL*Net to client
    4667440  bytes received via SQL*Net from client
     424270  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
    6364032  rows processed
   
结论:
逻辑读分别为:
Nested loops join 838760
Hash Jion 430719
Merge join 430689
可以看到,大表t和大表t2关联使用Hash Jion和Merge join关联性能差不多,Nested loops join性能最差。

3.默认使用Merge Jion的SQL语句的情况
--不加hints会默认使用Merge Jion
SQL> Select  t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:15.57

Execution Plan
----------------------------------------------------------
Plan hash value: 2780935651

---------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |  2589K|    59M|       | 16553   (2)| 00:03:19 |
|   1 |  MERGE JOIN         |         |  2589K|    59M|       | 16553   (2)| 00:03:19 |
|   2 |   INDEX FULL SCAN   | IND2_ID | 90908 |  1154K|       |   240   (2)| 00:00:03 |
|*  3 |   SORT JOIN         |         |  1452K|    15M|    55M| 16300   (2)| 00:03:16 |
|*  4 |    TABLE ACCESS FULL| T       |  1452K|    15M|       |  9822   (2)| 00:01:58 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
       filter("T"."OBJECT_ID"="T2"."OBJECT_ID")
   4 - filter("OWNER"='SYS')

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
      89745  consistent gets
          0  physical reads
        124  redo size
   50169586  bytes sent via SQL*Net to client
    2134921  bytes received via SQL*Net from client
     194041  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
    2910592  rows processed
   
--强制使用Nested join
SQL> Select  /*+ use_nl(t2,t)*/  t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:17.18

Execution Plan
----------------------------------------------------------
Plan hash value: 1858129701

---------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |  2589K|    59M|       |  1482K  (1)| 04:56:28 |
|   1 |  SORT ORDER BY      |         |  2589K|    59M|    79M|  1482K  (1)| 04:56:28 |
|   2 |   NESTED LOOPS      |         |  2589K|    59M|       |  1463K  (1)| 04:52:47 |
|*  3 |    TABLE ACCESS FULL| T       |  1452K|    15M|       |  9822   (2)| 00:01:58 |
|*  4 |    INDEX RANGE SCAN | IND2_ID |     2 |    26 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OWNER"='SYS')
   4 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
    1523249  consistent gets
          0  physical reads
        568  redo size
   50169586  bytes sent via SQL*Net to client
    2134921  bytes received via SQL*Net from client
     194041  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
    2910592  rows processed

--强制使用Hash join
SQL> Select  /*+ use_hash(t2,t)*/  t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:16.34

Execution Plan
----------------------------------------------------------
Plan hash value: 2900034259

------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  2589K|    59M|       | 30029   (2)| 00:06:01 |
|   1 |  SORT ORDER BY         |         |  2589K|    59M|    79M| 30029   (2)| 00:06:01 |
|*  2 |   HASH JOIN            |         |  2589K|    59M|  2224K| 11615   (2)| 00:02:20 |
|   3 |    INDEX FAST FULL SCAN| IND2_ID | 90908 |  1154K|       |    55   (4)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL   | T       |  1452K|    15M|       |  9822   (2)| 00:01:58 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
   4 - filter("OWNER"='SYS')

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
      44373  consistent gets
          0  physical reads
        524  redo size
   50169586  bytes sent via SQL*Net to client
    2134921  bytes received via SQL*Net from client
     194041  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
    2910592  rows processed
   
结论:
逻辑读分别为:
Nested loops join 1523249
Hash Jion 44373
Merge join 89745
可以看到,T2在和t的排序的限定结果集关联时,执行计划使用了merge join,但此时执行效率最的还是hash join,因为此时同样是大表之间关联,采用hash join效果会更好一些。