从一个SQL使用了不理想的执行计划说开,浅谈执行计划如何估算cache信息的影响及系统统计信息的收集等(2010-10-15)

   数据库信息:
   db:10.2.0.4.0 - 64bi
   os:Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

数据库主机监控报警:连续10分钟每分钟的主机CPU平均使用率都超过了80%
因为数据库设置的是20分钟抓取一次性能快照,所以直接生成了包含这个时间段的20分钟间隔的awr报表.
其中:
CPU used by this session /per Second 1431.86
对于一个16核CPU的主机来说,20分钟内的CPU平均使用率达到了14.31/16,接近了90%,而这台数据库主机平常这个时间段的CPU使用率不到30%的,现在来说确实是偏高了.
Operating System Statistics中:
LOAD 15
NUM_CPUS 16
也印证了这一点.

查看Top 5 Timed Events,排在前两位,占据百分比明显高的是:
Event                        Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time                       16,409                 35.5
latch: cache buffers chains   62,362 11,783            189 25.5                Concurrency

这时我一般会特别关注高CPU消耗和高逻辑读消耗的SQL语句,而往往它们又是一致的(逻辑读偏高容易导致hash latch争用,逻辑读消耗CPU,hash latch争用导致的spin又导致CPU空转,导致CPU的高消耗)

在这里发现了一条以前没有出现在顶部的sql语句:
select *
    from (select tt.*, rownum no
            from (select p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = :1 and cp.productid = p.id
                     and bitand(p.status, :2) <> :3
                   order by p.priority desc, p.pageview desc) tt)
   where no >= :4
     and no <= :5;

它的逻辑读消耗占据了总的55%,% Total DB Time将近40%.
在20分钟时间里,它执行了5100次,平均每次的逻辑IO是38W多,平均CPU时间:1.07s
语句并不复杂,感觉这个逻辑IO太大了.

使用awrsqrpt.sql得到当时的执行计划如下:
Id Operation Name Rows Bytes TempSpc Cost (%CPU) Time
0 SELECT STATEMENT 6693 (100)
1     FILTER
2       VIEW 15132 5098K 6693 (1) 00:01:21
3         COUNT
4           VIEW 15132 4906K 6693 (1) 00:01:21
5             SORT ORDER BY 15132 2970K 6392K 6693 (1) 00:01:21
6               NESTED LOOPS 15132 2970K 6026 (1) 00:01:13
7                 TABLE ACCESS FULL PRODUCT 144K 26M 6020 (1) 00:01:13
8                 INDEX UNIQUE SCAN PK_CATALOGRELATEPRODUCT 1 10 1 (0) 00:00:01

显然这里估算返回14W行,却执行NL,可能是存在问题的.

查看dba_hist_sqlbind,发现那个时段相应的两组快照值:
select *
    from (select tt.*, rownum no
            from (select p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 521
     and no <= 540;

和

select *
    from (select tt.*, rownum no
            from (select p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 1565 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 161
     and no <= 180;

那么看看它们的执行计划是什么,到底是哪里的步骤导致了这么高的逻辑读:
SET PAGESIZE 200;
SET LINESIZE 200;

set serveroutput off;
COL PLAN_TABLE_OUTPUT FOR A195;

alter session set statistics_level=all;

select *
    from (select tt.*, rownum no
            from (select p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 521
     and no <= 540;

20 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(null,null,'ALL IOSTATS LAST'));

Plan hash value: 3306616323

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20717 | 6979K|       | 7856   (1)| 00:01:35 |     20 |00:00:01.07 |     388K|
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3430 |00:00:01.06 |     388K|
|   3 |    VIEW                |                         |      1 | 20717 | 6716K|       | 7856   (1)| 00:01:35 |   3430 |00:00:01.06 |     388K|
|   4 |     SORT ORDER BY      |                         |      1 | 20717 | 4410K| 9480K| 7856   (1)| 00:01:35 |   3430 |00:00:01.06 |     388K|
|   5 |      NESTED LOOPS      |                         |      1 | 20717 | 4410K|       | 6872   (1)| 00:01:23 |   3430 |00:00:01.05 |     388K|
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M|       | 6865   (1)| 00:01:23 |    174K|00:00:00.35 |   39206 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |    174K|      1 |    10 |       |     1   (0)| 00:00:01 |   3430 |00:00:00.53 |     349K|
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=540 AND "NO">=521))
   6 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))
   7 - access("CP"."CATALOGID"=122 AND "CP"."PRODUCTID"="P"."ID")

(其它信息这里就省略掉了)

使用这种方法的好处是可以很容易的查看到优化器统计信息是否准确,资源到底消耗在哪里了(当然这里只能看到逻辑IO的消耗,看不到CPU这些资源的消耗的),时间消耗在哪里了,当然这些信息在10046 trace文件也可以看到的

20 rows selected.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(null,0,'ALL IOSTATS LAST'));

Plan hash value: 4053098602

----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                        | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                            |                             |      1 |     99 | 34155 |    52   (2)| 00:00:01 |     20 |00:00:00.02 |    4520 |
|   2 |   COUNT                          |                             |      1 |        |       |            |          |   1085 |00:00:00.02 |    4520 |
|   3 |    VIEW                          |                             |      1 |     99 | 32868 |    52   (2)| 00:00:01 |   1085 |00:00:00.02 |    4520 |
|   4 |     SORT ORDER BY                |                             |      1 |     99 | 21582 |    52   (2)| 00:00:01 |   1085 |00:00:00.02 |    4520 |
|   5 |      NESTED LOOPS                |                             |      1 |     99 | 21582 |    51   (0)| 00:00:01 |   1085 |00:00:00.02 |    4520 |
|* 6 |       INDEX RANGE SCAN           | INDEX2_CATALOGRELATEPRODUCT |      1 |     99 |   990 |     2   (0)| 00:00:01 |   2248 |00:00:00.01 |      12 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                     |   2248 |      1 |   208 |     1   (0)| 00:00:01 |   1085 |00:00:00.02 |    4508 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT                  |   2248 |      1 |       |     1   (0)| 00:00:01 |   2248 |00:00:00.01 |    2250 |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=180 AND "NO">=161))
   6 - access("CP"."CATALOGID"=1565)
   7 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))
   8 - access("CP"."PRODUCTID"="P"."ID")

很明显,它们的执行计划是不一样的,cp.catalogid = 122 的执行计划和使用绑定变量的SQL语句实际的执行计划是一致的,而cp.catalogid = 1565的执行计划和实际的执行计划是不一样的.

很容易猜想到:cp.catalogid上存在柱状图统计信息,使用绑定变量时的peeking机制,导致了执行计划的不稳定性.
因为各种原因导致需要重新硬分析的时候,peeking的刚好是122这类的值,所以导致了这样的执行计划.看看是不是这样:
select column_name,histogram from user_tab_columns where table_name=upper('catalogrelateproduct');

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
CATALOGID                      HEIGHT BALANCED
CATALOGNAME                    NONE
PRODUCTID                      NONE
PRODUCTNAME                    NONE
PRIORITY                       NONE
MAINBRANDSTATUS                NONE

确实,CATALOGID这个列上存在高等直方图,所以导致了现在的问题.

解决这个问题的最简单的方式就是:cp.catalogid = :1 这里不再使用绑定变量,而使用字面值.

先不去管这个问题,而来看看执行计划的优劣问题:
从上面的实际统计信息来看:cp.catalogid = 122 的执行计划的逻辑IO是388K,和使用绑定变量的实际的平均逻辑IO是一致的.而cp.catalogid = 1565 的逻辑IO只有4520,和平均值相差很远.
那假如cp.catalogid = 1565 也使用了实际的执行计划,它的逻辑IO会是多少呢?也会是38W左右吗?
使用hint来改变一下执行计划看看:
select *
      from (select tt.*, rownum no
              from (select /*+ ordered full(p) use_nl(p cp)*/p.id          as id,
                           p.image2      as img,
                           p.image3      as img3,
                           p.productname as name,
                            p.refprice    as price
                      from product p,catalogrelateproduct cp
                     where p.publishstatus = 3 and cp.catalogid = 1565 and cp.productid = p.id
                       and bitand(p.status, 8) <> 8
                     order by p.priority desc, p.pageview desc) tt)
     where no >= 161
       and no <= 180;

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 |     99 | 34155 | 6873   (1)| 00:01:23 |     20 |00:00:00.95 |     389K|
|   2 |   COUNT                |                         |      1 |        |       |            |          |   1085 |00:00:00.95 |     389K|
|   3 |    VIEW                |                         |      1 |     99 | 32868 | 6873   (1)| 00:01:23 |   1085 |00:00:00.95 |     389K|
|   4 |     SORT ORDER BY      |                         |      1 |     99 | 21582 | 6873   (1)| 00:01:23 |   1085 |00:00:00.95 |     389K|
|   5 |      NESTED LOOPS      |                         |      1 |     99 | 21582 | 6872   (1)| 00:01:23 |   1085 |00:00:00.95 |     389K|
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M| 6865   (1)| 00:01:23 |    174K|00:00:00.35 |   39206 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |    174K|      1 |    10 |     1   (0)| 00:00:01 |   1085 |00:00:00.44 |     349K|
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=180 AND "NO">=161))
   6 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))
   7 - access("CP"."CATALOGID"=1565 AND "CP"."PRODUCTID"="P"."ID")

可以看到使用同样的执行计划后,它的逻辑IO也变为了38W左右.
为什么是这样呢?因为它们都是先对product p应用filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))这样的过滤条件,这样都返回了17.4W行数据.
然后对返回的每一行数据都应用access("CP"."CATALOGID"=1565 AND "CP"."PRODUCTID"="P"."ID")这样的access predicates( index(CATALOGID, PRODUCTID)是主键索引).
也就是说access("CP"."CATALOGID"=1565 AND "CP"."PRODUCTID"="P"."ID")这样的access predicates都执行了17.4W次(从starts统计项可以知道这点).
所以它们的逻辑IO相差是很小的.
所以使用绑定变量的SQL平均的逻辑IO是38W多.

那cp.catalogid = 122 使用cp.catalogid = 1565 的执行计划呢?它的逻辑IO会是多大呢?
同样使用提示来完成这个工作:

select *
    from (select tt.*, rownum no
            from (select /*+ index(p) */p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 521
     and no <= 540;

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                            |                         |      1 | 20717 | 6979K|       | 11369   (1)| 00:02:17 |     20 |00:00:00.22 |   39176 |
|   2 |   COUNT                          |                         |      1 |        |       |       |            |          |   3435 |00:00:00.22 |   39176 |
|   3 |    VIEW                          |                         |      1 | 20717 | 6716K|       | 11369   (1)| 00:02:17 |   3435 |00:00:00.22 |   39176 |
|   4 |     SORT ORDER BY                |                         |      1 | 20717 | 4410K| 9768K| 11369   (1)| 00:02:17 |   3435 |00:00:00.21 |   39176 |
|   5 |      NESTED LOOPS                |                         |      1 | 20717 | 4410K|       | 10385   (1)| 00:02:05 |   3435 |00:00:00.21 |   39176 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    35   (0)| 00:00:01 | 19529 |00:00:00.01 |          62 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 | 19529 |      1 |   208 |       |     1   (0)| 00:00:01 |   3435 |00:00:00.18 |   39114 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              | 19529 |      1 |       |       |     1   (0)| 00:00:01 | 19529 |00:00:00.07 |   19531 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=540 AND "NO">=521))
   6 - access("CP"."CATALOGID"=122)
   7 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))
   8 - access("CP"."PRODUCTID"="P"."ID")

它的逻辑IO是3.9W,比实际的38W多小了很多.

我看到实际的执行计划的时候,估算返回14W行,却执行NL,感觉可能就是存在问题的,那么走hash join呢?

select *
    from (select tt.*, rownum no
            from (select /*+ use_hash(p cp)*/p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 521
     and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20717 | 6979K|       | 7885   (1)| 00:01:35 |     20 |00:00:00.67 |   39268 |
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3434 |00:00:00.67 |   39268 |
|   3 |    VIEW                |                         |      1 | 20717 | 6716K|       | 7885   (1)| 00:01:35 |   3434 |00:00:00.66 |   39268 |
|   4 |     SORT ORDER BY      |                         |      1 | 20717 | 4410K| 9768K| 7885   (1)| 00:01:35 |   3434 |00:00:00.66 |   39268 |
|* 5 |      HASH JOIN         |                         |      1 | 20717 | 4410K|       | 6901   (1)| 00:01:23 |   3434 |00:00:00.65 |   39268 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    35   (0)| 00:00:01 | 19528 |00:00:00.01 | 62 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M|       | 6865   (1)| 00:01:23 |    174K|00:00:00.35 |   39206 |
----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=540 AND "NO">=521))
   5 - access("CP"."PRODUCTID"="P"."ID")
   6 - access("CP"."CATALOGID"=122)
   7 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))

它的逻辑IO也达到了3.9W,cp.catalogid = 1565也不用实验了,也会是3.9W多,因为导致主要的逻辑读消耗的步骤table access full product的3.9W的逻辑IO是不会变的.
所以使用hash join之后,不管绑定变量值是多少,它的逻辑IO一定是3.9W多的.
而且使用hash join之后,除了逻辑读之外hash join也是消耗CPU的,它的CPU时间达到了0.4S.而使用nl时的CPU时间主要是由逻辑读造成的,逻辑读最大的cp.catalogid = 122的CPU时间只有0.1S多.

现在看来,从逻辑IO和CPU时间来考量的话,cp.catalogid = 1565 不使用任何提示的执行计划对于cp.catalogid = :1 是最优的,
无论绑定变量值是122还是1565,而且catalogrelateproduct.catalogid分布是不均匀的,cp.catalogid = :1 返回的行数决定了下面操作
|* 7 | TABLE ACCESS BY INDEX ROWID| PRODUCT
|* 8 | INDEX UNIQUE SCAN | PK_PRODUCT
的执行次数,返回的越多,执行次数越大,逻辑IO越大,执行次数越少,逻辑IO越小.
而cp.catalogid =122是返回行数最多的,也就是它的逻辑IO最多就是3.9W,实际执行时因为绑定变量值很多时候会小于这个最大值的.

所以现在把cp.catalogid = :1 这里不再使用绑定变量,而使用字面值的方案给抛弃掉,而仍然使用绑定变量,这种情况下,如何使用上面的执行计划呢?
既然oracle认为cp.catalogid = :1 返回行数少时就选择了你想要的执行计划,那就让它认为这个条件返回很少的数据行.

select count(1) total_prduct,
count(distinct cataLOGID) total_catalog,
trunc(count(1)/count(distinct cataLOGID)) prodcut_per_catalog
FROM catalogrelateproduct;

TOTAL_PRDUCT TOTAL_CATALOG PRODCUT_PER_CATALOG
------------ ------------- -------------------
1049936 19327 54

prodcut_per_catalog=54,就是说如果catalogrelateproduct.catalogid上不收集柱状图统计信息的话,那么优化器会认为数据是均匀分布的,那么它就会认为cp.catalogid = :1 就是返回了54行,它就会选择你想要的执行计划.
下面来看看:

select *
      from (select tt.*, rownum no
              from (select /*+ cardinality(cp 60)*/p.id          as id,
                           p.image2      as img,
                           p.image3      as img3,
                           p.productname as name,
                           p.refprice    as price
                      from product p,catalogrelateproduct cp
                     where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                       and bitand(p.status, 8) <> 8
                     order by p.priority desc, p.pageview desc) tt)
     where no >= 521
       and no <= 540;

我这里使用cardinality提示,告诉优化器对应cp.catalogid = 122 的只有很少的数据行,比如说这里告诉它是60个产品.

------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                            |                         |      1 |     60 | 20700 |    66   (2)| 00:00:01 |     20 |00:00:00.16 |   39208 |
|   2 |   COUNT                          |                         |      1 |        |       |            |          |   3451 |00:00:00.15 |   39208 |
|   3 |    VIEW                          |                         |      1 |     60 | 19920 |    66   (2)| 00:00:01 |   3451 |00:00:00.15 |   39208 |
|   4 |     SORT ORDER BY                |                         |      1 |     60 | 13080 |    66   (2)| 00:00:01 |   3451 |00:00:00.15 |   39208 |
|   5 |      NESTED LOOPS                |                         |      1 |     60 | 13080 |    65   (0)| 00:00:01 |   3451 |00:00:00.15 |   39208 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT |      1 |     60 |   600 |    35   (0)| 00:00:01 | 19545 |00:00:00.01 |          62 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 | 19545 |      1 |   208 |     1   (0)| 00:00:01 |   3451 |00:00:00.13 |   39146 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              | 19545 |      1 |       |     1   (0)| 00:00:01 | 19545 |00:00:00.05 |   19547 |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("NO"<=540 AND "NO">=521))
   6 - access("CP"."CATALOGID"=122)
   7 - filter((BITAND("P"."STATUS",8)<>8 AND "P"."PUBLISHSTATUS"=3))
   8 - access("CP"."PRODUCTID"="P"."ID")

它果然选择了我们想要的执行计划.

所以一种解决方案就是,仍然使用绑定变量,但加上/*+ cardinality(cp n)*/ n是一个很小的常数,比如说是1，10等.

另外一种解决方案就是,仍然使用绑定变量,但catalogrelateproduct.catalogid上不收集柱状图统计信息,这时优化器会认为数据是均匀分布的,那么它就会认为cp.catalogid = :1 就是返回了54行,
它也会选择我们想要的执行计划的.
但因为这个柱状图统计信息对于其它的sql来说确实是需要的,所以还是必须得收集高等直方图,所以也必须放弃这个方案.

当然,还有一种解决方案就是使用绑定变量的同时,使用/*+ index(p) */这样的提示来让它走我们想要的执行计划.
不过,要加提示的话,我更倾向于添加/*+ cardinality(cp 10)*/这样给优化器提供必要的信息的提示,但具体的访问路径,连接方式等还是由优化器自己来选择,而不使用/*+ index(p) */这样替优化器做决定的方式.

现在这个问题就算解决了.
当然,实际上问题还远没有结束.
cp.catalogid = 122 时,优化器为什么选择了一个不理想的执行计划呢?
继续讨论.
现在开始另一个问题,就是cp.catalogid = 122 时,优化器为什么选择了一个不理想的执行计划呢?这其实也是导致原来的SQL语句执行计划出现问题的一个很重要的原因.

select *
    from (select tt.*, rownum no
            from (select p.id          as id,
                         p.image2      as img,
                         p.image3      as img3,
                         p.productname as name,
                         p.refprice    as price
                    from product p,catalogrelateproduct cp
                   where p.publishstatus = 3 and cp.catalogid = 122 and cp.productid = p.id
                     and bitand(p.status, 8) <> 8
                   order by p.priority desc, p.pageview desc) tt)
   where no >= 521
     and no <= 540;

看来只能使用10053事件跟踪一下COST的计算了:

alter session set events '10053 trace name context forever,level 1';

select *
      from (select tt.*, rownum no
              from (select p1.id          as id,
                           p1.image2      as img,
                           p1.image3      as img3,
                           p1.productname as name,
                           p1.refprice    as price
                      from product p1,catalogrelateproduct cp1
                     where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                       and bitand(p1.status, 8) <> 8
                     order by p1.priority desc, p1.pageview desc) tt)
     where no >= 521
       and no <= 540;

select *
      from (select tt.*, rownum no
              from (select /*+ use_hash(p1 cp1)*/p1.id          as id,
                           p1.image2      as img,
                           p1.image3      as img3,
                           p1.productname as name,
                           p1.refprice    as price
                      from product p1,catalogrelateproduct cp1
                     where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                       and bitand(p1.status, 8) <> 8
                     order by p1.priority desc, p1.pageview desc) tt)
     where no >= 521
       and no <= 540;

select *
      from (select tt.*, rownum no
              from (select /*+ index(p1) */p1.id          as id,
                           p1.image2      as img,
                           p1.image3      as img3,
                           p1.productname as name,
                           p1.refprice    as price
                      from product p1,catalogrelateproduct cp1
                     where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                       and bitand(p1.status, 8) <> 8
                     order by p1.priority desc, p1.pageview desc) tt)
     where no >= 521
       and no <= 540;

这里使用p1,cp1,而没有使用原来的p,cp的目的是:生成全新的SQL语句,使得这样的SQL语句必须硬分析,这样才能生成10053事件的跟踪文件.

10053事件生成跟踪文件的前提: cbo + 这个语句必须硬分析(游标在library cache中已经存在,不需要硬分析的不生成跟踪文件)

查看跟踪文件时,发现:
PARAMETERS WITH ALTERED VALUES
******************************
_pga_max_size                       = 491520 KB
optimizer_index_cost_adj            = 50
optimizer_index_caching             = 60

*****************************
SYSTEM STATISTICS INFORMATION
*****************************
Using NOWORKLOAD Stats
CPUSPEED: 1251 millions instruction/sec
IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
IOSEEKTIM: 10 milliseconds (default is 10)
***************************************
也就是说没有收集系统统计信息,而使用了默认值.
而optimizer_index_cost_adj和optimizer_index_caching是改变了的,没有使用默认值100和0.

再往下看:
NL Join
Outer table: Card: 176160.33 Cost: 6864.98 Resp: 6864.98 Degree: 1 Bytes: 208
Inner table: CATALOGRELATEPRODUCT Alias: CP1
Access Path: TableScan

OPTIMIZER PERCENT INDEX CACHING = 60
Access Path: index (UniqueScan)
    Index: PK_CATALOGRELATEPRODUCT
    resc_io: 0.00 resc_cpu: 1050
    ix_sel: 9.5159e-07 ix_sel_with_filters: 9.5159e-07
    NL Join: Cost: 6872.14 Resp: 6872.14 Degree: 1
      Cost_io: 6824.00 Cost_cpu: 722636557
      Resp_io: 6824.00 Resp_cpu: 722636557

Best NL cost: 6872.14
resc: 6872.14 resc_io: 6824.00 resc_cpu: 722636557
resp: 6872.14 resp_io: 6824.00 resp_cpu: 722636557

发现这里的INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT 的代价不是上面执行计划里显示的1,而是0.(实际上1和0在cbo里很多时候就是可以互相替代的)
所以导致内表访问的IO代价这一块变成了0,
实际*问内表增加的cost=6872.14-6864.98=7.16
我感觉这一部分是内表访问的CPU代价：
176160.33*1050/1251000000*1000/(10+8192/4096)=12.32 (这里8192是db_block_size)
虽然不一样,但感觉不是太不靠谱.
考虑到下面的optimizer_index_cost_adj= 50,如果索引访问的cpu代价也乘上这个百分比的话,
12.32*50/100=6.16,再ceil一下的话,就刚好是7了.呵呵,这里纯属瞎蒙了.就先不做研究了.

那为什么这里的内表索引访问的代价变成0了呢?

是不是optimizer_index_cost_adj= 50,optimizer_index_caching= 60 影响了执行计划的COST计算,导致了问题呢?
改回默认值看看:

show parameter index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     60
optimizer_index_cost_adj             integer     50
skip_unusable_indexes                boolean     TRUE

alter session set optimizer_index_caching=0;
alter session set optimizer_index_cost_adj=100;

改回默认值后,再看看

select *
        from (select tt.*, rownum no
                from (select p1.id          as id,
                             p1.image2      as img,
                             p1.image3      as img3,
                             p1.productname as name,
                             p1.refprice    as price
                        from product p1,catalogrelateproduct cp1
                       where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                         and bitand(p1.status, 8) <> 8
                       order by p1.priority desc, p1.pageview desc) tt)
       where no >= 521
         and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20774 | 6999K|       | 7948   (1)| 00:01:36 |     20 |00:00:01.11 |   39397 |
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3512 |00:00:01.10 |   39397 |
|   3 |    VIEW                |                         |      1 | 20774 | 6735K|       | 7948   (1)| 00:01:36 |   3512 |00:00:01.10 |   39397 |
|   4 |     SORT ORDER BY      |                         |      1 | 20774 | 4442K|     9M| 7948   (1)| 00:01:36 |   3512 |00:00:01.09 |   39397 |
|* 5 |      HASH JOIN         |                         |      1 | 20774 | 4442K|       | 6958   (1)| 00:01:24 |   3512 |00:00:01.08 |   39397 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    69   (0)| 00:00:01 | 19627 |00:00:00.02 | 63 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M|       | 6887   (1)| 00:01:23 |    175K|00:00:00.35 |   39334 |
----------------------------------------------------------------------------------------------------------------------------------------------------

发现执行计划变了,逻辑IO变成了3.9W,而不是原来的38W多了(虽然说它这里使用了hash join,而不是我想要的nl吧!)

原来的执行计划呢?

select *
        from (select tt.*, rownum no
                from (select /*+ full(p1) ordered use_nl(p1 cp1)*/p1.id          as id,
                             p1.image2      as img,
                             p1.image3      as img3,
                             p1.productname as name,
                             p1.refprice    as price
                        from product p1,catalogrelateproduct cp1
                       where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                         and bitand(p1.status, 8) <> 8
                       order by p1.priority desc, p1.pageview desc) tt)
       where no >= 521
         and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20774 | 6999K|       |   182K (1)| 00:36:35 |     20 |00:00:01.77 |     391K|
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3512 |00:00:01.77 |     391K|
|   3 |    VIEW                |                         |      1 | 20774 | 6735K|       |   182K (1)| 00:36:35 |   3512 |00:00:01.76 |     391K|
|   4 |     SORT ORDER BY      |                         |      1 | 20774 | 4442K| 9512K|   182K (1)| 00:36:35 |   3512 |00:00:01.75 |     391K|
|   5 |      NESTED LOOPS      |                         |      1 | 20774 | 4442K|       |   181K (1)| 00:36:23 |   3512 |00:00:01.74 |     391K|
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M|       | 6887   (1)| 00:01:23 |    175K|00:00:00.53 |   39334 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |    175K|      1 |    10 |       |     1   (0)| 00:00:01 |   3512 |00:00:00.78 |     351K|
----------------------------------------------------------------------------------------------------------------------------------------------------

原来的执行计划,逻辑IO大.
cost也很大了,所以选择时被抛弃了，这确实是我想要的

select *
        from (select tt.*, rownum no
                from (select /*+ use_hash(p1,cp1) */p1.id          as id,
                             p1.image2      as img,
                             p1.image3      as img3,
                             p1.productname as name,
                             p1.refprice    as price
                        from product p1,catalogrelateproduct cp1
                       where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                         and bitand(p1.status, 8) <> 8
                       order by p1.priority desc, p1.pageview desc) tt)
       where no >= 521
         and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20774 | 6999K|       | 7948   (1)| 00:01:36 |     20 |00:00:01.14 |   39397 |
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3512 |00:00:01.13 |   39397 |
|   3 |    VIEW                |                         |      1 | 20774 | 6735K|       | 7948   (1)| 00:01:36 |   3512 |00:00:01.12 |   39397 |
|   4 |     SORT ORDER BY      |                         |      1 | 20774 | 4442K|     9M| 7948   (1)| 00:01:36 |   3512 |00:00:01.12 |   39397 |
|* 5 |      HASH JOIN         |                         |      1 | 20774 | 4442K|       | 6958   (1)| 00:01:24 |   3512 |00:00:01.11 |   39397 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    69   (0)| 00:00:01 | 19627 |00:00:00.02 | 63 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    174K|    34M|       | 6887   (1)| 00:01:23 |    175K|00:00:00.35 |   39334 |
----------------------------------------------------------------------------------------------------------------------------------------------------

这就是未使用hint的时候优化器选择的执行计划,cost是最小的,所以被选择了
使用了hash join

select *
        from (select tt.*, rownum no
                from (select /*+ index(p1) */p1.id          as id,
                             p1.image2      as img,
                             p1.image3      as img3,
                             p1.productname as name,
                             p1.refprice    as price
                        from product p1,catalogrelateproduct cp1
                       where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                         and bitand(p1.status, 8) <> 8
                       order by p1.priority desc, p1.pageview desc) tt)
       where no >= 521
         and no <= 540;

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                            |                         |      1 | 20774 | 6999K|       | 21761   (1)| 00:04:22 |     20 |00:00:00.41 |   39376 |
|   2 |   COUNT                          |                         |      1 |        |       |       |            |          |   3512 |00:00:00.40 |   39376 |
|   3 |    VIEW                          |                         |      1 | 20774 | 6735K|       | 21761   (1)| 00:04:22 |   3512 |00:00:00.40 |   39376 |
|   4 |     SORT ORDER BY                |                         |      1 | 20774 | 4442K|     9M| 21761   (1)| 00:04:22 |   3512 |00:00:00.39 |   39376 |
|   5 |      NESTED LOOPS                |                         |      1 | 20774 | 4442K|       | 20771   (1)| 00:04:10 |   3512 |00:00:00.38 |   39376 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    69   (0)| 00:00:01 | 19628 |00:00:00.02 |          63 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 | 19628 |      1 |   209 |       |     1   (0)| 00:00:01 |   3512 |00:00:00.30 |   39313 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              | 19628 |      1 |       |       |     0   (0)|          | 19628 |00:00:00.12 |   19630 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

这是我们想要的执行计划,但这里计算出来的cost还是偏高,所以没有被选上

这里先来说说optimizer_index_caching和optimizer_index_cost_adj两个参数的用途:
Jonathan Lewis 在他的《Cost-Based Oracle Fundamentals》中大致是这样解释这两个参数的(当然,这里写的是我个人的理解,不一定正确)

在8I传统的计算COST的代价模型中,因为它只计算物理IO的代价,它的一个缺陷是假定每次逻辑读都导致一个物理读,也就是说每次读取的数据块以前都没有读取过,都不在BUFFER CACHE中。
但实际上呢?并不是每个索引访问都导致物理读的.很多时候,尤其是在OLTP系统中,有些索引块会被反复多次的读取(一个索引块在同一个SQL中会被反复多次读取,在不同的SQL间被多次读取,读取的时间间隔很短等),实际上很多时候是不导致物理读,只是逻辑读的.
为了解决这个问题,8I引入了两个参数:optimizer_index_cost_adj 和 optimizer_index_caching
不过,optimizer_index_caching这个参数是在in列表和nl的内表使用索引访问的时候用到的,对于简单的单表的索引扫描不起作用的.

一般来说,在OLTP系统中不建议使用默认值,而应该进行调整的.
但其实它还存在着一个隐含的前提假定,那就是TABLE ACCESS FULL都要导致物理读的,因为全表扫描的块默认是要放到LRU链表的冷端的,是很容易被age out出buffer cache的,下次要读取的时候是不在buffer cache中,需要物理读的.
默认的代价模型认为索引访问也是都要导致物理读的,而实际上不是这样的,所以引入了optimizer_index_caching这个参数,通过调整它,优化器就会认为一部分索引访问不导致物理读,而是逻辑读的.
也就是说设置optimizer_index_caching这个参数后,优化器会认为TABLE ACCESS FULL是要全部物理读,而一部分索引(比例和optimizer_index_caching的设置值有关)访问是不需要物理读,是逻辑读的,所以它的cost计算要调低一些.这样区别对待的.

但对于我们的系统来说,数据库并不是很大,一些频繁访问的表基本上都是在buffer cache中,都不需要物理读的(做很长时间跨度的awr报表,Pool Hit%基本上都是100的),无论全表扫描还是索引访问都只是逻辑读的.这样通过设置这个参数,让oracle优化器认为全表扫描是需要物理读,而一部分索引访问只是逻辑读的,是不符合数据库的实际情况的.需要让优化器同等对待它们,这时可以使用optimizer_index_caching的默认值0,认为它们都是需要物理读的.(虽然说最好让优化器认为它们都只需要逻辑读)

optimizer_index_cost_adj参数引入之初的目的是平衡单块IO的代价和多块IO的代价之间的不平衡(通过设置它也可以让优化器在计算索引访问时降低它的cost值，在一定程度上解决所有的访问都导致物理读的问题).但它是通过降低单块读的代价,而不是通过增加多块读的代价来平衡这种不平衡的.

9I开始引入的CPU代价模型可以很好的平衡单块IO的代价和多块IO的代价之间的不平衡,而且可以让优化器知道CPU的实际处理能力,对于sort,hash join等操作,可以更加准确的衡量它们的cost,得出更加理想的执行计划.
所以在10g的时候,是推荐收集系统统计信息从而使用CPU代价模型的,而不再推荐设置optimizer_index_cost_adj成非默认值了,保持默认值100就可以了.另外收集系统统计信息可以收集MBRC,实际的多块读信息,而不再推荐使用db_file_multiblock_read_count参数来手工的设置多块读信息了.

收集一下系统统计信息看看:
sys用户模式下:
exec dbms_stats.create_stat_table(ownname => 'SYS',stattab => 'SYSSTATS_TAB');

exec dbms_stats.export_system_stats(statown => 'SYS',stattab => 'SYSSTATS_TAB',statid => 'DEFAULT');
--把当前的系统统计信息给导出来一下,便于测试完后重新导回去,因为这个功能的使用与否还需要更多的测试工作.

--在上午9:30--11:30之间,收集系统统计信息,这是白天具有代表性的时间段
exec dbms_stats.gather_system_stats(statown => 'SYS',stattab => 'SYSSTATS_TAB',statid => 'DAYTIME',gathering_mode=>'INTERVAL',interval=>120);

--在中午的时候进行测试工作:
sys用户模式下:
exec dbms_stats.import_system_stats(statown =>'SYS',stattab => 'SYSSTATS_TAB',statid => 'DAYTIME');
查看sys.aux_stats$,系统统计信息确实变了

然后用普通用户测试:
使用系统统计信息,optimizer_index_caching使用0,optimizer_index_cost_adj使用默认值100.

alter session set optimizer_index_caching=0;
alter session set optimizer_index_cost_adj=100;
show parameter index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
skip_unusable_indexes                boolean     TRUE

收集系统统计信息不会像收集模式统计信息那样invalid相关游标,导致这些游标重新硬分析的。
所以收集完系统统计信息后,共享池中的游标还是使用原来的执行计划,不会改变.
只有需要硬分析的时候,这些sql才会使用新的系统统计信息的计算cost的.
所以这里使用了全新的sql语句.

select *
          from (select tt.*, rownum no
                  from (select /*+ full(p6) ordered use_nl(p6 cp6)*/p6.id          as id,
                               p6.image2      as img,
                               p6.image3      as img3,
                               p6.productname as name,
                               p6.refprice    as price
                          from product p6,catalogrelateproduct cp6
                         where p6.publishstatus = 3 and cp6.catalogid = 122 and cp6.productid = p6.id
                           and bitand(p6.status, 8) <> 8
                         order by p6.priority desc, p6.pageview desc) tt)
         where no >= 521
           and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20687 | 6969K|       |   184K (1)| 00:15:59 |     20 |00:00:01.72 |     392K|
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3541 |00:00:01.72 |     392K|
|   3 |    VIEW                |                         |      1 | 20687 | 6707K|       |   184K (1)| 00:15:59 |   3541 |00:00:01.71 |     392K|
|   4 |     SORT ORDER BY      |                         |      1 | 20687 | 4424K| 9480K|   184K (1)| 00:15:59 |   3541 |00:00:01.70 |     392K|
|   5 |      NESTED LOOPS      |                         |      1 | 20687 | 4424K|       |   183K (1)| 00:15:54 |   3541 |00:00:01.69 |     392K|
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    175K|    34M|       | 8130   (2)| 00:00:43 |    176K|00:00:00.53 |   39461 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |    176K|      1 |    10 |       |     1   (0)| 00:00:01 |   3541 |00:00:00.75 |     352K|
----------------------------------------------------------------------------------------------------------------------------------------------------

select *
          from (select tt.*, rownum no
                  from (select /*+ use_hash(p6,cp6) */p6.id          as id,
                               p6.image2      as img,
                               p6.image3      as img3,
                               p6.productname as name,
                               p6.refprice    as price
                          from product p6,catalogrelateproduct cp6
                         where p6.publishstatus = 3 and cp6.catalogid = 122 and cp6.productid = p6.id
                           and bitand(p6.status, 8) <> 8
                         order by p6.priority desc, p6.pageview desc) tt)
         where no >= 521
           and no <= 540;

----------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time       | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                  |                         |      1 | 20687 | 6969K|       | 9190   (2)| 00:00:48 |     20 |00:00:01.22 |   39524 |
|   2 |   COUNT                |                         |      1 |        |       |       |            |          |   3541 |00:00:01.21 |   39524 |
|   3 |    VIEW                |                         |      1 | 20687 | 6707K|       | 9190   (2)| 00:00:48 |   3541 |00:00:01.21 |   39524 |
|   4 |     SORT ORDER BY      |                         |      1 | 20687 | 4424K|     9M| 9190   (2)| 00:00:48 |   3541 |00:00:01.20 |   39524 |
|* 5 |      HASH JOIN         |                         |      1 | 20687 | 4424K|       | 8203   (2)| 00:00:43 |   3541 |00:00:01.19 |   39524 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    70   (2)| 00:00:01 | 19665 |00:00:00.02 | 63 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |      1 |    175K|    34M|       | 8130   (2)| 00:00:43 |    176K|00:00:00.53 |   39461 |
----------------------------------------------------------------------------------------------------------------------------------------------------

select *
          from (select tt.*, rownum no
                  from (select /*+ index(p6) */p6.id          as id,
                               p6.image2      as img,
                               p6.image3      as img3,
                               p6.productname as name,
                               p6.refprice    as price
                          from product p6,catalogrelateproduct cp6
                         where p6.publishstatus = 3 and cp6.catalogid = 122 and cp6.productid = p6.id
                           and bitand(p6.status, 8) <> 8
                         order by p6.priority desc, p6.pageview desc) tt)
         where no >= 521
           and no <= 540;

--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Starts | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW                            |                         |      1 | 20687 | 6969K|       | 21772   (1)| 00:01:54 |     20 |00:00:00.39 |   39450 |
|   2 |   COUNT                          |                         |      1 |        |       |       |            |          |   3541 |00:00:00.39 |   39450 |
|   3 |    VIEW                          |                         |      1 | 20687 | 6707K|       | 21772   (1)| 00:01:54 |   3541 |00:00:00.38 |   39450 |
|   4 |     SORT ORDER BY                |                         |      1 | 20687 | 4424K|     9M| 21772   (1)| 00:01:54 |   3541 |00:00:00.37 |   39450 |
|   5 |      NESTED LOOPS                |                         |      1 | 20687 | 4424K|       | 20785   (1)| 00:01:49 |   3541 |00:00:00.36 |   39450 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT |      1 | 20687 |   202K|       |    70   (2)| 00:00:01 | 19665 |00:00:00.02 |          63 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 | 19665 |      1 |   209 |       |     1   (0)| 00:00:01 |   3541 |00:00:00.29 |   39387 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              | 19665 |      1 |       |       |     0   (0)|          | 19665 |00:00:00.12 |   19667 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------

第一种方案的cost是最大的,可hash join方案的执行计划的cost还是比我相要的第三种方案的cost小呀,默认还是会选择hash join的。
但hash join的cpu消耗会比第三种方案要大,而且hash join的逻辑IO会稳定在这个逻辑IO基础上,因为TABLE ACCESS FULL| PRODUCT 是必须的。
而第三种方案的最大逻辑IO是这个值,逻辑IO值和INDEX RANGE SCAN PK_CATALOGRELATEPRODUCT返回的行数有很大关系,其实使用别的cp6.catalogid = :1绑定变量值的时候,逻辑IO会比这个值小很多的.

为什么还是这样呢?暂且不说这个问题.
还是再回来看看最初的执行计划里,nl连接的内表索引访问代价是0的问题吧!
可以参考一下这个文档:
http://www.centrexcc.com/papers.html
下的The Effects of optimizer_index_cost_adj and optimizer_index_caching on Access Plans

其实nl连接,内表使用索引扫描时候的IO代价可以这样计算:
nl cost= cost(outer)
+
cdn(outer)
*optimizer_index_cost_adj/100*
(
blevel + --INDEX UNIQUE SCAN的话,这一部分按0算的
(100-Optimizer_Index_Caching)/100*leaf_blocks * effective index selectivity + --这里可能会round或者ceil的,所以Optimizer_Index_Caching<>0时,这一部分结果可能是1甚至是0的,=0时,不会是0
clustering_factor * effective table selectivity --这里可能会round或者ceil的,INDEX UNIQUE SCAN并且回访表的话,这一部分按1算的(ceil的结果)
)

其实sql语句原来的问题就是因为蓝色字体部分引起的.
Optimizer_Index_Caching<>0,并且nl连接的内表使用唯一索引扫描的时候,索引访问的cost可能就会变成0,也就是说不管外表返回的card有多大,内表访问是完全免费的,导致优化器选择了性能很差的执行计划.

我觉得可以这样理解这个问题:
cost(inner)=card(outer)*round(0.3)=17.4W*0=0; (比如说外表返回了17.4W行数据,内表索引的访问成本是0.3)
而不是按cost(inner)=round(card(outer)*0.3)=round(17.4W*0.3)=52200 计算的.

下面看看例子:
还是使用默认的noworkload的系统统计信息,optimizer_index_cost_adj使用默认值100,改变optimizer_index_caching来看看内表使用索引唯一扫描时的效果(分为不需要回访表和需要回访表两种)

这个系统统计信息还是需要仔细测试一下的,暂时还是使用原来的统计信息吧:
sys用户下:
EXEC DBMS_STATS.delete_system_stats();
exec dbms_stats.import_system_stats(statown =>'SYS',stattab => 'SYSSTATS_TAB',statid => 'DEFAULT');
查看sys.aux_stats$,系统统计信息变回去了

先看不需要回访表的:
SQL> show parameter index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     60
optimizer_index_cost_adj             integer     100
skip_unusable_indexes                boolean     TRUE
SQL> alter session set optimizer_index_caching=0;

SQL>explain plan for
    select *
            from (select tt.*, rownum no
                    from (select /*+ full(p1) ordered use_nl(p1 cp1)*/p1.id          as id,
                                 p1.image2      as img,
                                 p1.image3      as img3,
                                 p1.productname as name,
                                 p1.refprice    as price
                            from product p1,catalogrelateproduct cp1
                           where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                             and bitand(p1.status, 8) <> 8
                           order by p1.priority desc, p1.pageview desc) tt)
           where no >= 521
             and no <= 540;

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3306616323

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20687 | 6969K|       |   183K (1)| 00:36:38 |
|* 1 | VIEW                  |                         | 20687 | 6969K|       |   183K (1)| 00:36:38 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20687 | 6707K|       |   183K (1)| 00:36:38 |
|   4 |     SORT ORDER BY      |                         | 20687 | 4424K| 9480K|   183K (1)| 00:36:38 |
|   5 |      NESTED LOOPS      |                         | 20687 | 4424K|       |   182K (1)| 00:36:26 |
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    34M|       | 6909   (1)| 00:01:23 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |     1 |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NO"<=540 AND "NO">=521)
   6 - filter(BITAND("P1"."STATUS",8)<>8 AND "P1"."PUBLISHSTATUS"=3)
   7 - access("CP1"."CATALOGID"=122 AND "CP1"."PRODUCTID"="P1"."ID")

21 rows selected.

SQL> alter session set optimizer_index_caching=1;

--sql语句就不再贴了

SQL> alter session set optimizer_index_caching=2;

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20687 | 6969K|       | 7906   (1)| 00:01:35 |
|* 1 | VIEW                  |                         | 20687 | 6969K|       | 7906   (1)| 00:01:35 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20687 | 6707K|       | 7906   (1)| 00:01:35 |
|   4 |     SORT ORDER BY      |                         | 20687 | 4424K| 9480K| 7906   (1)| 00:01:35 |
|   5 |      NESTED LOOPS      |                         | 20687 | 4424K|       | 6921   (1)| 00:01:24 |
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    34M|       | 6909   (1)| 00:01:23 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |     1 |    10 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

--从2开始内表访问的IO代价变为0了,总的cost猛的下来了,此后再往大了改optimizer_index_caching的参数设置值,总的cost就不再改变了

再来看看需要回访表的:

SQL> show parameter index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     100
optimizer_index_cost_adj             integer     100
skip_unusable_indexes                boolean     TRUE
SQL> alter session set optimizer_index_caching=0;

SQL> explain plan for
select *
            from (select tt.*, rownum no
                    from (select /*+ full(p1) ordered use_nl(p1 cp1) index(cp1 (CATALOGID,PRODUCTID)) */p1.id          as id,
                                 p1.image2      as img,
                                 p1.image3      as img3,
                                 p1.productname as name,
                                 p1.refprice    as price,
                                 cp1.priority
                            from product p1,catalogrelateproduct cp1
                           where p1.publishstatus = 3 and cp1.catalogid = 122 and cp1.productid = p1.id
                             and bitand(p1.status, 8) <> 8
                           order by p1.priority desc, p1.pageview desc) tt)
           where no >= 521
             and no <= 540;

SQL> select * from table(dbms_xplan.display());

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NO"<=540 AND "NO">=521)
   6 - filter(BITAND("P1"."STATUS",8)<>8 AND "P1"."PUBLISHSTATUS"=3)
   8 - access("CP1"."CATALOGID"=122 AND "CP1"."PRODUCTID"="P1"."ID")

SQL> alter session set optimizer_index_caching=1;

SQL> alter session set optimizer_index_caching=2;

--从2开始内表访问的IO代价变为1了(INDEX UNIQUE SCAN cost变成0,只剩下回访表的代价1了),总的cost猛的下来了,此后总的cost就不再改变了
当然,这里不一定就是从2开始的,反正optimizer_index_caching<>0时,nl连接的内表使用唯一索引扫描的时候,从某个值开始,索引访问的cost可能就会变成0,内表访问是完全免费的,导致优化器选择了性能很差的执行计划.

最后再说说第2行绿色字体部分的问题.
单就catalogid = 122 这个sql而言,hash join和我想要的nl连接相比,它们的逻辑IO相差无几,可因为hash join本身抛开逻辑读之外,hash join本身也是极其消耗cpu的,所以hash join的方式cpu时间消耗比我想要的nl连接的方式大得多,所以hash join的方式执行时间比我想要的nl连接方式要大得多,可为什么它的cost是最小的,优化器选择了它呢?
仔细想想,其实10053事件也不需要看了.
其实我这里都是cache的,无论是全表扫描,还是索引的使用,就像我第1行绿色字体标识的那样,最好让优化器认为它们都是只需要逻辑读,这是最好的.没有办法下,我只能让优化器认为它们都是需要物理读的.这样第三个sql中,nl连接的方式下,nl连接后的20785的代价基本上就是物理IO的代价.而hash join的情况下,物理io的代价基本上就是8130+70=8200,实际执行时最耗资源的hash join,它耗费的是cpu资源,相对于物理读来说,它的资源消耗被大大弱化了,也就是说这部分cpu资源消耗转换为物理io代价时被大大缩小。最终导致了这样的cost的计算结果.但问题是我这里实际上不存在物理IO,都是逻辑IO,而逻辑IO它的CPU资源消耗相对于hash join来说倒是次要的了.
问题是我能不能让优化器知道我这里都是逻辑IO,也就是我这些块都cache起来了呢?
实际上oracle提供了两个隐含参数:
_cache_stats_monitor (我在10.2.0.4的版本上发现默认值是FALSE)
_optimizer_cache_stats (默认值是FALSE)
第一个隐含参数决定是否开启CACHE统计信息的收集,第二个参数决定优化器是否使用CACHE统计信息
提供了这样的功能.

NAME                           VALUE                     ISDEFAULT ISMOD      ISADJ
------------------------------ ------------------------- --------- ---------- -----
_cache_stats_monitor           FALSE                     TRUE      FALSE      FALSE
_optimizer_cache_stats         FALSE                     TRUE      FALSE      FALSE

这是系统默认值

alter system set "_cache_stats_monitor"=true;
因为这个数据库的负载平时还是比较低的,cache统计信息的收集不会给数据库带来太大的负载,所以直接在生产库上开启了cache统计信息的收集

NAME VALUE ISDEFAULT ISMOD ISADJ
------------------------------ ------------------------- --------- ---------- -----
_cache_stats_monitor TRUE TRUE SYSTEM_MOD FALSE
现在的值.

开启cache统计信息的收集之后有4,5个小时后,实验的时候,发现三种执行计划的cost没有任何的变化,只是TABLE ACCESS FULL PRODUCT的cost减小了1(6909->6908)导致最终hash join的执行计划的cost减小了1而已,cache统计信息的收集似乎不管用.
后来查找资料后发现sys.tab_stats$,sys.ind_stats$这两个表里没有任何的数据,似乎只有sys.cache_stats_1$表里有数据

随后进行了这样的操作:
execute dbms_stats.gather_table_stats(user,'PRODUCT',stattype => 'cache',cascade => true);
execute dbms_stats.gather_table_stats(user,'CATALOGRELATEPRODUCT',stattype => 'cache',cascade => true);
而后查询:
select COUNT(1) from sys.cache_stats_1$;
COUNT(1)
----------
       604
select COUNT(1) from sys.tab_stats$;
COUNT(1)
----------
         2
select COUNT(1) from sys.ind_stats$;
COUNT(1)
----------
        15
发现都有数据了.

alter session set optimizer_index_caching=0;
alter session set optimizer_index_cost_adj=100;
--当前的这两个参数的设置我没有改,所以会话里修改成了默认值

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ full(p8) ordered use_nl(p8 cp8)*/p8.id          as id,
5                                 p8.image2      as img,
6                                 p8.image3      as img3,
7                                 p8.productname as name,
8                                 p8.refprice    as price
9                            from product p8,catalogrelateproduct cp8
10                           where p8.publishstatus = 3 and cp8.catalogid = 122 and cp8.productid = p8.id
11                             and bitand(p8.status, 8) <> 8
12                           order by p8.priority desc, p8.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20686 | 6969K|       |   183K (1)| 00:36:46 |
|* 1 | VIEW                  |                         | 20686 | 6969K|       |   183K (1)| 00:36:46 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20686 | 6706K|       |   183K (1)| 00:36:46 |
|   4 |     SORT ORDER BY      |                         | 20686 | 4444K| 9480K|   183K (1)| 00:36:46 |
|   5 |      NESTED LOOPS      |                         | 20686 | 4444K|       |   182K (1)| 00:36:34 |
|* 6 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    35M|       | 6909   (1)| 00:01:23 |
|* 7 |       INDEX UNIQUE SCAN| PK_CATALOGRELATEPRODUCT |     1 |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ use_hash(p8,cp8) */p8.id          as id,
5                                 p8.image2      as img,
6                                 p8.image3      as img3,
7                                 p8.productname as name,
8                                 p8.refprice    as price
9                            from product p8,catalogrelateproduct cp8
10                           where p8.publishstatus = 3 and cp8.catalogid = 122 and cp8.productid = p8.id
11                             and bitand(p8.status, 8) <> 8
12                           order by p8.priority desc, p8.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20686 | 6969K|       | 7970   (1)| 00:01:36 |
|* 1 | VIEW                  |                         | 20686 | 6969K|       | 7970   (1)| 00:01:36 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20686 | 6706K|       | 7970   (1)| 00:01:36 |
|   4 |     SORT ORDER BY      |                         | 20686 | 4444K|     9M| 7970   (1)| 00:01:36 |
|* 5 |      HASH JOIN         |                         | 20686 | 4444K|       | 6980   (1)| 00:01:24 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |    69   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    35M|       | 6909   (1)| 00:01:23 |
----------------------------------------------------------------------------------------------------------

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ index(p8) */p8.id          as id,
5                                 p8.image2      as img,
6                                 p8.image3      as img3,
7                                 p8.productname as name,
8                                 p8.refprice    as price
9                            from product p8,catalogrelateproduct cp8
10                           where p8.publishstatus = 3 and cp8.catalogid = 122 and cp8.productid = p8.id
11                             and bitand(p8.status, 8) <> 8
12                           order by p8.priority desc, p8.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

--------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                         | 20686 | 6969K|       | 21760   (1)| 00:04:22 |
|* 1 | VIEW                            |                         | 20686 | 6969K|       | 21760   (1)| 00:04:22 |
|   2 |   COUNT                          |                         |       |       |       |            |          |
|   3 |    VIEW                          |                         | 20686 | 6706K|       | 21760   (1)| 00:04:22 |
|   4 |     SORT ORDER BY                |                         | 20686 | 4444K|     9M| 21760   (1)| 00:04:22 |
|   5 |      NESTED LOOPS                |                         | 20686 | 4444K|       | 20771   (1)| 00:04:10 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |    69   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 |     1 |   210 |       |     1   (0)| 00:00:01 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              |     1 |       |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NO"<=540 AND "NO">=521)
   6 - access("CP8"."CATALOGID"=122)
   7 - filter(BITAND("P8"."STATUS",8)<>8 AND "P8"."PUBLISHSTATUS"=3)
   8 - access("CP8"."PRODUCTID"="P8"."ID")

--之前,优化器并不使用cache统计信息来计算COST.
SQL> alter session set "_optimizer_cache_stats"=true;
--当前会话,优化器使用cache统计信息来计算COST.

----------------------------------------------------------------------------------------------------
| Id | Operation              | Name              | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                   | 20686 | 6969K|       | 1249 (12)| 00:00:15 |
|* 1 | VIEW                  |                   | 20686 | 6969K|       | 1249 (12)| 00:00:15 |
|   2 |   COUNT                |                   |       |       |       |            |          |
|   3 |    VIEW                |                   | 20686 | 6706K|       | 1249 (12)| 00:00:15 |
|   4 |     SORT ORDER BY      |                   | 20686 | 4444K| 9480K| 1249 (12)| 00:00:15 |
|   5 |      NESTED LOOPS      |                   | 20686 | 4444K|       |   259 (55)| 00:00:04 |
|* 6 |       TABLE ACCESS FULL| PRODUCT           |   175K|    35M|       |    63 (42)| 00:00:01 |
|* 7 |       INDEX RANGE SCAN | IDX_CRP_ID_STATUS |     1 |    10 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
IDX_CRP_ID_STATUS        :index(PRODUCTID, MAINBRANDSTATUS, CATALOGID)
PK_CATALOGRELATEPRODUCT :index(CATALOGID, PRODUCTID)
和false时对应的执行计划,可以是认为没有区别的,其实这里的焦点也不是这个执行计划的对比,而是另外两个执行计划的对比

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20686 | 6969K|       | 1056   (3)| 00:00:13 |
|* 1 | VIEW                  |                         | 20686 | 6969K|       | 1056   (3)| 00:00:13 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20686 | 6706K|       | 1056   (3)| 00:00:13 |
|   4 |     SORT ORDER BY      |                         | 20686 | 4444K|     9M| 1056   (3)| 00:00:13 |
|* 5 |      HASH JOIN         |                         | 20686 | 4444K|       |    66 (43)| 00:00:01 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |     1   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    35M|       |    63 (42)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                         | 20686 | 6969K|       | 1082   (1)| 00:00:13 |
|* 1 | VIEW                            |                         | 20686 | 6969K|       | 1082   (1)| 00:00:13 |
|   2 |   COUNT                          |                         |       |       |       |            |          |
|   3 |    VIEW                          |                         | 20686 | 6706K|       | 1082   (1)| 00:00:13 |
|   4 |     SORT ORDER BY                |                         | 20686 | 4444K|     9M| 1082   (1)| 00:00:13 |
|   5 |      NESTED LOOPS                |                         | 20686 | 4444K|       |    92   (7)| 00:00:02 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |     1   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 |     1 |   210 |       |     0   (0)| 00:00:01 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              |     1 |       |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NO"<=540 AND "NO">=521)
   6 - access("CP8"."CATALOGID"=122)
   7 - filter(BITAND("P8"."STATUS",8)<>8 AND "P8"."PUBLISHSTATUS"=3)
   8 - access("CP8"."PRODUCTID"="P8"."ID")

可以发现,alter session set "_optimizer_cache_stats"=true;
执行前后,对应的执行计划的cost已经发生了很大的变化
特别是第一个NL连接方式的执行计划已经从183K->1249,虽然说使用的索引稍微不同些,但这个变化带来的代价变化应该很小的,实际代价变化很大,主要还是因为原来的物理读,现在优化器使用cache统计信息之后,知道它们都cache着呢,按照逻辑读来计算,所以变小了很多.第三个nl连接的方式,变化前后执行计划是完全相同的,但cost已经从21760->1082了,都是因为优化器使用了cache统计信息造成的.
但这里hash join的代价还是比第三种nl连接的代价要低,实际上不应该这样的,这是为什么呢?会不会和优化器所知道的CPU处理能力之类的系统统计信息有关呢?我现在还是使用的默认的系统统计信息,而不是使用的系统负载时的系统统计信息.那就使用系统负载时的系统统计信息.

EXEC DBMS_STATS.delete_system_stats();
exec dbms_stats.import_system_stats(statown =>'SYS',stattab => 'SYSSTATS_TAB',statid => 'DAYTIME');

alter session set optimizer_index_caching=0;
alter session set optimizer_index_cost_adj=100;
alter session set "_optimizer_cache_stats"=true;

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ full(p88) ordered use_nl(p88 cp88)*/p88.id          as id,
5                                 p88.image2      as img,
6                                 p88.image3      as img3,
7                                 p88.productname as name,
8                                 p88.refprice    as price
9                            from product p88,catalogrelateproduct cp88
10                           where p88.publishstatus = 3 and cp88.catalogid = 122 and cp88.productid = p88.id
11                             and bitand(p88.status, 8) <> 8
12                           order by p88.priority desc, p88.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

----------------------------------------------------------------------------------------------------
| Id | Operation              | Name              | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                   | 20686 | 6969K|       | 1445 (23)| 00:00:07 |
|* 1 | VIEW                  |                   | 20686 | 6969K|       | 1445 (23)| 00:00:07 |
|   2 |   COUNT                |                   |       |       |       |            |          |
|   3 |    VIEW                |                   | 20686 | 6706K|       | 1445 (23)| 00:00:07 |
|   4 |     SORT ORDER BY      |                   | 20686 | 4444K| 9480K| 1445 (23)| 00:00:07 |
|   5 |      NESTED LOOPS      |                   | 20686 | 4444K|       |   453 (73)| 00:00:03 |
|* 6 |       TABLE ACCESS FULL| PRODUCT           |   175K|    35M|       |   105 (58)| 00:00:01 |
|* 7 |       INDEX RANGE SCAN | IDX_CRP_ID_STATUS |     1 |    10 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ use_hash(p88,cp88) */p88.id          as id,
5                                 p88.image2      as img,
6                                 p88.image3      as img3,
7                                 p88.productname as name,
8                                 p88.refprice    as price
9                            from product p88,catalogrelateproduct cp88
10                           where p88.publishstatus = 3 and cp88.catalogid = 122 and cp88.productid = p88.id
11                             and bitand(p88.status, 8) <> 8
12                           order by p88.priority desc, p88.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

----------------------------------------------------------------------------------------------------------
| Id | Operation              | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20686 | 6969K|       | 1102   (7)| 00:00:06 |
|* 1 | VIEW                  |                         | 20686 | 6969K|       | 1102   (7)| 00:00:06 |
|   2 |   COUNT                |                         |       |       |       |            |          |
|   3 |    VIEW                |                         | 20686 | 6706K|       | 1102   (7)| 00:00:06 |
|   4 |     SORT ORDER BY      |                         | 20686 | 4444K|     9M| 1102   (7)| 00:00:06 |
|* 5 |      HASH JOIN         |                         | 20686 | 4444K|       |   110 (59)| 00:00:01 |
|* 6 |       INDEX RANGE SCAN | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |     1   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS FULL| PRODUCT                 |   175K|    35M|       |   105 (58)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select /*+ index(p88) */p88.id          as id,
5                                 p88.image2      as img,
6                                 p88.image3      as img3,
7                                 p88.productname as name,
8                                 p88.refprice    as price
9                            from product p88,catalogrelateproduct cp88
10                           where p88.publishstatus = 3 and cp88.catalogid = 122 and cp88.productid = p88.id
11                             and bitand(p88.status, 8) <> 8
12                           order by p88.priority desc, p88.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

--------------------------------------------------------------------------------------------------------------------
| Id | Operation                        | Name                    | Rows | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                         | 20686 | 6969K|       | 1092   (2)| 00:00:06 |
|* 1 | VIEW                            |                         | 20686 | 6969K|       | 1092   (2)| 00:00:06 |
|   2 |   COUNT                          |                         |       |       |       |            |          |
|   3 |    VIEW                          |                         | 20686 | 6706K|       | 1092   (2)| 00:00:06 |
|   4 |     SORT ORDER BY                |                         | 20686 | 4444K|     9M| 1092   (2)| 00:00:06 |
|   5 |      NESTED LOOPS                |                         | 20686 | 4444K|       |   100 (14)| 00:00:01 |
|* 6 |       INDEX RANGE SCAN           | PK_CATALOGRELATEPRODUCT | 20687 |   202K|       |     1   (0)| 00:00:01 |
|* 7 |       TABLE ACCESS BY INDEX ROWID| PRODUCT                 |     1 |   210 |       |     0   (0)| 00:00:01 |
|* 8 |        INDEX UNIQUE SCAN         | PK_PRODUCT              |     1 |       |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------
SQL> explain plan for
2 select *
3            from (select tt.*, rownum no
4                    from (select p88.id          as id,
5                                 p88.image2      as img,
6                                 p88.image3      as img3,
7                                 p88.productname as name,
8                                 p88.refprice    as price
9                            from product p88,catalogrelateproduct cp88
10                           where p88.publishstatus = 3 and cp88.catalogid = 122 and cp88.productid = p88.id
11                             and bitand(p88.status, 8) <> 8
12                           order by p88.priority desc, p88.pageview desc) tt)
13           where no >= 521
14             and no <= 540;

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NO"<=540 AND "NO">=521)
   6 - access("CP88"."CATALOGID"=122)
   7 - filter(BITAND("P88"."STATUS",8)<>8 AND "P88"."PUBLISHSTATUS"=3)
   8 - access("CP88"."PRODUCTID"="P88"."ID")

现在,执行计划已经是我想要的了,第三种执行计划的COST已经是最小的了.
你可能会说有找的成分,但至少说明系统统计信息对执行计划的COST的计算产生了影响,至少从这个SQL来看产生了有益的影响.

扫尾一下:
alter system set "_cache_stats_monitor"=false; --这个隐含参数恢复默认值
EXEC DBMS_STATS.delete_system_stats();
exec dbms_stats.import_system_stats(statown =>'SYS',stattab => 'SYSSTATS_TAB',statid => 'DEFAULT');
仍然使用原来的系统统计信息,真实的系统统计信息,在正常负载时段反复收集后,再导入吧!

其实这里展现了我这个案例的整个处理过程,可能显得很繁琐,下面就简单的总结一下:
其实这个案例总的来说是由两个问题引发的:
一个是绑定变量的peeking机制以及相应的字段上收集了柱状图统计信息而带来的执行计划的不稳定性,这个很容易理解,这里就不多说了.
另一个就是cache的问题,oracle引入optimizer_index_caching这个参数应该来说就是为了让优化器在计算cost时考虑index cache的影响,但手工设置容易带来这样那样负面的影响,并且加大了DBA的优化负担.所以oracle开始考虑自动收集cache统计信息,然后优化器在计算cost时使用这样的cache统计信息.但这应该是一个很麻烦的事情,比如说:一开始没有cache统计信息,优化器选择了某个执行计划A,执行计划A不断执行的结果可能是加强了有利于自身cost计算的cache信息,使得这个执行计划执行的越多,看来越优,但也许一开始这个执行计划其实不是最优的,还有其它更好的执行计划,比如说B,如果B执行计划能执行的话,cache统计信息会有利于它永远的最优下去,可优化器一开始选择了A,cache统计信息对A的cost计算越来越有利,执行一段时间后,即使纠正了原来导致优化器选择A放弃更好的B的信息,但现在看来,也是A比B好了.确实,这其中需要考虑的东西太多了(Jonathan Lewis 在他的《Cost-Based Oracle Fundamentals》中有一个小节讨论了cache的问题,大家有兴趣的话,可以看看),在解决一些问题的同时,可能会引入一些新的问题,所以默认oracle没有开启这样的功能.但cache信息确实是优化器在考虑代价时需要考量的因素.所以在oracle把自己的cache代价机制完善之前,可能optimizer_index_caching这个参数在某些系统中还是需要设置的,但需要根据系统的数据访问特征进行反复实验后找到一个真正反映系统数据特点的值来,而不能随随便便的从网上考了一个典型配置值设置了事,也不能某个值在自己的系统A表现很好,不经测试就把它直接用于系统B,因为它们两个的数据特点也许是不一样的.也需要知道它有这里提到的很负面的影响:Optimizer_Index_Caching<>0,并且nl连接的内表使用唯一索引扫描的时候,索引访问的cost可能就会变成0,也就是说不管外表返回的card有多大,内表访问是完全免费的,导致优化器选择了性能很差的执行计划.
另外在9I,特别是10G中,我觉得应该开启系统统计信息的收集,让优化器使用这样的统计信息了,以代替optimizer_index_cost_adj,db_file_multiblock_read_count这样一些参数的使用,但需要反复的收集,以得到具有代表性的数据来.收集了负载时的系统统计信息之后,optimizer_index_cost_adj就应该保持默认值100了,db_file_multiblock_read_count这个参数不再起作用,优化器会使用自己收集的MBRC来作为多块读的块数.

秒客网

从一个SQL使用了不理想的执行计划说开,浅谈执行计划如何估算cache信息的影响及系统统计信息的收集等(2010-10-15)

相关文章