Oracle 后台进程（五）SMON进程

转载自：刘相兵 Maclean Liu 文章

你所不知道的后台进程 SMON 功能

SMON(system monitor process)系统监控后台进程，有时候也被叫做 system cleanup process，

这么叫的原因是它负责完成很多清理(cleanup)任务。但凡学习过 Oracle 基础知识的技术人员

都会或多或少对该 background process 的功能有所了解。

我们所熟知的 SMON 是个兢兢业业的家伙，它负责完成一些列系统级别的任务。与

PMON(Process Monitor)后台进程不同的是，SMON 负责完成更多和整体系统相关的工作，这

导致它会去做一些不知名的”累活”，当系统频繁产生这些”垃圾任务”，则 SMON 可能忙不过

来。因此在 10g 中 SMON 变得有一点懒惰了，如果它在短期内接收到过多的工作通知(SMON:

system monitor process posted)，那么它可能选择消极怠工以便让自己不要过于繁忙(SMON:

Posted too frequently, trans recovery disabled)，之后会详细介绍。

了解你所不知道的 SMON 功能(一):清理临时段

触发场景

很多人错误地理解了这里所说的临时段 temporary segments，认为 temporary segments 是指

temporary tablespace 临时表空间上的排序临时段(sort segment)。事实上这里的临时段主要

指的是永久表空间(permanent tablespace)上的临时段，当然临时表空间上的 temporary

segments 也是由 SMON 来清理(cleanup)的，但这种清理仅发生在数据库实例启动时(instance

startup)。

永久表空间上同样存在临时段，譬如当我们在某个永久表空间上使用 create table/index 等

DDL 命令创建某个表/索引时，服务进程一开始会在指定的永久表空间上分配足够多的区间

(Extents)，这些区间在命令结束之前都是临时的(Temporary Extents)，直到表/索引完全建成

才将该temporary segment转换为permanent segment。另外当使用drop命令删除某个段时，

也会先将该段率先转换为 temporary segment，之后再来清理该 temporary segment(DROP

object converts the segment to temporary and then cleans up the temporary segment)。常规情

况下清理工作遵循谁创建 temporary segment，谁负责清理的原则。换句话说，因服务进程

rebuild index 所产生的 temporary segment 在 rebuild 完成后应由服务进程自行负责清理。一

旦服务进程在成功清理 temporary segment 之前就意外终止了，亦或者服务进程在工作过程

中遇到了某些 ORA-错误导致语句失败，那么 SMON 都会被要求(posted)负责完成 temporary

segment 的清理工作。

对于永久表空间上的 temporary segment，SMON 会三分钟清理一次(前提是接到 post)，如果

SMON 过于繁忙那么可能 temporary segment 长期不被清理。temporary segment 长期不被清理可能造成一个典型的问题是:在 rebuild index online 失败后，后续执行的 rebuild index 命令

要求之前产生的 temporary segment 已被 cleanup，如果 cleanup 没有完成那么就需要一直等

下去。在 10gR2 中我们可以使用 dbms_repair.online_index_clean 来手动清理 online index

rebuild 的遗留问题:

The dbms_repair.online_index_clean function has been created to cleanup online index

rebuilds.

Use the dbms_repair.online_index_clean function to resolve the issue.

Please note if you are unable to run the dbms_repair.online_index_clean function it is due

to the fact

that you have not installed the patch for Bug 3805539 or are not running on a release that

includes this fix.

The fix for this bug is a new function in the dbms_repair package called

dbms_repair.online_index_clean,

which has been created to cleanup online index [[sub]partition] [re]builds.

New functionality is not allowed in patchsets;

therefore, this is not available in a patchset but is available in 10gR2.

Check your patch list to verify the database is patched for Bug 3805539

using the following command and patch for the bug if it is not listed:

opatch lsinventory -detail

Cleanup after a failed online index [re]build can be slow to occurpreventing subsequent

such operations

until the cleanup has occured.

接着我们通过实践来看一下 smon 是如何清理永久表空间上的 temporary segment 的:

设置 10500 事件以跟踪 smon 进程，这个诊断事件后面会介绍

SQL> alter system set events '10500 trace name context forever,level 10';

System altered.

在第一个会话中执行 create table 命令，这将产生一定量的 Temorary ExtentsSQL> create table smon as select * from ymon;

在另一个会话中执行对 DBA_EXTENTS 视图的查询，可以发现产生了多少临时区间

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

COUNT(*)

----------

117

终止以上 create table 的 session，等待一段时间后观察 smon 后台进程的 trc 可以发现以下信息:

*** 2011-06-07 21:18:39.817

SMON: system monitor process posted msgflag:0x0200 (-/-/-/-/TMPSDROP/-/-)

*** 2011-06-07 21:18:39.818

SMON: Posted, but not for trans recovery, so skip it.

*** 2011-06-07 21:18:39.818

SMON: clean up temp segments in slave

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

COUNT(*)

----------

可以看到 smon 通过 slave 进程完成了对 temporary segment 的清理

与永久表空间上的临时段不同，出于性能的考虑临时表空间上的 Extents 并不在操作

(operations)完成后立即被释放和归还。相反，这些 Temporary Extents 会被标记为可用，以

便用于下一次的排序操作。SMON 仍会清理这些 Temporary segments，但这种清理仅发生在

实例启动时(instance startup):

For performance issues, extents in TEMPORARY tablespaces are not released ordeallocatedonce the operation is complete.Instead, the extent is simply marked as available for the

next sort operation.

SMON cleans up the segments at startup.

A sort segment is created by the first statement that used a TEMPORARY tablespacefor sorting,

after startup.

A sort segment created in a TEMPOARY tablespace is only released at shutdown.

The large number of EXTENTS is caused when the STORAGE clause has been incorrectly calculated.

现象

可以通过以下查询了解数据库中 Temporary Extent 的总数，在一定时间内比较其总数，若有

所减少那么说明 SMON 正在清理 Temporary segment

SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

也可以通过 v$sysstat 视图中的”SMON posted for dropping temp segment”事件统计信息来了

解 SMON 收到清理要求的情况:

SQL> select name,value from v$sysstat where name like '%SMON%';

NAME VALUE

---------------------------------------------------------------- ----------

total number of times SMON posted 8

SMON posted for undo segment recovery 0

SMON posted for txn recovery for other instances 0

SMON posted for instance recovery 0

SMON posted for undo segment shrink 0

SMON posted for dropping temp segment 1另外在清理过程中 SMON 会长期持有 Space Transacton(ST)队列锁，其他会话可能因为得不

到 ST 锁而等待超时出现 ORA-01575 错误:

01575, 00000, "timeout waiting for space management resource"

// *Cause: failed to acquire necessary resource to do space management.

// *Action: Retry the operation.

如何禁止 SMON 清理临时段

可以通过设置诊断事件 event=’10061 trace name context forever, level 10′禁用 SMON 清理临

时段(disable SMON from cleaning temp segments)。

alter system set events '10061 trace name context forever, level 10';

相关诊断事件

除去 10061 事件外还可以用 10500 事件来跟踪 smon 的 post 信息，具体的事件设置方法见

了解你所不知道的 SMON 功能(二):合并空闲区间

SMON 的作用还包括合并空闲区间(coalesces free extent)

触发场景

早期 Oracle 采用 DMT 字典管理表空间，不同于今时今日的 LMT 本地管理方式，DMT 下通过

对FET$和UET$2张字典基表的递归操作来管理区间。SMON每5分钟(SMON wakes itself every 5 minutes and checks for tablespaces with default pctincrease != 0)会自发地去检查哪些默认存

储参数 pctincrease 不等于 0 的字典管理表空间，注意这种清理工作是针对 DMT 的，而 LMT

则无需合并。SMON 对这些 DMT 表空间上的连续相邻的空闲 Extents 实施 coalesce 操作以合

并成一个更大的空闲 Extent，这同时也意味着 SMON 需要维护 FET$字典基表。

现象

以下查询可以检查数据库中空闲 Extents 的总数，如果这个总数在持续减少那么说明 SMON

正在 coalesce free space：

SELECT COUNT(*) FROM DBA_FREE_SPACE;

在合并区间时 SMON 需要排他地(exclusive)持有 ST(Space Transaction)队列锁，其他会话可能

因为得不到 ST 锁而等待超时出现 ORA-01575 错误。同时 SMON 可能在繁琐的 coalesce 操作

中消耗 100%的 CPU。

如何禁止 SMON 合并空闲区间

可以通过设置诊断事件event=’10269 trace name context forever, level 10′来禁用SMON合并空

闲区间(Don’t do coalesces of free space in SMON)

10269, 00000, "Don't do coalesces of free space in SMON"

// *Cause: setting this event prevents SMON from doing free space coalesces

alter system set events '10269 trace name context forever, level 10';了解你所不知道的 SMON 功能(三):清理 obj$基表

SMON 的作用还包括清理 obj$数据字典基表(cleanup obj$)

OBJ$字典基表是 Oracle Bootstarp 启动自举的重要对象之一:

SQL> set linesize 80 ;

SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';

SQL_TEXT

--------------------------------------------------------------------------------

CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N

ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(

30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"

DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR

CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3

" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE

10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE ( INITIAL 16K NEXT 1024K MINEXTEN

TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))

触发场景

OBJ$基表是一张低级数据字典表，该表几乎对库中的每个对象(表、索引、包、视图等)都包

含有一行记录。很多情况下，这些条目所代表的对象是不存在的对象(non-existent)，引起这

种现象的一种可能的原因是对象本身已经被从数据库中删除了，但是对象条目仍被保留下来

以满足消极依赖机制(negative dependency)。因为这些条目的存在会导致 OBJ$表不断膨胀，

这时就需要由 SMON 进程来删除这些不再需要的行。SMON 会在实例启动(after startup of DB

is started cleanup function again)时以及启动后的每 12 个小时执行一次清理任务(the cleanup

is scheduled to run after startup and then every 12 hours)。

我们可以通过以下演示来了解 SMON 清理 obj$的过程:

SQL> BEGIN

2 FOR i IN 1 .. 5000 LOOP

3 execute immediate ('create synonym gustav' || i || ' for 4 perfstat.sometable');

5 execute immediate ('drop synonym gustav' || i );

6 END LOOP;

7 END;

8 /

PL/SQL procedure successfully completed.

SQL> startup force;

ORACLE instance started.

Total System Global Area 1065353216 bytes

Fixed Size 2089336 bytes

Variable Size 486542984 bytes

Database Buffers 570425344 bytes

Redo Buffers 6295552 bytes

Database mounted.

Database opened.

SQL> select count(*) from user$ u, obj$ o

2 where u.user# (+)=o.owner# and o.type#=10 and not exists

3 (select p_obj# from dependency$ where p_obj# = o.obj#);

COUNT(*)

----------

5000

SQL> /

COUNT(*)

----------

5000

SQL> /

COUNT(*)

---------- 4951

SQL> oradebug setospid 18457;

Oracle pid: 8, Unix process pid: 18457, image: oracle@rh2.oracle.com (SMON)

SQL> oradebug event 10046 trace name context forever ,level 1;

Statement processed.

SQL> oradebug tracefile_name;

/s01/admin/G10R2/bdump/g10r2_smon_18457.trc

select o.owner#,

o.obj#,

decode(o.linkname,

null,

decode(u.name, null, 'SYS', u.name),

o.remoteowner),

o.name,

o.linkname,

o.namespace,

o.subname

from user$ u, obj$ o

where u.use r#(+) = o.owner#

and o.type# = :1

and not exists

(select p_obj# from dependency$ where p_obj# = o.obj#)

order by o.obj#

for update

select null

from obj$

where obj# = :1

and type# = :2

and obj# not in

(select p_obj# from dependency$ where p_obj# = obj$.obj#)

delete from obj$ where obj# = :1/* 删除过程其实较为复杂，可能要删除多个字典基表上的记录 */

现象

我们可以通过以下查询来了解 obj$基表中 NON-EXISTENT 对象的条目总数(type#=10)，若这

个总数在不断减少说明 smon 正在执行清理工作

select trunc(mtime), substr(name, 1, 3) name, count(*)

from obj$

where type# = 10

and not exists (select * from dependency$ where obj# = p_obj#)

group by trunc(mtime), substr(name, 1, 3);

select count(*)

from user$ u, obj$ o

where u.user#(+) = o.owner#

and o.type# = 10

and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);

如何禁止 SMON 清理 obj$基表

我们可以通过设置诊断事件 event=’10052 trace name context forever’来禁止 SMON 清理

obj$基表，当我们需要避免 SMON 因 cleanup obj$的相关代码而意外终止或 spin 从而开展进

一步的诊断时可以设置该诊断事件。在 Oracle 并行服务器或 RAC 环境中，也可以设置该事

件来保证只有特定的某个节点来执行清理工作。

10052, 00000, "don't clean up obj$"

alter system set events '10052 trace name context

forever, level 65535';

Problem Description: We are receiving the below warning during db startup:

WARNING: kqlclo() has detected the following :

Non-existent object 37336 NOT deleted because an object

of the same name exists already.

Object name: PUBLIC.USER$

This is caused by the SMON trying to cleanup the SYS.OJB$.

SMON cleans all dropped objects which have a SYS.OBJ$.TYPE#=10.

This can happen very often when you create an object that have the same name as a public

synonym.

When SMON is trying to remove non-existent objects and fails because there are duplicates,

multiple nonexistent objects with same name.

This query will returned many objects with same name under SYS schema:

select o.name,u.user# from user$ u, obj$ o where u.user# (+)=o.owner# and o.type#=10

and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);

To cleanup this message:Take a full backup of the database - this is crucial. If anything goes wrong during this

procedure,

your only option would be to restore from backup, so make sure you have a good backup before

proceeding.

We suggest a COLD backup. If you plan to use a HOT backup, you will have to restore point

in time if any problem happens

Normally DML against dictionary objects is unsupported,

but in this case we know exactly what the type of corruption,

also you are instructing to do this under guidance from Support.

Data dictionary patching must be done by an experienced DBA.

This solution is unsupported.

It means that if there were problems after applying this solution, a database backup must

be restored.

1. Set event 10052 at parameter file to disable cleanup of OBJ$ by SMON

EVENT="10052 trace name context forever, level 65535"

2. Startup database in restricted mode

3. Delete from OBJ$, COMMIT

SQL> delete from obj$ where (name,owner#) in ( select o.name,u.user# from user$ u, obj$ o

where u.user# (+)=o.owner# and o.type#=10 and not exists (select p_obj# from

dependency$ where p_obj# = o.obj#) );

SQL> commit;

SQL> Shutdown abort.

4. remove event 10052 from init.ora

5. Restart the database and monitor for the message in the ALERT LOG file了解你所不知道的 SMON 功能(四):维护 col_usage$字典基

表

SMON 的作用还包括维护 col_usage$列监控统计信息基表。

最早在 9i 中引入了 col_usage$字典基表，其目的在于监控 column 在 SQL 语句作为 predicate 的情况，

col_usage$的出现完善了 CBO 中柱状图自动收集的机制。

create table col_usage$

(

obj# number, /* object number */

intcol# number, /* internal column number */

equality_preds number, /* equality predicates */

equijoin_preds number, /* equijoin predicates */

nonequijoin_preds number, /* nonequijoin predicates */

range_preds number, /* range predicates */

like_preds number, /* (not) like predicates */

null_preds number, /* (not) null predicates */

timestamp date /* timestamp of last time this row was changed */

)

storage (initial 200K next 100k maxextents unlimited pctincrease 0)

create unique index i_col_usage$ on col_usage$(obj#,intcol#)

storage (maxextents unlimited)

在 10g 中我们默认使用’FOR ALL COLUMNS SIZE AUTO’的柱状图收集模式，而在 9i 中默认是’SIZE 1′

即默认不收集柱状图，这导致许多 9i 中正常运行的应用程序在 10g 中 CBO 执行计划异常，详见

<dbms_stats span="" <="">收集模式在 9i 和 10g 上的区别>;。’SIZE AUTO’意为由 Oracle 自动决定是否

收集柱状图及柱状图的桶数，Oracle 自行判断的依据就来源于 col_usage$字典基表，若表上的某一列曾在

硬解析(hard parse)过的 SQL 语句中充当过 predicate(通俗的说就是 where 后的 condition)的话，我们认为

此列上有收集柱状图的必要，那么 col_usage$上就会被加入该列曾充当 predicate 的记录。当

DBMS_STATS.GATHER_TABLE_STATS 存储过程以’SIZE AUTO’模式执行时，收集进程会检查

col_usage$基表以判断哪些列之前曾充当过 predicate，若充当过则说明该列有收集柱状图的价值。SMON 会每 15 分钟将 shared pool 中的 predicate columns 的数据刷新到 col_usage$基表中(until

periodically about every 15 minutes SMON flush the data into the data dictionary)，另外当 instance

shutdown 时 SMON 会扫描 col_usage$并找出已被 drop 表的相关 predicate columns 记录，并删除这部

分”orphaned”孤儿记录。

我们来具体了解 col_usage$的填充过程:

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

PL/SQL Release 10.2.0.4.0 - Production

CORE 10.2.0.4.0 Production

TNS for Linux: Version 10.2.0.4.0 - Production

NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME

--------------------------------------------------------------------------------

www.oracle.com

SQL> create table maclean (t1 int);

Table created.

SQL> select object_id from dba_objects where object_name='MACLEAN';

OBJECT_ID

----------

1323013

SQL> select * from maclean where t1=1;

no rows selectedSQL> set linesize 200 pagesize 2000;

注意 col_usage$的数据同*_tab_modifications 类似，

从查询到数据刷新到 col_usage$存在一段时间的延迟，

所以我们立即查询 col_usage$将得不到任何记录，

可以手动执行 DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO 将缓存中的信息刷新到字典上

SQL> select * from col_usage$ where obj#=1323013;

no rows selected

SQL> oradebug setmypid;

Statement processed.

针对 FLUSH_DATABASE_MONITORING_INFO 填充操作做 10046 level 12 trace

SQL> oradebug event 10046 trace name context forever,level 12;

Statement processed.

SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> select * from col_usage$ where obj#=1323013;

OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS

LIKE_PREDS NULL_PREDS TIMESTAMP

---------- ---------- -------------- -------------- ----------------- -----------

---------- ---------- ---------

1323013 1 1 0 0 0 0

0 19-AUG-11

=============10046 trace content====================

lock table sys.col_usage$ in exclusive mode nowait在测试中可以发现 10.2.0.4 上 DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO 存储过程会优先使用

lock in exclusive mode nowait 来锁住 col_usage$基表，

如果 lock 失败则会反复尝试 1100 次，

若仍不能锁住 col_usage$表则放弃更新 col_usage$上的数据，避免造成锁等待和死锁。

Cksxm.c

Monitor Modification Hash Table Base

modification hash table entry

modification hash table chunk

monitoring column usage element

ksxmlock_1

lock table sys.col_usage$ in exclusive mode

lock table sys.col_usage$ in exclusive mode nowait

update sys.col_usage$

set equality_preds = equality_preds +

decode(bitand(:flag, 1), 0, 0, 1),

equijoin_preds = equijoin_preds +

decode(bitand(:flag, 2), 0, 0, 1),

nonequijoin_preds = nonequijoin_preds +

decode(bitand(:flag, 4), 0, 0, 1),

range_preds = range_preds + decode(bitand(:flag, 8), 0, 0, 1),

like_preds = like_preds + decode(bitand(:flag, 16), 0, 0, 1),

null_preds = null_preds + decode(bitand(:flag, 32), 0, 0, 1),

timestamp = :time

where obj# = :ob jn

and intcol# = :coln

insert into sys.col_usage$

(obj#,

intcol#,

equality_preds,

equijoin_preds,

nonequijoin_preds,

range_preds,

like_preds, null_preds,

timestamp)

values

(:objn,

:coln,

decode(bitand(:flag, 1), 0, 0, 1),

decode(bitand(:flag, 2), 0, 0, 1),

decode(bitand(:flag, 4), 0, 0, 1),

decode(bitand(:flag, 8), 0, 0, 1),

decode(bitand(:flag, 16), 0, 0, 1),

decode(bitand(:flag, 32), 0, 0, 1),

:time)

使用 dbms_stats 的’SIZE AUTO’模式收集表上的统计信息会首先参考 col_usage$中的 predicate columns

记录:

SQL> begin

3 dbms_stats.gather_table_stats(ownname => 'SYS',

4 tabname => 'MACLEAN',

5 method_opt => 'FOR ALL COLUMNS SIZE AUTO');

6 end;

7 /

PL/SQL procedure successfully completed.

============10046 level 12 trace content======================

SELECT /*+ ordered use_nl(o c cu h) index(u i_user1) index(o i_obj2)

index(ci_obj#) index(cu i_col_usage$)

index(h i_hh_obj#_intcol#) */

C.NAME COL_NAME,

C.TYPE# COL_TYPE,

C.CHARSETFORM COL_CSF,

C.DEFAULT$ COL_DEF,C.NULL$ COL_NULL,

C.PROPERTY COL_PROP,

C.COL # COL_UNUM,

C.INTCOL# COL_INUM,

C.OBJ# COL_OBJ,

C.SCALE COL_SCALE,

H.BUCKET_CNT H_BCNT,

(T.ROWCNT - H.NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ,

C.LENGTH COL_LEN,

CU.TIMES TAMP CU_TIME,

CU.EQUALITY_PREDS CU_EP,

CU.EQUIJOIN_PREDS CU_EJP,

CU.RANGE_PREDS CU_RP,

CU.LIKE_PREDS CU_LP,

CU.NONEQUIJOIN_PREDS CU_NEJP,

CU.NULL_PREDS NP

FROM SYS.USE R$ U,

SYS.OBJ$ O,

SYS.TAB$ T,

SYS.COL$ C,

SYS.COL_USAGE$ CU,

SYS.HIST_HEAD$ H

WHERE :B3 = '0'

AND U.NAME = :B2

AND O.OWNER# = U.USER#

AND O.TYPE# = 2

AND O.NAME = :B1

AND O.OBJ# = T.OBJ#

AND O.OBJ# = C.OBJ#

AND C.OBJ# = CU.OBJ#(+)

AND C.INTCOL# = CU.INTCOL#(+)

AND C.OBJ# = H.OBJ#(+)

AND C.INTCOL# = H.INTCOL#(+)

UNION ALL

SELECT /*+

ordered use_nl(c) */

C.KQFCONAM COL_NAME,C.KQFCODTY COL_TYPE,

DECODE(C.KQFCODTY, 1, 1, 0) COL_CSF,

NULL COL_DEF,

0 COL_NULL,

0 COL_PROP,

C.KQFCOCNO COL_UNUM,

C.KQFCOC NO COL_INUM,

O.KQFTAOBJ COL_OBJ,

DECODE(C.KQFCODTY, 2, -127, 0) COL_SCALE,

H.BUCKET_CNT H_BCNT,

(ST.ROWCNT - NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ,

DECODE(C.KQFCODTY, 2, 22, C.KQFCOSIZ) COL_LEN,

CU.TIMESTAMP CU_TIME,

CU.EQUALITY_PREDS CU_EP,

CU.EQUIJOIN_PREDS CU_EJP,

CU.RANGE_PREDS CU_RP,

CU.LIKE_PREDS CU_LP,

CU.NONEQUIJOIN_PREDS CU _NEJP,

CU.NULL_PREDS NP

FROM SYS.X$KQFTA O,

SYS.TAB_STATS$ ST,

SYS.X$KQFCO C,

SYS.COL_USAGE$ CU,

SYS.HIST_HEAD$ H

WHERE :B3 != '0'

AND :B2 = 'SYS'

AND O.KQFTANAM = :B1

AND O.KQFTAOBJ = ST.OBJ#(+)

AND O.KQFTAOBJ = C.KQFCOTOB

AND C.KQFCOTOB = CU.OBJ#(+)

AND C.KQFCOCNO = CU.INTCOL#(+)

AND C.KQFCOTOB = H.OBJ#(+)

AND C.KQFCOCNO = H.INTCO L#(+)

现象根据 Metalink Note

332177.1]>:

Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort [ID 332177.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 9.2.0.4.0

This problem can occur on any platform.

Symptoms

The database is not shutting down for a considerable time when you issue the command :

shutdown immediate

To shut it down in a reasonable time you have to issue the command

shutdown abort

To collect some diagnostics before issuing the shutdown immediate command set a trace event

as follows:

Connect as SYS (/ as sysdba)

SQL> alter session set events '10046 trace name context forever,level 12';

SQL> shutdown immediate;

In the resultant trace file (within the udump directory) you see something similar to the

following :-

PARSING IN CURSOR #n

delete from sys.col_usage$ c where not exists (select 1 from sys.obj$ o where o.obj# =

c.obj# )

...followed by loads of.....

WAIT #2: nam='db file sequential read' ela= 23424 p1=1 p2=4073 p3=1

....

WAIT #2: nam='db file scattered read' ela= 1558 p1=1 p2=44161 p3=8etc

Then eventually

WAIT #2: nam='log file sync' ela= 32535 p1=4111 p2=0 p3=0

...some other SQL....then back to

WAIT #2: nam='db file sequential read' ela= 205 p1=1 p2=107925 p3=1

WAIT #2: nam='db file sequential read' ela= 1212 p1=1 p2=107926 p3=1

WAIT #2: nam='db file sequential read' ela= 212 p1=1 p2=107927 p3=1

WAIT #2: nam='db file scattered read' ela= 1861 p1=1 p2=102625 p3=8

etc....

To verify which objects are involved here you can use a couple of the P1 & P2 values from

above

a) a sequential read

SELECT owner,segment_name,segment_type

FROM dba_extents

WHERE file_id=1

AND 107927 BETWEEN block_id AND block_id + blocks

b) a scattered read

SELECT owner,segment_name,segment_type

FROM dba_extents

WHERE file_id=1

AND 102625 BETWEEN block_id AND block_id + blocks

The output confirms that the objects are

SYS.I_COL_USAGE$ (INDEX) and SYS.COL_USAGE$ (TABLE)

Finally, issue select count(*) from sys.col_usage$;

CauseIf the number of entries in sys.col_usage$ is large then you are very probably hitting the

issue raised in

Bug: 3540022 9.2.0.4.0 RDBMS Base Bug 3221945

Abstract: CLEAN-UP OF ENTRIES IN COL_USAGE$

Base Bug 3221945 9.2.0.3 RDBMS

Abstract: ORA-1631 ON COL_USAGE$

Closed as "Not a Bug"

However, when a table is dropped, the column usage statistics are not dropped. They are

left as they are.

When the database is shutdown (in normal mode), then these "orphaned" column usage entries

are deleted. The code

which does this gets called only during normal shutdown.

Unless and until the database is shutdown, the col_usage$ table will continue to grow.

Solution

To implement the workaround, please execute the following steps:

1. Periodically (eg once a day) run exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO will clean out redundant col_usage$ entries, and

when

you come to shutdown the database you should not have a huge number of entries left to clean

up.

该文档指出了在 shutdown instance 时 SMON 会着手清理 col_usage$中已被 drop 表的相关 predicate

columns 的”orphaned”记录，如果在本次实例的生命周期中曾生成大量最后被 drop 的中间表，那么

col_usage$中已经堆积了众多的”orphaned”记录，SMON 为了完成 cleanup 工作需要花费大量时间导致

shutdown 变慢。这个文档还指出定期执行 DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO 也

可以清理 col_usage$中的冗余记录。

我们来观察一下 SMON 的清理工作:begin

for i in 1 .. 5000 loop

execute immediate 'create table maclean1' || i ||' tablespace fragment as select 1 t1

from dual';

execute immediate 'select * from maclean1' || i || ' where t1=1';

end loop;

DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

for i in 1 .. 5000 loop

execute immediate 'drop table maclean1' || i;

end loop;

end;

SQL> purge dba_recyclebin;

DBA Recyclebin purged.

我们可以通过以下查询了解 col_usage$上的 orphaned 记录总数，这也将是在 instance shutdown 时

SMON 所需要清理的数目

select count(*)

from sys.col_usage$ c

where not exists (select /*+ unnest */

from sys.obj$ o

where o.obj# = c.obj#);

COUNT(*)

----------

10224

针对 SMON 做 10046 level 12 trace

SQL> oradebug setospid 30225;

Oracle pid: 8, Unix process pid: 30225, image: oracle@rh2.oracle.com (SMON)SQL> oradebug event 10046 trace name context forever,level 12;

Statement processed.

SQL> shutdown immediate;

=================10046 trace content==================

lock table sys.col_usage$ in exclusive mode nowait

delete from sys.col_usage$ where obj#= :1 and intcol#= :2

delete from sys.col_usage$ c

where not exists (select /*+ unnest */

from sys.obj$ o

where o.obj# = c.obj#)

如何禁止 SMON 维护 col_usage$字典基表

1.设置隐藏参数_column_tracking_level(column usage tracking)，该参数默认为 1 即启用 column 使用情

况跟踪。设置该参数为 0，将禁用 column tracking，该参数可以在 session 和 system 级别动态修改:

SQL> col name for a25

SQL> col DESCRIB for a25

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ

2 FROM SYS.x$ksppi x, SYS.x$ksppcv y

3 WHERE x.inst_id = USERENV ('Instance')

4 AND y.inst_id = USERENV ('Instance')

5 AND x.indx = y.indx

6 AND x.ksppinm LIKE '%_column_tracking_level%';

NAME VALUE DESCRIB

------------------------- ---------- -------------------------

_column_tracking_level 1 column usage trackingSQL> alter session set "_column_tracking_level"=0 ;

Session altered.

SQL> alter system set "_column_tracking_level"=0 scope=both;

System altered.

2.关闭 DML monitoring，可以通过设置隐藏参数_dml_monitoring_enabled(enable modification monitoring)

为 false 实现，disable dml monitoring 对 CBO 的影响较大，所以我们一般推荐上一种方式:

SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;

MON COUNT(*)

--- ----------

NO 79

YES 2206

SQL> alter system set "_dml_monitoring_enabled"=false;

System altered.

SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;

MON COUNT(*)

--- ----------

NO 2285

实际上 dba_tables 的 monitoring 列来源于内部参数_dml_monitoring_enabled

SQL> set long 99999

SQL> select text from dba_views where view_name='DBA_TABLES';

TEXT

--------------------------------------------------------------------------------select u.name, o.name, decode(bitand(t.property,2151678048), 0, ts.name, null),

decode(bitand(t.property, 1024), 0, null, co.name),

decode((bitand(t.property, 512)+bitand(t.flags, 536870912)),

0, null, co.name),

decode(bitand(t.trigflag, 1073741824), 1073741824, 'UNUSABLE', 'VALID'),

decode(bitand(t.property, 32+64), 0, mod(t.pctfree$, 100), 64, 0, null),

decode(bitand(ts.flags, 32), 32, to_number(NULL),

decode(bitand(t.property, 32+64), 0, t.pctused$, 64, 0, null)),

decode(bitand(t.property, 32), 0, t.initrans, null),

decode(bitand(t.property, 32), 0, t.maxtrans, null),

s.iniexts * ts.blocksize,

decode(bitand(ts.flags, 3), 1, to_number(NULL),

s.extsize * ts.blocksize),

s.minexts, s.maxexts,

decode(bitand(ts.flags, 3), 1, to_number(NULL),

s.extpct),

decode(bitand(ts.flags, 32), 32, to_number(NULL),

decode(bitand(o.flags, 2), 2, 1, decode(s.lists, 0, 1, s.lists))),

decode(bitand(ts.flags, 32), 32, to_number(NULL),

decode(bitand(o.flags, 2), 2, 1, decode(s.groups, 0, 1, s.groups))),

decode(bitand(t.property, 32+64), 0,

decode(bitand(t.flags, 32), 0, 'YES', 'NO'), null),

decode(bitand(t.flags,1), 0, 'Y', 1, 'N', '?'),

t.rowcnt,

decode(bitand(t.property, 64), 0, t.blkcnt, null),

decode(bitand(t.property, 64), 0, t.empcnt, null),

t.avgspc, t.chncnt, t.avgrln, t.avgspc_flb,

decode(bitand(t.property, 64), 0, t.flbcnt, null),

lpad(decode(t.degree, 32767, 'DEFAULT', nvl(t.degree,1)),10),

lpad(decode(t.instances, 32767, 'DEFAULT', nvl(t.instances,1)),10),

lpad(decode(bitand(t.flags, 8), 8, 'Y', 'N'),5),

decode(bitand(t.flags, 6), 0, 'ENABLED', 'DISABLED'),

t.samplesize, t.analyzetime,

decode(bitand(t.property, 32), 32, 'YES', 'NO'),

decode(bitand(t.property, 64), 64, 'IOT',

decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',

decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null))),

decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'),

decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N'),

decode(bitand(t.property, 8192), 8192, 'YES',

decode(bitand(t.property, 1), 0, 'NO', 'YES')),

decode(bitand(o.flags, 2), 2, 'DEFAULT',

decode(s.cachehint, 0, 'DEFAULT', 1, 'KEEP', 2, 'RECYCLE', NULL)),

decode(bitand(t.flags, 131072), 131072, 'ENABLED', 'DISABLED'),

decode(bitand(t.flags, 512), 0, 'NO', 'YES'),

decode(bitand(t.flags, 256), 0, 'NO', 'YES'),

decode(bitand(o.flags, 2), 0, NULL,

decode(bitand(t.property, 8388608), 8388608,

'SYS$SESSION', 'SYS$TRANSACTION')),

decode(bitand(t.flags, 1024), 1024, 'ENABLED', 'DISABLED'),

decode(bitand(o.flags, 2), 2, 'NO',

decode(bitand(t.property, 2147483648), 2147483648, 'NO',

decode(ksppcv.ksppstvl, 'TRUE', 'YES', 'NO'))),

decode(bitand(t.property, 1024), 0, null, cu.name),

decode(bitand(t.flags, 8388608), 8388608, 'ENABLED', 'DISABLED'),

decode(bitand(t.property, 32), 32, null,

decode(bitand(s.spare1, 2048), 2048, 'ENABLED', 'DISABLED')),

decode(bitand(o.flags, 128), 128, 'YES', 'NO')

from sys.user$ u, sys.ts$ ts, sys.seg$ s, sys.obj$ co, sys.tab$ t, sys.obj$ o,

sys.obj$ cx, sys.user$ cu, x$ksppcv ksppcv, x$ksppi ksppi

where o.owner# = u.user#

and o.obj# = t.obj#

and bitand(t.property, 1) = 0

and bitand(o.flags, 128) = 0

and t.bobj# = co.obj# (+)

and t.ts# = ts.ts#

and t.file# = s.file# (+)

and t.block# = s.block# (+)

and t.ts# = s.ts# (+)

and t.dataobj# = cx.obj# (+)

and cx.owner# = cu.user# (+)

and ksppi.indx = ksppcv.indx and ksppi.ksppinm = '_dml_monitoring_enabled'

了解你所不知道的 SMON 功能 ( 五 ):Recover Dead

transaction

SMON 的作用还包括清理死事务:Recover Dead transaction。当服务进程在提交事务(commit)

前就意外终止的话会形成死事务(dead transaction)，PMON 进程负责轮询 Oracle 进程，找出

这类意外终止的死进程(dead process)，通知 SMON 将与该 dead process 相关的 dead

transaction 回滚清理，并且 PMON 还负责恢复 dead process 原本持有的锁和 latch。

我们来具体了解 dead transaction 的恢复过程:

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

PL/SQL Release 10.2.0.4.0 - Production

CORE 10.2.0.4.0 Production

TNS for Linux: Version 10.2.0.4.0 - Production

NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME

--------------------------------------------------------------------------------

www.oracle.com

SQL>alter system set fast_start_parallel_rollback=false;

System altered.设置 10500，10046 事件以跟踪 SMON 进程的行为

SQL> alter system set events '10500 trace name context forever,level 8';

System altered.

SQL> oradebug setospid 4424

Oracle pid: 8, Unix process pid: 4424, image: oracle@rh2.oracle.com (SMON)

SQL> oradebug event 10046 trace name context forever,level 8;

Statement processed.

在一个新的 terminal 中执行大批量的删除语句，在执行一段时间后使用操作系统命令将执行该删除操作的

服务进程 kill 掉，模拟一个大的 dead transaction 的场景

SQL> delete large_rb;

delete large_rb

[oracle@rh2 bdump]$ kill -9 4535

等待几秒后 pmon 进程会找出 dead process:

[claim lock for dead process][lp 0x7000003c70ceff0][p 0x7000003ca63dad8.1290666][hist

x9a514951]

在 x$ktube 内部视图中出现 ktuxecfl(Transaction flags)标记为 DEAD 的记录:

SQL> select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';

SUM(DISTINCT(KTUXESIZ))

-----------------------

29386

SQL> /

SUM(DISTINCT(KTUXESIZ))

-----------------------

28816以上 KTUXESIZ 代表事务所使用的 undo 块总数(number of undo blocks used by the transaction)

==================smon trace content==================

SMON: system monitor process posted

WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332801935

WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332815568

WAIT #0: nam='latch: row cache objects' ela= 95 address=2979418792 number=200 tries=1 obj#=1

tim=1278243333332734

WAIT #0: nam='latch: row cache objects' ela= 83 address=2979418792 number=200 tries=1 obj#=1

tim=1278243333356173

WAIT #0: nam='latch: undo global data' ela= 104 address=3066991984 number=187 tries=1 obj#=1

tim=1278243347987705

WAIT #0: nam='latch: object queue header operation' ela= 89 address=3094817048 number=131

tries=0 obj#=1 tim=1278243362468042

WAIT #0: nam='log file switch (checkpoint incomplete)' ela= 0 p1=0 p2=0 p3=0 obj#=1

tim=1278243419588202

Dead transaction 0x00c2.008.0000006d recovered by SMON

=====================

PARSING IN CURSOR #3 len=358 dep=1 uid=0 oct=3 lid=0 tim=1278243423594568 hv=3186851936

ad='ae82c1b8'

select smontabv.cnt,

smontab.time_mp,

smontab.scn,

smontab.num_mappings,

smontab.tim_scn_map,

smontab.orig_thread

from smon_scn_time smontab,

(select max(scn) scnmax,

count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt

from smon_scn_time

where thread = 0) smontabv

where smontab.scn = smontabv.scnmax

and thread = 0

END OF STMT

PARSE #3:c=0,e=1354526,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1278243423594556

EXEC #3:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1278243423603269FETCH #3:c=0,e=47065,p=0,cr=319,cu=0,mis=0,r=1,dep=1,og=4,tim=1278243423650375

*** 2011-06-24 21:19:25.899

WAIT #0: nam='smon timer' ela= 299999999 sleep time=300 failed=0 p3=0 obj#=1

tim=1278243716699171

kglScanDependencyHandles4Unpin():

cumscan=3 cumupin=4 time=776 upinned=0

以上 SMON 回滚清理 Dead transaction 的过程从”system monitor process posted”开始到”Dead

transaction 0x00c2.008.0000006d recovered by SMON”结束。另外可以看到在恢复过程中

SMON 先后请求了’latch: row cache objects’、’latch: undo global data’、’latch: object queue

header operation’三种不同类型的 latch。

现象

fast_start_parallel_rollback 参数决定了 SMON 在回滚事务时使用的并行度，若将该参数设置

为 false 那么并行回滚将被禁用，若设置为 Low(默认值)那么会以 2*CPU_COUNT 数目的并行

度回滚，当设置为 High 则 4*CPU_COUNT 数目的回滚进程将参与进来。当我们通过以下查询

发现系统中存在大的 dead tranacation 需要回滚时我们可以通过设置

fast_start_parallel_rollback 为 HIGH 来加速恢复:

select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';

==============parallel transaction recovery===============

*** 2011-06-24 20:31:01.765

SMON: system monitor process posted msgflag:0x0000 (-/-/-/-/-/-/-)

*** 2011-06-24 20:31:01.765

SMON: process sort segment requests begin

*** 2011-06-24 20:31:01.765

SMON: process sort segment requests end

*** 2011-06-24 20:31:01.765SMON: parallel transaction recovery begin

WAIT #0: nam='DFS lock handle' ela= 504 type|mode=1413545989 id1=3 id2=11 obj#=2

tim=1308918661765715

WAIT #0: nam='DFS lock handle' ela= 346 type|mode=1413545989 id1=3 id2=12 obj#=2

tim=1308918661766135

WAIT #0: nam='DFS lock handle' ela= 565 type|mode=1413545989 id1=3 id2=13 obj#=2

tim=1308918661766758

WAIT #0: nam='DFS lock handle' ela= 409 type|mode=1413545989 id1=3 id2=14 obj#=2

tim=1308918661767221

WAIT #0: nam='DFS lock handle' ela= 332 type|mode=1413545989 id1=3 id2=15 obj#=2

tim=1308918661767746

WAIT #0: nam='DFS lock handle' ela= 316 type|mode=1413545989 id1=3 id2=16 obj#=2

tim=1308918661768146

WAIT #0: nam='DFS lock handle' ela= 349 type|mode=1413545989 id1=3 id2=17 obj#=2

tim=1308918661768549

WAIT #0: nam='DFS lock handle' ela= 258 type|mode=1413545989 id1=3 id2=18 obj#=2

tim=1308918661768858

WAIT #0: nam='DFS lock handle' ela= 310 type|mode=1413545989 id1=3 id2=19 obj#=2

tim=1308918661769224

WAIT #0: nam='DFS lock handle' ela= 281 type|mode=1413545989 id1=3 id2=20 obj#=2

tim=1308918661769555

*** 2011-06-24 20:31:01.769

SMON: parallel transaction recovery end

但是在 real world 的实践中可以发现当 fast_start_parallel_rollback= Low/High，即启用并行回

滚时常有并行进程因为各种资源互相阻塞导致回滚工作停滞的例子，当遭遇到这种问题时将

fast_start_parallel_rollback 设置为 FALSE 一般可以保证恢复工作以串行形式在较长时间内完

成。

如何禁止 SMON Recover Dead transaction

可以设置 10513 事件来临时禁止 SMON 恢复死事务，这在我们做某些异常恢复的时候显得

异常有效，当然不建议在一个正常的生产环境中设置这个事件:

SQL> alter system set events '10513 trace name context forever, level 2';System altered.

10531 -- event disables transaction recovery which was initiated by SMON

SQL> select ktuxeusn,

2 to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS') "Time",

3 ktuxesiz,

4 ktuxesta

5 from x$ktuxe

6 where ktuxecfl = 'DEAD';

KTUXEUSN Time KTUXESIZ KTUXESTA

---------- -------------------------- ---------- ----------------

17 24-JUN-2011 22:03:10 0 INACTIVE

66 24-JUN-2011 22:03:10 0 INACTIVE

105 24-JUN-2011 22:03:10 0 INACTIVE

193 24-JUN-2011 22:03:10 33361 ACTIVE

194 24-JUN-2011 22:03:10 0 INACTIVE

197 24-JUN-2011 22:03:10 20171 ACTIVE

7 rows selected.

SQL> /

KTUXEUSN Time KTUXESIZ KTUXESTA

---------- -------------------------- ---------- ----------------

17 24-JUN-2011 22:03:10 0 INACTIVE

66 24-JUN-2011 22:03:10 0 INACTIVE

105 24-JUN-2011 22:03:10 0 INACTIVE

193 24-JUN-2011 22:03:10 33361 ACTIVE

194 24-JUN-2011 22:03:10 0 INACTIVE

197 24-JUN-2011 22:03:10 20171 ACTIVE

7 rows selected.================smon disabled trans recover trace==================

SMON: system monitor process posted

*** 2011-06-24 22:02:57.980

SMON: Event 10513 is level 2, trans recovery disabled.

了解你所不知道的 SMON 功能(六):清理 IND$字典基表

SMON 的作用还包括清理 IND$字典基表(cleanup ind$):

触发场景

当我们在线创建或重建索引时(create or rebuild index online)，服务进程会到 IND$字典基表中

将该索引对应的记录的 FLAGS 字段修改为十进制的 256 或者 512( 见上图0×100=256,0×200=512)，如:

SQL> create index macleans_index on larges(owner,object_name) online;

SQL> select obj# from obj$ where name='MACLEANS_INDEX';

OBJ#

----------

1343842

SQL> select FLAGS from ind$ where obj#=1343842;

FLAGS

----------

256

ind_online$字典基表记录了索引在线创建/重建的历史

SQL> select * from ind_online$;

OBJ# TYPE# FLAGS

---------- ---------- ----------

1343839 1 256

1343842 1 256

create table ind_online$

( obj# number not null,

type# number not null, /* what kind of index is this? */

/* normal : 1 */

/* bitmap : 2 */

/* cluster : 3 */

/* iot - top : 4 */

/* iot - nested : 5 */

/* secondary : 6 */

/* ansi : 7 */

/* lob : 8 */ /* cooperative index method : 9 */

flags number not null

/* index is being online built : 0x100 */

/* index is being online rebuilt : 0x200 */

)

原则上 online create/rebuild index 的的清理工作由实际操作的服务进程负责完成，这种清理

在 DDL 语句成功的情况下包括一系列数据字典的维护，在该 DDL 语句失败的情形中包括对

临时段的清理和数据字典的维护，无论如何都需要 drop 在线日志中间

表 SYS_JOURNAL_nnnnn(nnnn 为该索引的 obj#)。数据字典的维护工作就包含对 IND$基

表中相应索引记录的 FLAGS 标志位的恢复，但是如果服务进程在语句执行过程中意外终止的

话，那么短时间内 FLAGS 标志位字段就无法得到恢复，这将导致对该索引的后续操作因

ORA-8104 错误而无法继续:

SQL> drop index macleans_index;

drop index macleans_index

ERROR at line 1:

ORA-08104: this index object 1343842 is being online built or rebuilt

08104, 00000, "this index object %s is being online built or rebuilt"

// *Cause: the index is being created or rebuild or waited for recovering

// from the online (re)build

// *Action: wait the online index build or recovery to complete

SMON 负责在启动后(startup)的每小时执行一次对 IND$基表中因在线创建/重建索引失败所

留下记录的清理，这种清理工作由 kdicclean 函数驱动(kdicclean is run by smon every 1 hour，

called from SMON to find if there is any online builder death and cleanup our ind$ and obj$ and

drop the journal table, stop journaling)。

这种清理工作典型的调用堆栈 stack call 如下:

ksbrdp -> ktmSmonMain ktmmon -> kdicclean -> kdic_cleanup -> ktssdrp_segment

注意因为 SMON 进程的清理工作每小时才执行一次，而且在工作负载很高的情况下可能实际很久都不会得到清理，在这种情景中我们总是希望能尽快完成对索引的在线创建或重建，

在 10gr2 以后的版本中我们可以直接使用 dbms_repair.online_index_clean 来手动清理 online

index rebuild 的遗留问题:

SQL> drop index macleans_index;

drop index macleans_index

ERROR at line 1:

ORA-08104: this index object 1343842 is being online built or rebuilt

DECLARE

isClean BOOLEAN;

BEGIN

isClean := FALSE;

WHILE isClean=FALSE

LOOP

isClean := dbms_repair.online_index_clean(

dbms_repair.all_index_id, dbms_repair.lock_wait);

dbms_lock.sleep(10);

END LOOP;

END;

SQL> drop index macleans_index;

drop index macleans_index

ERROR at line 1:

ORA-01418: specified index does not exist

成功清理

但是如果在 9i 中的话就比较麻烦，可以尝试用以下方法(不是很推荐，除非你已经等了很久):

1.首先手工删除在线日志表，通过以下手段找出这个中间表的名字select object_name

from dba_objects

where object_name like

(select '%' || object_id || '%'

from dba_objects

where object_name = '&INDEX_NAME')

Enter value for index_name: MACLEANS_INDEX

old 6: where object_name = '&INDEX_NAME')

new 6: where object_name = 'MACLEANS_INDEX')

OBJECT_NAME

--------------------------------------------------------------------------------

SYS_JOURNAL_1343845

SQL> drop table SYS_JOURNAL_1343845;

Table dropped.

2.第二步要手动修改 IND$字典基表

!!!!!! 注意！手动修改数据字典要足够小心！！

select flags from ind$ where obj#=&INDEX_OBJECT_ID;

Enter value for index_object_id: 1343845

old 1: select flags from ind$ where obj#=&INDEX_OBJECT_ID

new 1: select flags from ind$ where obj#=1343845

FLAGS

----------

256

a) 针对 online create index，手动删除对应的记录

delete from IND$ where obj#=&INDEX_OBJECT_ID

b) 针对 online rebuild index，手动恢复对应记录的 FLAGS 标志位update IND$ set FLAGS=FLAGS-512 where obj#=&INDEX_OBJECT_ID

接下来我们实际观察一下清理工作的细节:

SQL> select obj# from obj$ where name='MACLEANS_INDEX';

OBJ#

----------

1343854

SQL> select FLAGS from ind$ where obj#=1343854;

FLAGS

----------

256

SQL> oradebug setmypid;

Statement processed.

SQL> oradebug event 10046 trace name context forever,level 8;

Statement processed.

SQL> DECLARE

2 isClean BOOLEAN;

3 BEGIN

4 isClean := FALSE;

5 WHILE isClean=FALSE

6 LOOP

7 isClean := dbms_repair.online_index_clean(

8 dbms_repair.all_index_id, dbms_repair.lock_wait);

10 dbms_lock.sleep(10);

11 END LOOP;

12 END;

13 /

PL/SQL procedure successfully completed.===============================10046 trace=============================

select i.obj#, i.flags, u.name, o.name, o.type#

from sys.obj$ o, sys.user$ u, sys.ind_online$ i

where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)

and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))

and o.obj# = i.obj#

and o.owner# = u.user#

select u.name,

o.name,

o.namespace,

o.type#,

decode(bitand(i.property, 1024), 0, 0, 1)

from ind$ i, obj$ o, user$ u

where i.obj# = :1

and o.obj# = i.bo#

and o.owner# = u.user#

delete from object_usage

where obj# in (select a.obj#

from object_usage a, ind$ b

where a.obj# = b.obj#

and b.bo# = :1)

drop table "SYS"."SYS_JOURNAL_1343854" purge

delete from icoldep$ where obj# in (select obj# from ind$ where bo#=:1)

delete from ind$ where bo#=:1

delete from ind$ where obj#=:1

我们可以利用以下语句找出系统中可能需要恢复的 IND$记录，注意不要看到查询有结果就

认为这是操作失败的征兆，很可能是有人在线创建或重建索引:select i.obj#, i.flags, u.name, o.name, o.type#

from sys.obj$ o, sys.user$ u, sys.ind_online$ i

where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)

and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))

and o.obj# = i.obj#

and o.owner# = u.user#

秒客网

Oracle 后台进程（五）SMON进程

相关文章