rac数据库默认sql tuning advisor,导致大量library cache lock
问题现象:
客户反映周六周日固定十点钟,一个程序会特别慢(大概10分钟),平时1到2秒。
查看当时的日志发现:
DBMS_STATS: GATHER_STATS_JOB encountered errors. Check the trace file.
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j002_51847.trc:
ORA-04021: timeout occurred while waiting to lock object
查看trc文件发现:
ORA-04021: timeout occurred while waiting to lock object
*** 2015-09-20 10:17:51.129
DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...)
DBMS_STATS: ORA-04021: timeout occurred while waiting to lock object
工程师根据经验判断是锁导致,一下是解决方法和分析过程:
根据分析分析,该问题的原因是Automatic SQL Tuning Advisor Job在尝试调优
一个较为复杂的SQL "6ru8vx9kjbv1k" 时,花了太多的时间在CPU上,并且它以S模式持有了
CUSTOMERUSER.CUSTOM_TABLE上的library cache lock,而另外一个收集统计信息的job需要以X模式来
获取CUSTOMERUSER.CUSTOM_TABLE上的library cache lock,S和X不兼容导致阻塞,直到收集统计信息的job超时
在此期间,所有的需要在CUSTOMERUSER.CUSTOM_TABLE的hard parse由于都需要以S模式申请表上的library cache lock
而被收集统计信息的那个job阻塞,因此您的数据库会出现大量的library cache lock而拖慢了应用。
解决办法是关闭Automatic SQL Tuning Advisor Job,这不会对您的数据库产生任何性能影响,因为
Automatic SQL Tuning Advisor Job只是给出优化建议,并没有执行任何实质性的操作,除非您接受优化建议。
To implement the solution, please execute the following steps:
Disable the Automatic SQL Tuning Advisor Job, for example:
connect / as sysdba
BEGIN
DBMS_AUTO_TASK_ADMIN.DISABLE(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/
If you need to re-enable in future:
connect / as sysdba
BEGIN
DBMS_AUTO_TASK_ADMIN.ENABLE(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/
Note: Disable Automatic SQL Tuning Advisor Job has NO performance impacts on the database.
Thus the Automatic SQL Tuning Advisor Job will not ON CPU for a long time,
the GATHER_STATS_JOB which need to request S mode library cache lock
will not be blocked, thus the issue will not happen.
The following justifies how the issue is related to this issue:
1. An Automatic SQL Tuning Advisor Job (Instance 1, J001, OS id=1864, pid: 75, sid: 2178, sess_ser: 58349)
is tuning SQL "6ru8vx9kjbv1k" and ON CPU for too long time, it held the library cache lock on table object CUSTOMERUSER.CUSTOM_TABLE
in S mode.
O/S info: user: oracle, term: UNKNOWN, ospid: 1864
OSD pid info: Unix process pid: 1864, image: oracle@rac1 (J001)
(session) sid: 2178 ser: 58349 trans: (nil), creator: 0x127112e890
edition#: 100 oct: 3, prv: 0, sql: 0x131ce187b0, psql: 0x131c16a008, user: 85/CUSTOMERUSER
Current Wait Stack:
Not in wait; last wait ended 18 min 58 sec ago
LibraryObjectPin: Address=0x12ff8bd270 Handle=0x130db54d18 Mode=S Lock=0x12fd5fa510 ..... <<< hold in S mode
LibraryHandle: Address=0x130db54d18 Hash=37d0ddf6 LockMode=S PinMode=S LoadLockMode=0 Status=VALD
ObjectName: Name=CUSTOMERUSER.CUSTOM_TABLE <<<<
FullHashValue=edc24808d2cf3f125643190137d0ddf6 Namespace=TABLE/PROCEDURE(01) Type=TABLE(02) Identifier=87584 OwnerIdn=85
Current SQL:
LibraryHandle: Address=0x131ce187b0 Hash=6515ec32 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) opt_param('parallel_execution_enabled', 'false') result_cache */ COUNT(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "POINT") */ 1 AS C1 FROM "MONITOR_POINT" "POINT" WHERE ("POINT"."POINTNAME"=U'\6D77\6EE8\8DEF-\6606\660E\8DEF')) innerQuery
2. And then the GATHER_STATS_JOB(Instance 1, J000, OS id=1862, pid: 66, sid: 1920, sess_ser: 19755)
attempted to gather table stats on table CUSTOMERUSER.CUSTOM_TABLE and want to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE)
in X mode, so it was blocked by the Automatic SQL Tuning Advisor Job until time out (ORA-04021: timeout occurred while waiting to lock object)
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (orcl.orcl1)
os id: 1862 >>>>> DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...)
process id: 66, oracle@rac1 (J000)
session id: 1920
session serial #: 19755
}
is waiting for 'library cache lock' with wait info:
{
p1: 'handle address'=0x130db54d18
p2: 'lock address'=0x133cd134c8
p3: '100*mode+namespace'=0x1562000010003 <<< the last is 3 which is in X mode.
time in wait: 4 min 49 sec
...
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (orcl.orcl1)
os id: 1864
process id: 75, oracle@rac1 (J001)
session id: 2178
session serial #: 58349
}
which is not in a wait:
{
last wait: 5 min 37 sec ago
......
}
3. Any other sessions which need to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) in S mode during hard parse will be blocked
by the GATHER_STATS_JOB (because it request X mode library cache lock), thus the issue happened.
This is explained in the following note:
WAITEVENT: "library cache lock" Reference Note ( Doc ID 34578.1 )
The root cause of this issue is:
An Automatic SQL Tuning Advisor Job (Instance 1, J001, OS id=1864, pid: 75, sid: 2178, sess_ser: 58349)
is tuning SQL "6ru8vx9kjbv1k" and ON CPU for too long time, it held the library cache lock on table object CUSTOMERUSER.CUSTOM_TABLE
in S mode.
And then the GATHER_STATS_JOB(Instance 1, J000, OS id=1862, pid: 66, sid: 1920, sess_ser: 19755)
attempted to gather table stats on table CUSTOMERUSER.CUSTOM_TABLE and want to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE)
in X mode, so it was blocked by the Automatic SQL Tuning Advisor Job.
Any other sessions which need to acquire the library cache lock(CUSTOMERUSER.CUSTOM_TABLE) in S mode during hard parse will be blocked
by the GATHER_STATS_JOB (because it request X mode library cache lock), thus the issue happened.
-- Trace File --
FileName
------------
orcl1_diag_10396_140.trc
FileComment
-----------------
*** 2015-09-20 10:17:51.501
Process diagnostic dump for oracle@rac1 (J001), OS id=1864,
pid: 75, proc_ser: 123, sid: 2178, sess_ser: 58349
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 10:17:50 ]
NOTE: scheduling delay has not been sampled for 0.915705 secs 0.000000 secs from [ 10:17:46 - 10:17:51 ], 5 sec avg
0.000000 secs from [ 10:16:52 - 10:17:51 ], 1 min avg
0.000000 secs from [ 10:12:52 - 10:17:51 ], 5 min avg
loadavg : 1.76 1.77 1.20
Memory (Avail / Total) = 141536.16M / 258301.68M
Swap (Avail / Total) = 64000.99M / 64000.99M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 R oracle 1864 1 99 80 0 - 19929533 ? 10:01 ? 00:15:55 ora_j001_orcl1
Short stack dump:
ksedsts <- ksdxfstk <- ksdxcb <- sspuser <- __sighandler
<- kpp_concatq <- qsmett_tokenize_text <- qsmett_tokenize_text <- qknrcAllocate <- qkadrv
<- opitca <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild <- kxsGetRuntimeLock
<- kksfbc <- kkspsc0 <- kksParseCursor <- opiosq0 <- opiall0
<- opikpr <- opiodr <- rpidrus <- skgmstack <- rpiswu2
<- kprball <- qksdsExeStmt <- qksdsExecute <- kkoatVerifyEst <- kkeAdjSingTabCard
<- kkecdn <- kkotap <- kkoiqb <- kkooqb <- kkoqbc
<- apakkoqb <- apaqbdDescendents <- apaqbdList <- apaqbdDescendents <- apaqbdList
<- apaqbd <- kkqctCostTransfQB <- kkqctdrvCVM <- kkqvmTrMrg <- kkqvmdrv2
<- kkqctdrvTD <- kkqdrv <- kkqctdrvIT <- apadrv <- opitca
<- kksFullTypeCheck <- rpiswu2 <- kksLoadChild <- kxsGetRuntimeLock <- kksfbc
<- kkspsc0 <- kksParseCursor <- opiosq0 <- kpooprx <- kpoal8
<- opiodr <- kpoodrc <- rpiswu2 <- kpoodr <- upirtrc
<- kpurcsc <- kpuexec <- OCIStmtExecute <- qksanExecSql <- qksanAnalyzeSql
<- qksanAnalyzeSegSql <- kestsaDoRounds <- kestsaAutoTuneSql <- kestsaAutoTuneDrv <- kestsTuneSqlDrv
<- kesaiExecAction <- kesaiTuneSqlDrv <- spefcifa <- spefmccallstd <- pextproc
<- peftrusted <- psdexsp <- rpiswu2 <- psdextp <- pefccal
<- pefcal <- pevm_FCAL <- pfrinstr_FCAL <- pfrrun_no_tool <- pfrrun
<- plsql_run <- peicnt <- kkxexe <- opiexe <- kpoal8
<- opiodr <- kpoodr <- upirtrc <- kpurcsc <- kpuexec
<- OCIStmtExecute <- jslvec_execcb <- jslvswu <- jslve_execute0 <- jslve_execute
<- rpiswu2 <- kkjex1e <- kkjsexe <- kkjrdp <- opirip
<- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main
SO: 0x127112e890, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x127112e890, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:75, ser:123, calls cur/top: 0x11c582d260/0x11c5828d20
flags : (0x0) -
flags2: (0x10), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 240
last post received-location: kqr.h LINE:2245 ID:kqrbtm
last process to post me: 0x13411b8788 2 6
last post sent: 0 0 35
last post sent-location: k sr2. h LINE:629 ID:ksrpublish
last process posted by me: 0x13411b8788 2 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x1341292dd0
O/S info: user: oracle, term: UNKNOWN, ospid: 1864
OSD pid info: Unix process pid: 1864, image: oracle@rac1 (J001)
SO: 0x12b15ca2f8, type: 4, owner: 0x127112e890, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x127112e890, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 2178 ser: 58349 trans: (nil), creator: 0x127112e890
flags: (0x8010041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 100 oct: 3, prv: 0, sql: 0x131ce187b0, psql: 0x131c16a008, user: 85/CUSTOMERUSER
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 1864
machine: rac1 program: oracle@rac1 (J001)
application name: DBMS_SCHEDULER, hash value=2478762354
action name: ORA$AT_SQ_SQL_SW_5829, hash value=3238241282
Current Wait Stack:
Not in wait; last wait ended 18 min 58 sec ago
There are 1 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 1920, ser: 19755
wait event: 'library cache lock'
p1: 'handle address'=0x130db54d18
p2: 'lock address'=0x133cd134c8
p3: '100*mode+namespace'=0x1562000010003
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 891 secs, waiter_cache_ver: 5529
LibraryObjectLock: Address=0x12fcdf1338 Handle=0x131ce187b0 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0
LibraryHandle: Address=0x131ce187b0 Hash=6515ec32 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) opt_param('parallel_execution_enabled', 'false') result_cache */ COUNT(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "POINT") */ 1 AS C1 FROM "MONITOR_POINT" "POINT" WHERE ("POINT"."POINTNAME"=U'\6D77\6EE8\8DEF-\6606\660E\8DEF')) innerQuery
NamespaceDump:
Parent Cursor: sql_id=6ru8vx9kjbv1k parent=0x112a35bdb0 maxchild=1 plk=y ppn=n
LibraryObjectPin: Address=0x12ff8bd270 Handle=0x130db54d18 Mode=S Lock=0x12fd5fa510 ClusterLock=0x12a210d508 User=0x12b15ca2f8 Session=0x12b15ca2f8 Count=1 Mask=0741 Flags=[00] SavepointNum=0x4a8a
LibraryHandle: Address=0x130db54d18 Hash=37d0ddf6 LockMode=S PinMode=S LoadLockMode=0 Status=VALD
ObjectName: Name=CUSTOMERUSER.CUSTOM_TABLE
FullHashValue=edc24808d2cf3f125643190137d0ddf6 Namespace=TABLE/PROCEDURE(01) Type=TABLE(02) Identifier=87584 OwnerIdn=85
-- Script Output --
FileName
------------
m_ash.dmp
FileComment
-----------------
Get the holder session information: instance 1 session 2178 serial# 58349
select t.dbid,
t.sample_id,
t.sample_time,
t.instance_number,
t.session_id,
t.sql_opcode,
t.sql_id,
t.top_level_sql_id,
t.session_type,
t.session_state,
t.program,
t.action
from m_ash_11426874721 t
where instance_number = 1
and session_id =2178
and session_serial# =58349
order by dbid, instance_number, sample_time;
SAMPLE_ID SAMPLE_TIME INSTANCE_NUMBER SESSION_ID SQL_OPCODE SQL_ID TOP_LEVEL_SQL_ID SESSION_TYPE SESSION_STATE PROGRAM ACTION
19169400 2015-09-20 10:02:06.492 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19169410 2015-09-20 10:02:16.532 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19169420 2015-09-20 10:02:26.562 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19169430 2015-09-20 10:02:36.602 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19169440 2015-09-20 10:02:46.642 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19169450 2015-09-20 10:02:56.662 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19172820 2015-09-20 10:59:18.029 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19172830 2015-09-20 10:59:28.049 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19172840 2015-09-20 10:59:38.089 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19172850 2015-09-20 10:59:48.119 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
19172860 2015-09-20 10:59:58.149 1 2178 3 6ru8vx9kjbv1k 5zruc4v6y32f9 FOREGROUND ON CPU oracle@rac1 (J001) ORA$AT_SQ_SQL_SW_5829
Summary
=======
The holder is instance 1 session 2178 serial# 58349, it is running SQL Tuning Advisor on '6ru8vx9kjbv1k' and hang on CPU.
-- Trace File --
FileName
------------
orcl1_dia0_10404_315.trc
FileComment
-----------------
*** 2015-09-20 10:04:49.209
Verified Hangs in the System
Root Chain Total Hang
Hang Hang Inst Root #hung #hung Hang Hang Resolution
ID Type Status Num Sess Sess Sess Conf Span Action
----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------
385 HANG VICSELTD 1 2178 2 2 HIGH LOCAL IGNRD:LongOp:Parse
Hang Ignored Reason: Automatic hang resolution is not performed on a session
that is parsing SQL.
inst# SessId Ser# OSPID PrcNm Event
----- ------ ----- --------- ----- -----
1 1920 19755 1862 J000 library cache lock
1 2178 58349 1864 J001 not in wait
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (orcl.orcl1)
os id: 1862 >>>>> DBMS_STATS: GATHER_STATS_JOB: GATHER_TABLE_STATS('"CUSTOMERUSER"','"CUSTOM_TABLE"','"CUSTOMER_201509"', ...)
process id: 66, oracle@rac1 (J000)
session id: 1920
session serial #: 19755
}
is waiting for 'library cache lock' with wait info:
{
p1: 'handle address'=0x130db54d18
p2: 'lock address'=0x133cd134c8
p3: '100*mode+namespace'=0x1562000010003
time in wait: 4 min 49 sec
timeout after: 10 min 10 sec
wait id: 24853
blocking: 0 sessions
wait history:
* time between current wait and wait #1: 0.001207 sec
1. event: 'enq: IV - contention'
time waited: 0.000616 sec
wait id: 24852 p1: 'type|mode'=0x49560005
p2: 'id1'=0x53594e43
p3: 'id2'=0xf
* time between wait #1 and #2: 0.000107 sec
2. event: 'enq: IV - contention'
time waited: 0.001168 sec
wait id: 24851 p1: 'type|mode'=0x49560005
p2: 'id1'=0x4c4f434b
p3: 'id2'=0xf
* time between wait #2 and #3: 0.009759 sec
3. event: 'row cache lock'
time waited: 0.000081 sec
wait id: 24850 p1: 'cache id'=0x8
p2: 'mode'=0x0
p3: 'request'=0x5
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (orcl.orcl1)
os id: 1864
process id: 75, oracle@rac1 (J001)
session id: 2178
session serial #: 58349
}
which is not in a wait:
{
last wait: 5 min 37 sec ago
blocking: 1 session
wait history:
1. event: 'gc cr multi block request'
time waited: 0.000884 sec
wait id: 9858 p1: 'file#'=0x7e
p2: 'block#'=0x6426
p3: 'class#'=0x1
* time between wait #1 and #2: 0.000335 sec
2. event: 'gc cr multi block request'
time waited: 0.001178 sec
wait id: 9857 p1: 'file#'=0x7e
p2: 'block#'=0x6422
p3: 'class#'=0x1
* time between wait #2 and #3: 0.000343 sec
3. event: 'gc cr block 2-way'
time waited: 0.000560 sec
wait id: 9856 p1: ''=0x7e
p2: ''=0x5726
p3: ''=0x1
}
Chain 1 Signature: <='library cache lock'
Chain 1 Signature Hash: 0x24734cf