Undo Related Wait Events & Known Issues (Doc ID 1575701.1)

APPLIES TO:

Oracle Database - Enterprise Edition - Version 9.2.0.8 to 11.2.0.4 [Release 9.2 to 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
***Checked for relevance on 24-Aug-2017***

PURPOSE

The following are the most commonly seen undo-related wait events. This document describes how to diagnose and troubleshoot them:

Enq: US Contention
Buffer Busy waits on Undo
Wait for a undo record

TROUBLESHOOTING STEPS

Enq: US Contention:

As the number of transactions increases, so does their need for undo space. If little space is free because most of it is still allocated to unexpired blocks, sessions first search for free space in offline undo segments. If there are many offline undo segments, this search can generate many hits on dc_rollback_segments, its latch, and the US (Undo Segment) enqueue. The result can be high 'latch: row cache objects' contention on DC_ROLLBACK_SEGMENTS together with high 'enq: US - contention'.

Database performance is affected when this wait event occurs. The row cache objects latch protects the dictionary cache. The first thing to determine is whether most of the contention comes from a particular row cache objects child latch.

Use the following queries or check the AWR report:

* Use the queries below

1) select SEGMENT_NAME,STATUS,TABLESPACE_NAME from dba_rollback_segs where status = 'OFFLINE';

2) select latch#, child#, sleeps from v$latch_children where name='row cache objects' and sleeps > 0 order by sleeps desc;

        LATCH#     CHILD#     SLEEPS
    ---------- ---------- ----------
           120          1    3531645
            10          5        400

3) Query v$rowcache to confirm which dictionary cache accounts for the most gets:

    SQL> select parameter, gets from v$rowcache order by gets desc;

    PARAMETER                              GETS
    -------------------------------- ----------
    dc_rollback_segments              310995555
    dc_tablespaces                     76251831
    dc_segments                         3912096

Here dc_rollback_segments shows the highest number of gets.
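To put the offline-segment check from query 1 in context, it can help to see how the undo segments are distributed across states. The following is a minimal sketch that uses only the DBA_ROLLBACK_SEGS columns already referenced above; what counts as "many" offline segments depends on the system and is not defined here.

```sql
-- Count undo segments per tablespace and status.
-- A large OFFLINE count alongside heavy dc_rollback_segments gets
-- is consistent with the contention pattern described above.
SELECT tablespace_name,
       status,
       COUNT(*) AS segment_count
  FROM dba_rollback_segs
 GROUP BY tablespace_name, status
 ORDER BY tablespace_name, status;
```

Running it a few times during the problem window shows whether segments are repeatedly moving between ONLINE and OFFLINE.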

* From the AWR report

Check the Top 5 Timed Events section for high 'latch: row cache objects' contention (on dc_rollback_segments) together with high 'enq: US - contention':

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: row cache objects          2,057,004     490,074    238   43.8 Concurrency
enq: US - contention              1,548,328     370,460    239   33.1 Other
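If an AWR report is not at hand, the same two events can be checked from the instance-wide wait statistics. This is a sketch only: the figures in V$SYSTEM_EVENT are cumulative since instance startup, so they show totals rather than a specific interval, and the event names below assume 10g or later (earlier releases report these waits under the generic 'latch free' and 'enqueue' events).

```sql
-- Cumulative waits for the two events discussed above
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000000) AS time_waited_s,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0)) AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('latch: row cache objects', 'enq: US - contention')
 ORDER BY time_waited_micro DESC;
```

Taking two samples a few minutes apart and comparing the values gives a rough picture of the current rate of waiting.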

* Other information to collect:

1) When the issue occurs, collect hang analyze dumps and system state dumps.

$ sqlplus / as sysdba

SQL> oradebug setmypid

SQL> oradebug unlimit

SQL> oradebug hanganalyze 3

SQL> oradebug dump systemstate 266

Wait for 5 seconds, and then continue with:

SQL> oradebug dump systemstate 266

SQL> exit

Wait for 2 minutes, and then again:

$ sqlplus / as sysdba

SQL> oradebug setmypid

SQL> oradebug unlimit

SQL> oradebug hanganalyze 3

SQL> oradebug dump systemstate 266

2) An AWR and/or ASH report covering a 30- or 60-minute interval.

3) The alert log since the last startup.
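To make sure the AWR report brackets the problem window, a manual snapshot can be taken when the issue starts and again when it ends. The sketch below assumes a 10g or later database licensed for AWR; the report itself is then generated between the two snapshot IDs with the standard AWR report script.

```sql
-- Take a manual AWR snapshot (run once at the start and once at the end
-- of the problem window)
BEGIN
  DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
END;
/

-- List recent snapshots to find the IDs that bracket the window
SELECT snap_id, begin_interval_time, end_interval_time
  FROM dba_hist_snapshot
 ORDER BY snap_id DESC;
```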

Known Bugs

1) Bug 7291739 - CONTENTION UNDER AUTO-TUNED UNDO RETENTION
Fixed in 10.2.0.4.4, 10.2.0.5.0, 11.2.0.1.0

When auto-tuned undo retention is in use, high latch contention on 'latch: row cache objects' for dc_rollback_segments can occur together with high enqueue contention on 'enq: US - contention'.

Refer to Contention Under Auto-Tuned Undo Retention (Doc ID 742035.1) for other workarounds.
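To see what auto-tuning is actually doing, the tuned retention can be compared with the configured UNDO_RETENTION over time. This is a sketch assuming 10g or later, where V$UNDOSTAT exposes the TUNED_UNDORETENTION column; each row covers a ten-minute interval.

```sql
-- Undo activity and the retention (in seconds) chosen by auto-tuning
SELECT begin_time,
       end_time,
       undoblks,
       txncount,
       maxquerylen,
       tuned_undoretention,
       unexpiredblks,
       expiredblks
  FROM v$undostat
 ORDER BY begin_time DESC;
```

A tuned retention far above the configured UNDO_RETENTION, combined with a high UNEXPIREDBLKS count, matches the situation described earlier in which most undo space is held by unexpired blocks.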

2) Unpublished Bug 14226599 - Increase dc_rollback_segs hash buckets to reduce 'latch: row cache objects' waits
This is not a bug but an enhancement that increases the number of hash buckets in the "dc_rollback_segments" row cache.
Versions confirmed as affected are 10.2.0.5, 11.1.0.7, 11.2.0.2 and 11.2.0.3.

3) Bug 13252635 : ESSC: HIGH ENQ: US - CONTENTION, LATCH: GES RESOURCE HASH LIST ON STRESS TEST
Closed as a duplicate of unpublished Bug 11690639.

4) Unpublished Bug 6870994: CREATE / ONLINE OF UNDO SEGMENTS SLOWER IN 11.1.0.6 THAN IN 10.2.0.3. If the number of enqueue gets is high while undo segments are being onlined, you may have hit this problem.

5) Bug 5387030 : AUTOMATIC TUNING OF UNDO_RETENTION CAUSING SPACE PROBLEMS

Refer: How to correct performance issues with enq: US - contention related to undo segments (Doc ID 1332738.1)

Buffer Busy Waits on Undo

Buffer Busy Waits:

This wait happens when a session wants to access a database block in the buffer cache but cannot because the buffer is "busy". The two main cases where this can occur are:

a) Another session is reading the block into the buffer

b) Another session holds the buffer in a mode that is incompatible with our request

Buffer busy waits on undo happen when we want to NEW the block but the block is currently being read by another session (most likely for undo).

1) Review the "Segments by Buffer Busy Waits" section of the AWR report and note the segments with the highest waits.

2) Run:

SELECT p1 "File", p2 "Block", p3 "Reason"
  FROM v$session_wait
 WHERE event = 'buffer busy waits';

Note: The above query makes no reference to WAIT_TIME because you are not interested in whether a session is currently waiting, only in which buffers are causing waits. If a particular block or range of blocks keeps showing waits, you can try to isolate the object using the queries in Note:181306.1.
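As a rough illustration of isolating the object, the file and block numbers returned above can be looked up in DBA_EXTENTS. This is a generic sketch, not necessarily the query given in the referenced note; `&file_no` and `&block_no` are SQL*Plus substitution variables introduced here for illustration. Also note that on 10g and later, P3 of this event reports the block class rather than a reason code.

```sql
-- Map a file#/block# pair to the segment that owns it.
-- &file_no and &block_no are placeholders for the P1 and P2 values.
SELECT owner, segment_name, segment_type, tablespace_name
  FROM dba_extents
 WHERE file_id = &file_no
   AND &block_no BETWEEN block_id AND block_id + blocks - 1;
```

Under automatic undo management, undo segments appear with SEGMENT_TYPE 'TYPE2 UNDO'. The lookup can be slow on databases with a very large number of extents.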

3) Refer to How to Identify The Segment Associated with Buffer Busy Waits (Doc ID 413931.1).

If the block type is 'Undo Header', the solution is to add more rollback segments.
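One way to check whether undo headers or undo blocks are the classes being waited on is the block-class breakdown in V$WAITSTAT. This is a minimal sketch; the counters are cumulative since instance startup.

```sql
-- Buffer waits broken down by undo-related block class,
-- ordered by the number of waits (second column)
SELECT class, count, time
  FROM v$waitstat
 WHERE class IN ('undo header', 'undo block',
                 'system undo header', 'system undo block')
 ORDER BY 2 DESC;
```

A high count for 'undo header' points at contention for the segment headers, while 'undo block' points at the undo blocks themselves.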

Known bugs:

Bug 5439554 : BUFFER BUSY WAITS TIMEOUTS ON INSERT INTENSIVE WORKLOAD/POSSIBLE DEADLOCK

"buffer busy wait" timeouts may be seen when running with automatic undo management and in-memory undo.

Solutions:
=======

First, it is very important to check the health of the undo space and look for misconfigurations.

Review the following document and follow the checks and steps it describes:

Troubleshooting ORA-01555 - snapshot too old: rollback segment number "string" with name "string" too small (Doc ID 1580790.1)

Afterwards, apply one or both of the following solutions:

1. Set _ROLLBACK_SEGMENT_COUNT to a high number to keep undo segments online:

ALTER SYSTEM SET "_rollback_segment_count"=<n> scope=both;

Note: In databases with high query activity, particularly parallel query, and a high setting for _ROLLBACK_SEGMENT_COUNT, you can expect to see contention on the row cache for DC_ROLLBACK_SEGS. In environments where _ROLLBACK_SEGMENT_COUNT is set to a high value (tens of thousands and higher), it is highly recommended to apply the patch for Bug:14226599 base Bug:1421197. This increases the number of hash buckets on the DC_ROLLBACK_SEGS row cache to help alleviate latch contention.
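Before changing the setting, it can be useful to record the current value. Hidden parameters do not appear in V$PARAMETER unless they have been set explicitly, so the sketch below reads the undocumented X$KSPPI and X$KSPPCV structures, which requires connecting as SYS and assumes their layout matches your release.

```sql
-- Run as SYS: X$ tables are not visible to ordinary users.
SELECT i.ksppinm  AS parameter_name,
       v.ksppstvl AS current_value,
       v.ksppstdf AS is_default
  FROM x$ksppi  i,
       x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm = '_rollback_segment_count';
```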

2. Set event 10511, which prevents SMON from offlining undo segments and so avoids contention for the US enqueue.

Setting this event does not affect regular shrink or space reclamation. It only stops SMON from offlining undo segments, which avoids excessive onlining of undo segments.

alter system set events '10511 trace name context forever, level 1';
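The statement above applies to the running instance only. The sketch below shows one way to persist the event across restarts and to switch it off again. It assumes the instance uses an spfile; note that setting EVENT in the spfile replaces any event string already stored there, so existing events need to be included in the same value.

```sql
-- Persist across restarts (takes effect at the next startup)
ALTER SYSTEM SET EVENT = '10511 trace name context forever, level 1' SCOPE = SPFILE;

-- Switch the event off in the running instance
ALTER SYSTEM SET EVENTS '10511 trace name context off';
```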

LATCH: UNDO GLOBAL DATA

This latch serializes access to the undo (also known as rollback) segment information in the SGA.

Every time a session wants to know the state of the undo segments, it has to get this latch.
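Activity on this latch can be checked directly in V$LATCH. The following is a minimal sketch; the values are cumulative since instance startup, and no threshold for an acceptable miss ratio is implied.

```sql
-- Gets, misses and sleeps on the 'undo global data' latch
SELECT name,
       gets,
       misses,
       sleeps,
       ROUND(100 * misses / NULLIF(gets, 0), 2) AS miss_pct
  FROM v$latch
 WHERE name = 'undo global data';
```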

Known Bugs/Issues:

"LATCH: UNDO GLOBAL DATA" In The Top Wait Events (Doc ID 1451536.1)

Bug 5751672 - "In memory undo latch" / "undo global data" latch contention from kturimugur (Doc ID 5751672.8)

Bug 7299191 - Contention on "undo global data" from concurrent Flashback queries (Doc ID 7299191.8)

Solutions:

To resolve this problem, apply any one of the following alternatives:

Increase the undo tablespace size to make more free space available in the undo tablespace
OR

Reduce the UNDO_RETENTION value so that less undo is retained
OR

Enable undo auto-tuning by unsetting the _UNDO_AUTOTUNE instance parameter.
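The three alternatives could look roughly like the statements below. This is a sketch only: the values in angle brackets are placeholders to be filled in for your system, the resize assumes the datafile is not already at its maximum, and the RESET assumes an spfile and takes effect at the next restart.

```sql
-- Current undo configuration
SELECT name, value
  FROM v$parameter
 WHERE name IN ('undo_management', 'undo_tablespace', 'undo_retention');

-- Alternative 1: add space to the undo tablespace
ALTER DATABASE DATAFILE '<undo_datafile_path>' RESIZE <new_size>;

-- Alternative 2: lower the retention target (in seconds)
ALTER SYSTEM SET undo_retention = <seconds> SCOPE = BOTH;

-- Alternative 3: remove an explicit _UNDO_AUTOTUNE setting from the spfile
ALTER SYSTEM RESET "_undo_autotune" SCOPE = SPFILE SID = '*';
```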

wait for a undo record

You can disable parallel rollback by setting the following parameter:

fast_start_parallel_rollback = false

BEWARE: Setting this parameter dynamically can cause problems on a busy instance with a lot of active transaction work. It is safer to set it with an instance restart so as not to change the rollback strategy for active transactions.
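In line with the warning above, the change can be recorded in the spfile so that it only takes effect at the next restart, and any rollback already in progress can be watched in the meantime. This is a sketch assuming the instance uses an spfile.

```sql
-- Progress of transactions currently being recovered
SELECT usn, state, undoblocksdone, undoblockstotal
  FROM v$fast_start_transactions;

-- Record the change in the spfile only; it applies after the next restart
ALTER SYSTEM SET fast_start_parallel_rollback = FALSE SCOPE = SPFILE;
```

Comparing UNDOBLOCKSDONE with UNDOBLOCKSTOTAL over a few samples gives an idea of how long the current rollback still has to run before a restart is scheduled.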

REFERENCES

NOTE:5751672.8 - Bug 5751672 - "In memory undo latch" / "undo global data" latch contention from kturimugur
NOTE:7299191.8 - Bug 7299191 - Contention on "undo global data" from concurrent Flashback queries
NOTE:1332738.1 - How to correct performance issues with enq: US - contention related to undo segments
NOTE:1451536.1 - "LATCH: UNDO GLOBAL DATA" In The Top Wait Events