db file sequential read等待事件总结

时间:2023-01-28 07:52:40
db file sequential read

The db file sequential read wait event has three parameters: file#, first block#, and block count. In Oracle Database 10g, this wait event falls under the User I/O wait class. Keep the following key thoughts in mind when dealing with the db file sequential read wait event.

该等待事件的参数:file#,first block#,and block count(一般是1)可以从dba_extents去确定访问的段,属于I/O类的等待。

  • The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.

  • The two important numbers to look for are the TIME_WAITED and AVERAGE_WAIT by individual sessions.

  • Significant db file sequential read wait time is most likely an application issue.

Common Causes, Diagnosis, and Actions

The db file sequential read wait event is initiated by SQL statements (both user and recursive) that perform single-block read operations against indexes, rollback (or undo) segments, and tables (when accessed via rowid), control files and data file headers. This wait event normally appears as one of the top five wait events, according to systemwide waits.

Physical I/O requests for these objects are perfectly normal, so the presence of the db file sequential read waits in the database does not necessarily mean that there is something wrong with the database or the application. It may not even be a bad thing if a session spends a lot of time on this event. In contrast, it is definitely bad if a session spends a lot of time on events like enqueue or latch free. This is where this single-block read subject becomes complicated. At what point does the db file sequential read event become an issue? How do you define excessive? Where do you draw the line? These are tough questions, and there is no industry standard guideline. You should establish a guideline for your environment. For example, you may consider it excessive when the db file sequential read wait represents a large portion of a process response time. Another way is to simply adopt the nonscientific hillbilly approach—that is, wait till the users start screaming.

You can easily discover which session has high TIME_WAITED on the db file sequential read wait event from the V$SESSION_EVENT view. The TIME_WAITED must be evaluated with the LOGON_TIME and compared with other nonidle events that belong to the session for a more accurate analysis. Sessions that have logged on for some time (days or weeks) may accumulate a good amount of time on the db file sequential read event. In this case, a high TIME_WAITED may not be an issue. Also, when the TIME_WAITED is put in perspective with other nonidle events, it prevents you from being blindsided. You may find another wait event which is of a greater significance. Based on the following example, SID# 192 deserves your attention and should be investigated: 


当进程需要的信息不在SGA,要等从磁盘读入SGA中,此时进程等待此事件。

一般是由sql或者递归sql中发出,从索引,回滚段,表(rowid回表),控制文件,数据文件头处读取信息。

select a.sid,
       a.event,
       a.time_waited,
       a.time_waited / c.sum_time_waited * 100 pct_wait_time,
       round((sysdate - b.logon_time) * 24) hours_connected
from   v$session_event a, v$session b,
       (select sid, sum(time_waited) sum_time_waited
        from   v$session_event
        where  event not in (
                    'Null event',
                    'client message',
                    'KXFX: Execution Message Dequeue - Slave',
                    'PX Deq: Execution Msg',
                    'KXFQ: kxfqdeq - normal deqeue',
                    'PX Deq: Table Q Normal',
                    'Wait for credit - send blocked',
                    'PX Deq Credit: send blkd',
                    'Wait for credit - need buffer to send',
                    'PX Deq Credit: need buffer',
                    'Wait for credit - free buffer',
                    'PX Deq Credit: free buffer',
                    'parallel query dequeue wait',
                    'PX Deque wait',
                    'Parallel Query Idle Wait - Slaves',
                    'PX Idle Wait',
                    'slave wait',
                    'dispatcher timer',
                    'virtual circuit status',
                    'pipe get',
                    'rdbms ipc message',
                    'rdbms ipc reply',
                    'pmon timer',
                    'smon timer',
                    'PL/SQL lock timer',
                    'SQL*Net message from client',
                    'WMON goes to sleep')
        having sum(time_waited) > 0 group by sid) c
where  a.sid         = b.sid
and    a.sid         = c.sid
and    a.time_waited > 0
and    a.event       = 'db file sequential read'
order by hours_connected desc, pct_wait_time;


 SID EVENT                   TIME_WAITED PCT_WAIT_TIME HOURS_CONNECTED
---- ----------------------- ----------- ------------- ---------------
 186 db file sequential read       64446    77.0267848             105
 284 db file sequential read     1458405     90.992838             105
 194 db file sequential read     1458708    91.0204316             105
 322 db file sequential read     1462557    91.1577045             105
 139 db file sequential read      211325    52.6281055              11
 256 db file sequential read      247236    58.0469755              11
 192 db file sequential read 243113 88.0193625 2

There are two things you can do to minimize the db file sequential read waits:

  • Optimize the SQL statement that initiated most of the waits by reducing the number of physical and logical reads.

  • Reduce the average wait time.


Unless you trace a session with the event 10046 or have a continuously running wait event data collector as discussed in Chapter 4, it is difficult to determine the SQL statement that is responsible for the cumulated wait time. Take the preceding SID #192 again, for example. The 243113 centiseconds wait time may be caused by one long-running or many fast SQL statements. The latter case may not be an issue. Furthermore, the SQL statement that is currently running may or may not be the one that is responsible for the waits. This is why interactive diagnosis without historical data is often unproductive. You can query the V$SQL view for statements with high average DISK_READS, but then how can you tell they belong to the session? Due to these limitations, you may have to identify and trace the session the next time around to nail down the offending SQL statement. Once you have found it, the optimization goal is to reduce the amount of physical and logical reads.

  Note

In addition to the DISK_READS column, the V$SQL and V$SQLAREA views in Oracle Database 10g have exciting new columns: USER_IO_WAIT_TIME, DIRECT_WRITES, APPLICATION_WAIT_TIME, CONCURRENCY_WAIT_TIME, CLUSTER_WAIT_TIME, PLSQL_EXEC_TIME, and JAVA_EXEC_TIME. You can discover the SQL statement with the highest cumulative or average USER_IO_WAIT_TIME.

Another thing you can do to minimize the impact of the db file sequential read event is reduce the AVERAGE_WAIT time. This is the average time a session has to wait for a single block fetch from disk; the information is available in the V$SESSION_EVENT view. In newer storage subsystems, an average single-block read shouldn’t take more than 10ms (milliseconds) or 1cs (centisecond). You should expect an average wait time of 4 to 8ms (0.4 to 0.8cs) with SAN (storage area network) due to large caches. The higher the average wait time, the costlier it is to perform a single-block read,  and the overall process response time will suffer. On the other hand, a lower average wait time is more forgiving and has a lesser impact on the response times of processes that perform a lot of single-block reads. (We are not encouraging you to improve the average wait time to avoid SQL optimization. If the application has SQL statements that perform excessive amounts of single-block reads, they must first be inspected and optimized.) The db file sequential read “System-Level Diagnosis” section has some ideas on how to improve the AVERAGE_WAIT time.

As you monitor a session and come across the db file sequential read event, you should translate its P1 and P2 parameters into the object that they represent. You will find that the object is normally an index or a table. The DBA_EXTENTS view is commonly used for object name resolution. However, as mentioned in Chapter 4, the DBA_EXTENTS is a complex view and is not query-friendly in regards to performance. Object name resolution is much faster using the X$BH and DBA_OBJECTS views. The caveat is that you must wait for the block to be read into the buffer cache; otherwise the X$BH view has no information on the buffer that is referenced by the P1 and P2 parameters. Also, the DBA_OBJECTS view does not contain rollback or undo segment objects that the P1 and P2 parameters may be referencing.

要减少这个等待事件,要么减少它的次数,要么减少平均等待时间。通过调优SQL来减少逻辑读,留意效率低的大范围索引扫描回表(可能全表扫更好),可以减低次数。用更高响应时间的存储,分散热点文件,可以减轻平均等待时间。在新的存储子系统,平均单块读等待时间不应超过10ms(千分之一秒),如果用有大cache的SAN一般4-8ms为佳。通过p1与p2参数与dba_extents视图,我们定位到等待访问的段,然后来分散热点。


--这段SQL的找出频繁发生db file sequential read的对象。

select b.sid,
       nvl(substr(a.object_name,1,30),
                  'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj = a.object_id(+)
and    b.p1 = c.file#(+)
and    b.p2 = c.dbablk(+)
and    b.event = 'db file sequential read'
union
select b.sid,
       nvl(substr(a.object_name,1,30),
                  'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj = a.data_object_id(+)
and    b.p1 = c.file#(+)
and    b.p2 = c.dbablk(+)
and    b.event = 'db file sequential read'
order  by 1;

  SID OBJECT_NAME               SUBOBJECT_NAME            OBJECT_TYPE
----- ------------------------- ------------------------- -----------------
   12 DVC_TRX_REPOS             DVC_TRX_REPOS_PR64        TABLE PARTITION
  128 DVC_TRX_REPOS             DVC_TRX_REPOS_PR61        TABLE PARTITION
  154 ERROR_QUEUE               ERROR_QUEUE_PR1           TABLE PARTITION
  192 DVC_TRX_REPOS_1IX         DVC_TRX_REPOS_20040416    INDEX PARTITION
  194 P1=22 P2=30801 P3=1
  322 P1=274 P2=142805 P3=1
  336 HOLD_Q1_LIST_PK                                     INDEX

Sequential Reads Against Indexes

The main issue is not index access; it is waits that are caused by excessive and unwarranted index reads. If the db file sequential read event represents a significant portion of a session’s response time, all that tells you is that the application is doing a lot of index reads. This is an application issue. Inspect the execution plans of the SQL statements that access data through indexes. Is it appropriate for the SQL statements to access data through index lookups? Is the application an online transaction processing (OLTP) or decision support system (DSS)? Would full table scans be more efficient? Do the statements use the right driving table? And so on. The optimization goal is to minimize both the number of logical and physical I/Os.

If you have access to the application code, you should examine the application logic. Look at the overall logic and understand what it is trying to do. You may be able to recommend a better approach.

Index reads performance can be affected by slow I/O subsystem and/or poor database files layout, which result in a higher average wait time. However, I/O tuning should not be prioritized over the application and SQL tuning, which many DBAs often do. I/O tuning does not solve the problem if SQL statements are not optimized and the demand for physical I/Os remains high. You should also push back when the application team tries to circumvent code changes by asking for more powerful hardware. Getting the application team to change the code can be like pulling teeth. If the application is a rigid third-party solution, you may explore the stored outline feature, introduce new indexes, or modify the current key compositions whenever appropriate.

In addition to SQL tuning, it may also be worthwhile to check the index’s clustering factor if the execution plan calls for table access by index rowid. The clustering factor of an index defines how ordered the rows are in the table. It affects the number of I/Os required for the whole operation. If the DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered. This is desirable. However, if the clustering factor approaches the number of rows in the table, it means the rows in the table are randomly ordered. In this case, it is unlikely for the index entries in the same leaf block to point to rows in the same data block, and thus it requires more I/Os to complete the operation. You can improve the index’s clustering factor by rebuilding the table so that rows are ordered according to the index key and rebuilding the index thereafter. What happens if the table has more than one index? Well, that is the downside. You can only cater to the most used index.

Also check to see if the application has recently introduced a new index using the following query. The introduction of a new index in the database may cause the optimizer to choose a different execution plan for SQL statements that access the table. The new plan may yield a better, neutral, or worse performance than the old one.

cluster_factor——表明有多少邻近的索引条目指到不同的数据块。如果表里的数据与索引排序是相似的,聚簇因子就小,最小值是表里非空的数据块总数。如果表里的数据和索引排序迥异,聚簇因子就大,最大值是索引中的键数。值越小,越表明表行是有序存储的

select owner, 
       substr(object_name,1,30) object_name, 
       object_type, 
       created
from   dba_objects
where  object_type in ('INDEX','INDEX PARTITION')
order by created;

The OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING initialization parameters can influence the optimizer to favor the nested loops operation and choose an index access path over a full table scan. The default value for the OPTIMIZER_INDEX_COST_ADJ parameter is 100. A lower value tricks the optimizer into thinking that index access paths are cheaper. The default value for the OPTIMIZER_INDEX_CACHING parameter is 0. A higher value informs the optimizer that a higher percentage of index blocks is already in the buffer cache and that nested loops operations are cheaper. Some third-party applications use this method to promote index usage. Inappropriate use of these parameters can cause significant I/O wait time. Find out what values the sessions are running with. Up to Oracle9i Database, this information could only be obtained by tracing the sessions with the trace event 10053 at level 1 and examining the trace files. In Oracle Database 10g, this is as simple as querying the V$SES_OPTIMIZER_ENV view.

Make sure all object statistics are representative of the current data, as inaccurate statistics can certainly cause the optimizer to generate poor execution plans that call for index reads when they shouldn’t. Remember, statistics need to be representative and not necessarily up-to-date, and execution plan may change each time statistics are gathered.

  Note

When analyzing tables or indexes with a low ESTIMATE value, Oracle normally uses single block reads, and this will add to the db file sequential read statistics for the session (V$SESSION_EVENT) and instance (V$SYSTEM_EVENT).

Sequential Reads Against Tables

You may see db file sequential read wait events in which the P1 and P2 parameters resolve to a table instead of an index. This is normal for SQL statements that access tables by rowids obtained from the indexes, as shown in the following explain plan. Oracle uses single-block I/O when reading a table by rowids.

LVL OPERATION                         OBJECT                
--- --------------------------------- ---------------------
  1 SELECT STATEMENT 
  2   TABLE ACCESS BY INDEX ROWID     RESOURCE_ASGN_SNP   
  3     INDEX RANGE SCAN              RESOURCE_ASGN_SNP_4IX

System-Level Diagnosis

The V$SYSTEM_EVENT view provides the data for system-level diagnosis. For I/O related events, the two columns of interest are the AVERAGE_WAIT and TIME_WAITED.

Remember to evaluate the TIME_WAITED with the instance startup in mind. It is normal for an older instance to show a higher db file sequential read wait time. Also, always query the V$SYSTEM_EVENT view in the order of TIME_WAITED such as in the following example. This allows you to compare the db file sequential read waits with other significant events in the system. If the db file sequential read wait time is not in the top five category, don’t worry about it because you have bigger fish to fry. Even if the db file sequential read wait time is in the top five category, all it tells you is that the database has seen a lot of single-block I/O calls. The high wait time may be comprised of waits from many short-running OLTP sessions or a few long-running batch processes, or both. At the system level, there is no information as to who made the I/O calls, when the calls were made, what objects were accessed, and the SQL statements that initiated the calls. In other words, system-level statistics offer very limited diagnosis capability.

select a.event, 
       a.total_waits, 
       a.time_waited, 
       a.time_waited/a.total_waits average_wait,
       sysdate – b.startup_time days_old
from   v$system_event a, v$instance b
order by a.time_waited;

The AVERAGE_WAIT column is more useful. We showed what you should consider as normal in the preceding paragraphs. If your average single-block read wait time exceeds this allowance, you may have a problem in the I/O subsystem or hot spots on disk. If your database is built on file systems, make sure the database mount points contain only Oracle files. Do not share your database mount points with the application or another database. Also, if possible, avoid sharing I/O devices. Several mount points can be mapped to the same I/O device. According to the following Veritas vxprint output, mount points u02, u03, u04, and u05 are all mapped to device c2t2d0. You should find out how your database files are mapped to I/O controllers and I/O devices or physical disks. For databases on the Veritas file system, the vxprint –ht command shows the mount point mappings.

v  oracle_u02   -            ENABLED  ACTIVE   20480000 fsgen  -      SELECT
pl oracle_u02-01 oracle_u02  ENABLED  ACTIVE   20482560 CONCAT -      RW
sd oracle01-01  oracle_u02-01 oracle01 0       20482560 0      c2t2d0 ENA


v  oracle_u03   -            ENABLED  ACTIVE   20480000 fsgen  -      SELECT
pl oracle_u03-01 oracle_u03  ENABLED  ACTIVE   20482560 CONCAT -      RW
sd oracle01-02  oracle_u03-01 oracle01 20482560 20482560 0     c2t2d0 ENA


v  oracle_u04   -            ENABLED  ACTIVE   20480000 fsgen  -      SELECT
pl oracle_u04-01 oracle_u04  ENABLED  ACTIVE   20482560 CONCAT -      RW
sd oracle01-03  oracle_u04-01 oracle01 40965120 20482560 0     c2t2d0 ENA


v  oracle_u05   -            ENABLED  ACTIVE   30720000 fsgen  -      SELECT
pl oracle_u05-01 oracle_u05  ENABLED  ACTIVE   30723840 CONCAT -      RW
sd oracle01-04  oracle_u05-01 oracle01 266273280 30723840 0    c2t2d0 ENA

Make sure the database files are properly laid out to avoid hot spots. Monitor I/O activities using operating system commands such as iostat and sar. Pay attention to disk queue length, disk service time, and I/O throughput. If a device is particularly busy, then consider relocating some of the data files that are on the device. On the Solaris operating system, you can get I/O statistics on controllers and devices with the iostat –dxnC command. However, hot spots tuning is easier said than done. You need to know how the application uses I/O. Furthermore, if the application is immature and new functionalities are constantly being added, the hot spots may be moving targets. DBAs are normally not apprised of new developments and often have to discover them reactively. This is why I/O balancing can be a never ending task. If you can upgrade to Oracle Database 10g, ASM (Automatic Storage Management) can help with I/O balancing.

By the way, in addition to the systemwide db file sequential read average wait time from the V$SYSTEM_EVENT view, Oracle also provides single-block read statistics for every database file in the V$FILESTAT view. The file-level single-block average wait time can be calculated by dividing the SINGLEBLKRDTIM with the SINGLEBLKRDS, as shown next. (The SINGLEBLKRDTIM is in centiseconds.) You can quickly discover which files have unacceptable average wait times and begin to investigate the mount points or devices and ensure that they are exclusive to the database.

select a.file#, 
       b.file_name, 
       a.singleblkrds, 
       a.singleblkrdtim, 
       a.singleblkrdtim/a.singleblkrds average_wait
from   v$filestat a, dba_data_files b 
where  a.file# = b.file_id   
and    a.singleblkrds > 0
order by average_wait;


FILE# FILE_NAME                     SINGLEBLKRDS SINGLEBLKRDTIM AVERAGE_WAIT
----- ----------------------------- ------------ -------------- ------------ 
  367 /dev/vgEMCp113/rPOM1P_4G_039          5578            427   .076550735
  368 /dev/vgEMCp113/rPOM1P_4G_040          5025            416    .08278607
  369 /dev/vgEMCp113/rPOM1P_4G_041         13793           1313   .095193214
  370 /dev/vgEMCp113/rPOM1P_4G_042          6232            625   .100288832
  371 /dev/vgEMCp113/rPOM1P_4G_043          4663            482   .103366931
  372 /dev/vgEMCp108/rPOM1P_8G_011        164828         102798   .623668309
  373 /dev/vgEMCp108/rPOM1P_8G_012        193071         125573    .65039804
  374 /dev/vgEMCp108/rPOM1P_8G_013        184799         126720   .685717996
  375 /dev/vgEMCp108/rPOM1P_8G_014        175565         125969   .717506337