前几天群里的一个朋友说他的数据库shutdown immediate后不能关闭。 hang住了。这种情况以前也遇到过一次,是测试库。 就直接shutdown abort了。当然生产库不建议这么做。
在Oracle metalink上搜了一些有关执行shutdown immedate命令hang住或者slow的文章。整理如下。
一. BUG 导致shutdown immediate Hang住
1.1 Bug 6512622 SHUTDOWN IMMEDIATE hangs / OERI[3708]
This note gives a brief overview bug 6512622.
The content was last updated on: 18-NOV-2010
Click here for details of each of the sections below.
Affects:
Product (Component) |
Oracle Server (Rdbms) |
Range of versions believed to be affected |
Versions BELOW 11.1 |
Versions confirmed as being affected |
|
Platforms affected |
Generic (all / most platforms affected) |
Fixed:
This issue is fixed in |
1.2 Bug 5057695: Shutdown Immediate Very Slow To Close Database [ID 428688.1]
Apply one off Patch 5057695 available on My Oracle Support on top of 9.2.0.8, 10.2.0.2 or 10.2.0.3.
The fix is included as of the 10.2.0.4 patchset.
在这里就列举2个,貌似看到的bug不止这2个。没有细数了。
二. Shutdown immediate hang住的一般原因
Metalink 的说明如下:
Shutdown Immediate Hangs: Common Causes [ID 106474.1]
http://www.cndba.cn/Dave/article/549
在这篇文章里提到了2个因素:
(1)Rollbacks
(2)Temp segment cleanup
Rollbacks: If a large dml statement is in progress it must roll back before the shutdown completes. In pre 8.1 versions of Oracle this can be made to go faster with a higher cleanup_rollback_entries, but be careful because too high a setting can cause resource contention during normal database operation.
Temp segment cleanup: If your tablespace used for temp sort segments is of type temporary (as shown by the CONTENTS column of DBA_TABLESPACES) the extents, once allocated, are never deallocated until shutdown. Instead they are tracked in the SGA. This helps reduce dictionary cache contention during normal database operation. Unfortunately, when we shutdown these extents must all be deallocated at once which can lead to the dictionary cache contention we avoided while we were running. The tablespace of type temporary was introduced in 7.3.
These two problems can also combine to make things even slower as a rollback will also deallocate extents, leading to further contention on the parts of the dictionary cache that track used and free extents.
三.跟踪导致hang住的原因
Metalink 的文章:
How to Check Why Shutdown Immediate Hangs? [ID 164504.1]
http://www.cndba.cn/Dave/article/547
这篇文章提到了2种不同方法:
A. While shutdown immediate is hanging
Start Server Manager (or SQL*Plus for 8i or higher)
SVRMGRL> connect internal (or SYSDBA for 8i or higher)
SVRMGRL> select * from x$ktuxe where ktuxecfl = 'DEAD';
This shows dead transactions that smon is looking to rollback.
B. Plan to shutdown again and gather some information. Before issuing the
shutdown immediate command set some events as follows:
SVRMGRL> connect internal
SVRMGRL> alter session set events '10046 trace name context forever,level 12';
SVRMGRL> alter session set events '10400 trace name context forever, level 1';
SVRMGRL> shutdown immediate;
10046 turns on extended SQL_TRACE for the shutdown process.
10400 dumps a systemstate every 5 minutes.
在Oracle 10g 和 11g的官方文档都没有搜到SVRMGRL命令的相关说明。这个应该是8i版本里的命令。
以前整理过一篇在DB hang住情况下使用sqlplus的文章:
Oracle sqlplus prelim参数介绍
http://blog.csdn.net/tianlesoftware/archive/2009/12/10/4980545.aspx
在系统已经hang的时,可以在sqlplus命令中使用参数: -prelim来连接。这个时候连接到的是SGA而不是数据库。因为也没有session被创建。
连接上之后我们就可以启动trace或者使用sql来查看DB的相关信息。
四.关闭DB时hang住该做些什么
官网上的一篇文章:
What To Do and Not To Do When 'shutdown immediate' Hangs [ID 375935.1]
http://www.cndba.cn/Dave/article/548
The big problem in these situations is that it is noticed only after the shutdown immediate has been issued.
This kind of situation is mostly caused by 2 things:
1. a large query running at the shutdown moment.
2. a large transaction running at the shutdown moment.
Both have to complete in order for the database to be brought down when shutdown immediate is issued.
Actually, the files cannot be closed consistently because of one of the 2 possibilities above and, as such, the transition from OPEN to MOUNT is postponed until the files are closed, which means that either the large query completes or the large transaction is rolled back.This is not a hang, it is the expected behavior.
--这段讲的很清楚。 DB dismount时需要做的2个操作。
So, before issuing the shutdown immediate, it would be recommended to check the following views, especially when the database needs to be brought down for a very short period of time:
1. for large queries:
select count(*) from v$session_longops where time_remaining>0;
2. for large transactions:
select sum(used_ublk) from v$transaction;
A result greater than 0 for the first query and a large value returned for the second one would mean a relatively long time to wait until the shutdown immediate completes.
1. For the large queries situation, when the shutdown immediate is hanging, you can justbring down the database using: shutdown abort, as the database could be easily brought to a consistent state by:
startup restrict followed by shutdown immediate.
One should take the backup and/or do whatever else need to be done after the shutdown immediate.
--对于大量查询的,我们可以使用shutdown abort来结束。
2. For the second situation, the workaround cannot be applied, especially when it's needed to take a cold backup. The database must be closed in a consistent state in order to do this and the consistent state cannot be achieved until all the transactions have completed one way or another (commit/rollback).
As such, it's up to the local personnel to decide what to do, depending on the local needs.
It is very important to realize that:by shutting down a database you do not solve a performance problem caused by a large transaction. You are only making things worse.
There are situations when the database is brought down even when a large transaction/large recovery is taking place. Then it's brought up again and a new shutodwn is tried. Again, the shutdown immediate is hanging, for a very simple reason - the large recovery is still going on.
At this moment, the v$transaction view is not displaying anything. However, it is still possible to check the recovery operation by checking the:
select * from v$fast_start_transactions;
and/or
select * from v$fast_start_servers;
views. They are the ones that display the recovery status.
As such, when a large transaction is taking place, do not try successive shutdown aborts, startups and shutdown immediate. The hang will reoccur. The database must be consistent when the database is dismounted - performing successive shutdowns/startups is not helping at all, it's only making the recovery even more lengthy.
--对于大量事务的情况下,不要强制关闭数据库。
You should prevent these situations by notifying the users a shutdown will be done and no large operations should be started.
If a large operation has already started at the moment when you want to shutdown immedate, assess what would be faster - rollback the current situation or allow it to complete.
另一篇文章:
Shutdown Normal or Shutdown Immediate Hangs. SMON disabling TX Recovery [ID 1076161.6]
http://www.cndba.cn/Dave/article/546
在这篇文章里提到了如下内容:
During a SHUTDOWN IMMEDIATE and SHUTDOWN NORMAL, SMON is cleaning up extents which are no longer needed and marking them as freed.
Either wait for SMON to clean up the free extents in the database as it shuts down or perform a SHUTDOWN ABORT to shutdown the instance. A SHUTDOWN ABORT will not perform a clean shutdown.
五.建议关闭DB的顺序
1.停应用
2.停监听
3. shutdown immediate
如果这样还是无法关闭的话,可以尝试手工提交(commit)和checkpoint(alter system checkpoint),如果还没有办法,就kill session 和 监听的进程(LOCAL=NO)。
Oracle服务器进程中的 LOCAL=NO和 LOCAL=YES
http://www.cndba.cn/Dave/article/1160
在这篇文章里做了一点说明, 就是在服务器上连接数据库,它是不走监听的。 就是LOCAL=YES的进程。对于非本地的连接,都是走监听。 在服务器上显示的是LOCAL=NO。
注意:对于已经建立的非本地监听连接,即LOCAL=NO。关闭监听对它们是没有影响的。 相关进程仍然存在,通过它们与数据库连接的session仍然可以继续执行事务操作。
------------------------------------------------------------------------------
QQ:492913789
Email:ahdba@qq.com
Blog: http://www.cndba.cn/dave
网上资源: http://tianlesoftware.download.csdn.net
相关视频:http://blog.csdn.net/tianlesoftware/archive/2009/11/27/4886500.aspx
DBA1群:62697716(满); DBA2群:62697977(满)
DBA3群:62697850 DBA 超级群:63306533;
聊天群:40132017
--加群需要在备注说明Oracle表空间和数据文件的关系,否则拒绝申请