oracle实例诡异down的真实原因

时间:2021-01-21 21:09:45

 

 

 

oracle实例诡异down的真实原因

 

 

1、监控同事说oracle测试库又自动down了。

我登录检查一看,没有oracle的后台进程在跑,oracle实例确实被关闭,马上去启动吧。

# 启动失败

SQL> startup;

ORACLE instance started.

 

Total System Global Area 1603411968 bytes

Fixed Size              2213776 bytes

Variable Size                989857904 bytes

Database Buffers      603979776 bytes

Redo Buffers                 7360512 bytes

Database mounted.

ORA-03113: end-of-file on communication channel                                                      

Process ID: 13735

Session ID: 191 Serial number: 3

 

 

SQL>

 

 

 

 

2、分析down的原因

去看后台alert日志:

Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc:

ORA-19815: WARNING: db_recovery_file_dest_size of 5218762752 bytes is 100.00% used, and has 0 remaining bytes available.

************************************************************************

You have following choices to free up space from recovery area:

1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,

   then consider changing RMAN ARCHIVELOG DELETION POLICY.

2. Back up files to tertiary device such as tape using RMAN

   BACKUP RECOVERY AREA command.

3. Add disk space and increase db_recovery_file_dest_size parameter to

   reflect the new space.

4. Delete unnecessary files using RMAN DELETE command. If an operating

   system command was used to delete files, then use RMAN CROSSCHECK and

   DELETE EXPIRED commands.

************************************************************************

Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc:

ORA-19809: limit exceeded for recovery files

ORA-19804: cannot reclaim 50173952 bytes disk space from 5218762752 limit

ARCH: Error 19809 Creating archive log file to '/oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_09_09/o1_mf_1_1589_%u_.arc'

Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc:

ORA-16038: log 2 sequence# 1589 cannot be archived

ORA-19809: limit exceeded for recovery files

ORA-00312: online log 2 thread 1: '/data/oracle/powerdes/redo02.log'

USER (ospid: 8179): terminating the instance due to error 16038

Instance terminated by USER, pid = 8179

 

原因分析找到了,是ORA-19815: WARNING:db_recovery_file_dest_size of 5218762752 bytes is 100.00% used归档日志闪回区满了,所以oracle自动down了。而且也提示了4种解决方案:

(1)      设置归档日志过期策略

(2)      执行BACKUP RECOVERY AREA命令

(3)      增加db_recovery_file_dest_size大小

(4)      RMAN删除过期的归档日志

 

这里准备采用速度最快的第(4)种解决方案:RMAN删除归档日志

 

 

3、RMAN删除归档日志解决

(1)先以mount方式启动数据库,保证能rman登录。

# 先以mount启动

SQL> startup mount;

ORACLE instance started.

 

Total System Global Area 1603411968 bytes

Fixed Size              2213776 bytes

Variable Size                989857904 bytes

Database Buffers      603979776 bytes

Redo Buffers                 7360512 bytes

Database mounted.

SQL>

SQL>

 

 

 

(2)使用rman登录进去,准备清理过期的归档日志,delete archivelog all completed before 'sysdate-1';删除一天前的归档日志:

 

# 然后rman登录清理归档日志

[oracle@hch_test_121_90 ~]$

[oracle@hch_test_121_90 ~]$ rlwrap rman target /

 

Recovery Manager: Release 11.2.0.1.0 - Production on Wed Sep 14 13:39:15 2016

 

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

 

connected to target database: POWERDES (DBID=3458668465, not open)

 

RMAN> delete archivelog all completed before 'sysdate-1';

 

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=129 device type=DISK

List of Archived Log Copies for database with db_unique_name POWERDES

=====================================================================

 

Key     Thrd Seq     S Low Time

------- ---- ------- - ---------

43      1    1469    A 21-MAR-16

        Name: /oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_03_21/o1_mf_1_1469_ch00ykdh_.arc

 

44      1    1470    A 21-MAR-16

        Name: /oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_03_22/o1_mf_1_1470_ch07y0vk_.arc

……

 

 

(3)最后打开数据库

# 最后打开数据库

SQL>

SQL>

SQL> alter database open;

 

Database altered.

 

SQL>

 

 

 

 

4、长久解决之道

别的处理办法添加db_recovery_file_dest_size值,修改完后,添加到启动参数里面,永久生效:

SQL> show parameter db_recovery_file_dest_size;

 

NAME                                        TYPE       VALUE

------------------------------------ ----------- ------------------------------

db_recovery_file_dest_size         big integer 4977M

SQL>

 

# 扩大参数值

SQL> ALTER SYSTEM SET db_recovery_file_dest_size=8g scope=both;

 

System altered.

 

SQL> show parameter db_recovery_file_dest_size;

 

NAME                                        TYPE       VALUE

------------------------------------ ----------- ------------------------------

db_recovery_file_dest_size         big integer 8G

SQL>

 

# 报错修改到启动参数

SQL> create pfile from spfile;

 

File created.

 

SQL>

 

# 查看使用情况

SQL> select name,space_limit,space_used,number_of_files from v$recovery_file_dest;

 

NAME

--------------------------------------------------------------------------------

SPACE_LIMIT SPACE_USED NUMBER_OF_FILES

----------- ---------- ---------------

/oracle/app/oracle/flash_recovery_area

 6442450944 2905126912                   61

 

 

SQL>

 

 

 

 

5、永久解决之道

第4步骤,增加db_recovery_file_dest_size值后,会延长这个闪回区的使用时间,但是总归有一天是会满的,为了永久性的解决问题,准备做个定时任务每天去清理一遍过期归档日志,这样就可以基本保证闪回区是有足够的存储空间的。

 

清理过期归档日志脚本archivelog_clear.sh:

#!/bin/sh

BACK_DIR=/oracle/backup/data

export DATE=`date +%F`

su - oracle -c "

mkdir -p $BACK_DIR/$DATE

rman log=$BACK_DIR/$DATE/rman_backup.log target / <<EOF

crosscheck backup;

crosscheck archivelog all;

delete noprompt expired backup;

delete noprompt expired archivelog all;

delete noprompt obsolete;

exit

EOF

"

 

 

建立crontab 任务每天运行清理一次过期归档日志

10 01 * * * /oracle/backup/scripts/archivelog_clear.sh  >> /oracle/backup/data/_fullback.log 2>&1