oracle实例诡异down的真实原因
1、监控同事说oracle测试库又自动down了。
我登录检查一看,没有oracle的后台进程在跑,oracle实例确实被关闭,马上去启动吧。
# 启动失败 SQL> startup; ORACLE instance started.
Total System Global Area 1603411968 bytes Fixed Size 2213776 bytes Variable Size 989857904 bytes Database Buffers 603979776 bytes Redo Buffers 7360512 bytes Database mounted. ORA-03113: end-of-file on communication channel Process ID: 13735 Session ID: 191 Serial number: 3
SQL> |
2、分析down的原因
去看后台alert日志:
Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc: ORA-19815: WARNING: db_recovery_file_dest_size of 5218762752 bytes is 100.00% used, and has 0 remaining bytes available. ************************************************************************ You have following choices to free up space from recovery area: 1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard, then consider changing RMAN ARCHIVELOG DELETION POLICY. 2. Back up files to tertiary device such as tape using RMAN BACKUP RECOVERY AREA command. 3. Add disk space and increase db_recovery_file_dest_size parameter to reflect the new space. 4. Delete unnecessary files using RMAN DELETE command. If an operating system command was used to delete files, then use RMAN CROSSCHECK and DELETE EXPIRED commands. ************************************************************************ Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc: ORA-19809: limit exceeded for recovery files ORA-19804: cannot reclaim 50173952 bytes disk space from 5218762752 limit ARCH: Error 19809 Creating archive log file to '/oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_09_09/o1_mf_1_1589_%u_.arc' Errors in file /oracle/app/oracle/diag/rdbms/powerdes/powerdes/trace/powerdes_ora_8179.trc: ORA-16038: log 2 sequence# 1589 cannot be archived ORA-19809: limit exceeded for recovery files ORA-00312: online log 2 thread 1: '/data/oracle/powerdes/redo02.log' USER (ospid: 8179): terminating the instance due to error 16038 Instance terminated by USER, pid = 8179 |
原因分析找到了,是ORA-19815: WARNING:db_recovery_file_dest_size of 5218762752 bytes is 100.00% used归档日志闪回区满了,所以oracle自动down了。而且也提示了4种解决方案:
(1) 设置归档日志过期策略
(2) 执行BACKUP RECOVERY AREA命令
(3) 增加db_recovery_file_dest_size大小
(4) RMAN删除过期的归档日志
这里准备采用速度最快的第(4)种解决方案:RMAN删除归档日志
3、RMAN删除归档日志解决
(1)先以mount方式启动数据库,保证能rman登录。
# 先以mount启动 SQL> startup mount; ORACLE instance started.
Total System Global Area 1603411968 bytes Fixed Size 2213776 bytes Variable Size 989857904 bytes Database Buffers 603979776 bytes Redo Buffers 7360512 bytes Database mounted. SQL> SQL>
|
(2)使用rman登录进去,准备清理过期的归档日志,delete archivelog all completed before 'sysdate-1';删除一天前的归档日志:
# 然后rman登录清理归档日志 [oracle@hch_test_121_90 ~]$ [oracle@hch_test_121_90 ~]$ rlwrap rman target /
Recovery Manager: Release 11.2.0.1.0 - Production on Wed Sep 14 13:39:15 2016
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: POWERDES (DBID=3458668465, not open)
RMAN> delete archivelog all completed before 'sysdate-1';
using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=129 device type=DISK List of Archived Log Copies for database with db_unique_name POWERDES =====================================================================
Key Thrd Seq S Low Time ------- ---- ------- - --------- 43 1 1469 A 21-MAR-16 Name: /oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_03_21/o1_mf_1_1469_ch00ykdh_.arc
44 1 1470 A 21-MAR-16 Name: /oracle/app/oracle/flash_recovery_area/POWERDES/archivelog/2016_03_22/o1_mf_1_1470_ch07y0vk_.arc …… |
(3)最后打开数据库
# 最后打开数据库 SQL> SQL> SQL> alter database open;
Database altered.
SQL> |
4、长久解决之道
别的处理办法添加db_recovery_file_dest_size值,修改完后,添加到启动参数里面,永久生效:
SQL> show parameter db_recovery_file_dest_size;
NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_recovery_file_dest_size big integer 4977M SQL>
# 扩大参数值 SQL> ALTER SYSTEM SET db_recovery_file_dest_size=8g scope=both;
System altered.
SQL> show parameter db_recovery_file_dest_size;
NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_recovery_file_dest_size big integer 8G SQL>
# 报错修改到启动参数 SQL> create pfile from spfile;
File created.
SQL>
# 查看使用情况 SQL> select name,space_limit,space_used,number_of_files from v$recovery_file_dest;
NAME -------------------------------------------------------------------------------- SPACE_LIMIT SPACE_USED NUMBER_OF_FILES ----------- ---------- --------------- /oracle/app/oracle/flash_recovery_area 6442450944 2905126912 61
SQL> |
5、永久解决之道
第4步骤,增加db_recovery_file_dest_size值后,会延长这个闪回区的使用时间,但是总归有一天是会满的,为了永久性的解决问题,准备做个定时任务每天去清理一遍过期归档日志,这样就可以基本保证闪回区是有足够的存储空间的。
清理过期归档日志脚本archivelog_clear.sh:
#!/bin/sh BACK_DIR=/oracle/backup/data export DATE=`date +%F` su - oracle -c " mkdir -p $BACK_DIR/$DATE rman log=$BACK_DIR/$DATE/rman_backup.log target / <<EOF crosscheck backup; crosscheck archivelog all; delete noprompt expired backup; delete noprompt expired archivelog all; delete noprompt obsolete; exit EOF " |
建立crontab 任务每天运行清理一次过期归档日志
10 01 * * * /oracle/backup/scripts/archivelog_clear.sh >> /oracle/backup/data/_fullback.log 2>&1 |