CONTROLFILE AUTOBACKUP ON 间接导致数据库hang

时间:2022-12-14 23:40:35
今天在上班的路上,客户就联系我说数据库停了,赶忙一路小跑到了公司:
在RAC 11.2.0.4上部署的rman自动备份脚本
具体步骤是先备份数据文件,再备份归档并delete input,再crosscheck,再delete expired 

但是今天数据库整个hang了!

紧急连上,查看了一下,放归档的+FRA磁盘组都满了
asmcmd中看到近半个月的归档一点都没删!

先看看
select * from v$recover_file;
是空的

于是赶忙
RMAN> delete archivelog all completed before  'sysdate-7' ;
先把数据库跑起来,业务恢复了,OK

然后看看v$RMAN_BACKUP_JOB_DETAILS ,发现近半个月的备份全部fail~

赶紧看看备份脚本情况:
1
input datafile file number=00029 name=+DATA/orcl/datafile/lgbddb01.dbf
2
input datafile file number=00031 name=+DATA/orcl/datafile/tlgldpdb01.dbf
3
channel d1: starting piece 1 at 15-SEP-17
4
channel d1: finished piece 1 at 15-SEP-17
5
piece handle=/database_backup/oradb/rmanbackup/orcl_full_37sei8jl_1_1 tag=TAG20170915T230005 comment=NONE
6
channel d1: backup set complete, elapsed time: 00:24:05
7
Finished backup at 15-SEP-17
8
9
Starting Control File and SPFILE Autobackup at 15-SEP-17
10
released channel: d1
11
RMAN-00571: ===========================================================
12
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
13
RMAN-00571: ===========================================================
14
RMAN-03009: failure of Control File and SPFILE Autobackup command on d1 channel at 09/15/2017 23:24:18
15
ORA-19809: limit exceeded for recovery files
16
ORA-19804: cannot reclaim 68157440 bytes disk space from 8589934592 limit

没空间?备份磁盘还还好几十TB呢!!再说备份磁盘里面每天的全备都有啊,为什么rman视图会说fail?

最近半个月全是这个情况!!
仔细看:
Starting Control File and SPFILE   Autobackup  at 15-SEP-17

立刻联想到:
当初我部署备份脚本的时候,发现 CONTROLFILE AUTOBACKUP的特性那么好,为啥不用呢?于是我就把rman设置:
CONFIGURE CONTROLFILE AUTOBACKUP ON;

难道是这个配置的原因?立刻联想到,我没配置过RMAN备份的默认路径,备份路径全在脚本中指定的,那么默认会放在哪呢?
查资料发现:
CONTROLFILE AUTOBACKUP ON的情况下,当进行备份(不管是全备还是增备,命令还是脚本),会将controlfile 和spfile存放到rman配置的format位置:
如果rman没有显式的配置备份路径,也就是没有如下操作,
RMAN>CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/%F' ;

假如配置了,那么:
RMAN> show all;

RMAN configuration parameters for database with db_unique_name ORCL are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/%F''; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u02/app/oracle/product/11.2.0/db_home/dbs/snapcf_orcl1.f'; # default

继续

假如不配置 CONTROLFILE AUTOBACKUP FORMAT,又开启了恢复区

SQL> show parameter   db_recovery_file_dest
NAME                                TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      +BACK
db_recovery_file_dest_size          big integer 20G

那么就会将备份存放到恢复区里面!
假如没配置恢复区呢?就会存放在$ORACLE_HOME/dbs下
--总之很坑。。。

但是这个位置是很容易满的,假如这个位置满了, 自动备份脚本在autobackup那一步就会失败,后面的备份归档和删归档语句就会失败,结果导致一直攒了半个月的归档,数据库hang了。。。

解决方案:
1. 关闭controlfile autobackup
2. 配置rman  format
3. 再次调大 db_recovery_file_dest_size,但是还可能犯,治标不治本

最后还是修改了
CONFIGURE CONTROLFILE AUTOBACKUP OFF;
等DB闲暇时手动执行备份脚本, v$RMAN_BACKUP_JOB_DETAILS里提示成功了,错误没再出现。
继续观察几天吧。