oracle rac 11gr2 OCR丢失恢复

时间:2024-03-29 09:16:07

一、故障概述

因为数据库服务器重启出现OCR盘丢失导致oracle数据库集群宕机。

二、故障重现

在测试环境破坏保存OCR信息的磁盘组+CVD

  • 使用dd清除OCR的磁盘头信息(因为OCR是normal,所以掉了两个才无法启动)

[[email protected] ~]# dd if=/dev/zero of=/dev/oracleasm/disks/CVD01 bs=1024 count=1000

[[email protected] ~]# dd if=/dev/zero of=/dev/oracleasm/disks/CVD02 bs=1024 count=1000

  • 打开集群的alert日志并打开集群观察日志

[[email protected] ~]$ tail -200f /u01/app/11.2/grid/log/rac1/alertrac1.log

[[email protected] grid]# crsctl start crs

然后发现日志出现以下报错信息:

oracle rac 11gr2 OCR丢失恢复

 说明集群找不到voting盘文件,ocr记录的是集群配置信息

使用ocrcheck检测报错也无法显示信息:

[[email protected] ~]# ocrcheck

PROT-602: Failed to retrieve data from the cluster registry

PROC-26: Error while accessing the physical storage

三、故障恢复

  • 强制停止crs

[[email protected] ~]# crsctl stop crs –f

  • 检查确认集群停止成功

[[email protected] ~]# ps -ef|grep crs

[[email protected] ~]# ps -ef|grep css

[[email protected] ~]# ps -ef|grep ohas

[[email protected] ~]# ps -ef|grep evm

  • 以独占模式启动crs

[[email protected] grid]# crsctl start crs -excl -nocrs

CRS-4123: Oracle High Availability Services has been started.

CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'

CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'

CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'

CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'

CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'

CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded

CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cssd' on 'rac1'

CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'

CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded

CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'

CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'

CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'

CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded

CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded

CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded

CRS-2672: Attempting to start 'ora.asm' on 'rac1'

CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

说明:

-excl 该参数指定使用独占模式

-nocrs 该参数指定忽略查找crsvoting

  • 创建CVD磁盘组以及spfile

[[email protected] ~]$ asmcmd

ASMCMD> ls

空的,因为spfile文件存在CVD磁盘组当中,spfile也因此丢失,读不到DATA和ARCH磁盘组,需要重建spfile参数文件。

 

1)创建一个pfile参数文件

[[email protected] ~]$ cat /tmp/asm_pfile.ora

+ASM1.asm_diskgroups='ARCHASMGROUP', 'DATAASMGROUP'#Manual Mount

+ASM2.asm_diskgroups='ARCHASMGROUP', 'DATAASMGROUP'#Manual Mount

*.asm_diskstring='/dev/oracleasm/disks/*'

*.asm_power_limit=1

*.diagnostic_dest='/u01/app/grid'

*.instance_type='asm'

*.large_pool_size=12M

*.remote_login_passwordfile='EXCLUSIVE'

2)关闭crs

[[email protected] ~]# crsctl stop crs

[[email protected] ~]# crsctl start crs -excl –nocrs

3)使用参数文件启动ASM

[[email protected] ~]$ sqlplus / as sysasm

SQL>shutdown immediate

ASM diskgroups dismounted

ASM instance shutdown

SQL> startup pfile='/tmp/asm_pfile.ora';

ASM instance started

SQL> col path for a50

SQL> set linesize 170

SQL> select group_number, disk_number, mount_status, header_status, path from v$asm_disk;

 

4)创建CRS磁盘组(注意磁盘组和原先的一致,否则在后续恢复的时候会报错)

SQL> create diskgroup CVD normal redundancy disk '/dev/oracleasm/disks/CVD1','/dev/oracleasm/disks/CVD2', '/dev/oracleasm/disks/CVD3' attribute 'au_size'='1M' 'compatible.asm'='11.2.0.0.0', 'compatible.rdbms'='11.2.0.0.0';

create diskgroup CVD normal redundancy disk '/dev/oracleasm/disks/CVD01','/dev/oracleasm/disks/CVD02', '/dev/oracleasm/disks/CVD03'

*

ERROR at line 1:

ORA-15018: diskgroup cannot be created

ORA-15033: disk '/dev/oracleasm/disks/CVD3' belongs to diskgroup "CVD"    --这里报错是因为CVD3没清除磁盘头信息

 

5)清除CVD03磁盘头信息

[[email protected] dev]# dd if=/dev/zero of=/dev/oracleasm/disks/CVD03 bs=1024 count=1000

SQL> create diskgroup CVD normal redundancy disk '/dev/oracleasm/disks/CVD01','/dev/oracleasm/disks/CVD02', '/dev/oracleasm/disks/CVD03' attribute 'au_size'='1M' 'compatible.asm'='11.2.0.0.0', 'compatible.rdbms'='11.2.0.0.0';

Diskgroup created.

 

6)创建spfile参数文件

SQL> create spfile='+CVD' from pfile='/tmp/asm_pfile.ora';

File created.

 

7)查看参数文件已经存在CVD盘中

[[email protected] ~]$ asmcmd

ASMCMD> ls

CVD/

ARCHASMGROUP/

DATAASMGROUP/

ASMCMD> ls CVD/

rac

ASMCMD> ls CVD/rac

ASMPARAMETERFILE/

ASMCMD> ls CVD/rac/ASMPARAMETERFILE

REGISTRY.253.941399897

 

  • 利用备份恢复OCR

[[email protected] ~]# ocrconfig -restore /u01/app/11.2/grid/cdata/rac/backup00.ocr

  • 恢复voting disk

[[email protected] ~]# crsctl replace votedisk +CVD

Successful addition of voting disk 429d9c75-6a804fdf-bfeb8f1d-fa8358e9.

Successful addition of voting disk af38c76f-9d474f1f-bfa5c594-64fb0368.

Successful addition of voting disk 185a41992e3d4f57bf86248889c08561.

Successfully replaced voting disk group with +CVD.

CRS-4266: Voting file(s) successfully replaced

  • 检测

[[email protected] ~]# ocrcheck

oracle rac 11gr2 OCR丢失恢复

[[email protected] ~]# crsctl query css votedisk

oracle rac 11gr2 OCR丢失恢复

  • 停止CRS已正常的方式启动

[[email protected] ~]# crsctl stop crs

[[email protected] ~]# crsctl start crs

[[email protected] ~]# crsctl start crs

[[email protected] ~]# crsctl stat res -t

四、检查恢复环境

  • 集群信息

[[email protected] ~]# crsctl stat res -t

  • 检查ASM spfile信息

SQL> show parameter spfile

  • 检查ocr vote信息

[[email protected] ~]# ocrcheck

  • 检查DG是否正常

SQL>  archive log list

SQL>  select count(*) from v$archived_gap;

SQL>  select process,status,thread#,sequence# from v$managed_standby;

SQL>  select THREAD#,SEQUENCE#,ARCHIVED,APPLIED from v$archived_log;