Oracle 11g DG手工switchover切换标准化流程
环境:RHEL 6.5 + Oracle GI 11.2.0.4 + Oracle DB 11.2.0.4
Primary RAC(2 nodes) + Standby RAC(2 nodes)
Oracle DG切换类型有两种:switchover和failover。对于switchover而言,是计划内的由DBA主动去执行的操作,所以它的操作步骤一定是可以形成标准化流程的。
本文就在我的实验环境下做一次基本的标准化switchover流程:
准备工作
切换前准备:最好可以先关闭主备库RAC的其他节点
我这里是分别关闭主备库的第二个节点:
PRIMARY NODE2'Instance
[grid@jyrac2 ~]$ srvctl stop instance -d jyzhao -i jyzhao2
STANDBY NODE2'Instance
[grid@jystdrac2 ~]$ srvctl stop instance -d mynas -i jyzhao2
当然也可以SQLplus操作关闭其他节点的实例.
注:如果不关闭,正常切换时也会自动被关闭。只是为了防止某些环境有其他问题,手工先关闭其他实例可以方便排查。
1.主库切换为备库
1.主库切换为备库:
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
观察主库alert日志:
Sun Aug 13 09:54:53 2017
alter database commit to switchover to standby with session shutdown
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY [Process Id: 13741] (jyzhao1)
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Waiting for all FAL entries to be archived...
All FAL entries have been archived.
Waiting for potential Physical Standby switchover target to become synchronized...
Active, synchronized Physical Standby switchover target has been identified
Sun Aug 13 09:54:56 2017
Errors in file /opt/app/oracle/diag/rdbms/jyzhao/jyzhao1/trace/jyzhao1_j000_29834.trc:
ORA-12012: error on auto execute of job 3
ORA-16456: switchover to standby in progress or completed
Switchover End-Of-Redo Log thread 1 sequence 182 has been fixed
Switchover: Primary highest seen SCN set to 0x0.0x456df2
ARCH: Noswitch archival of thread 1, sequence 182
ARCH: End-Of-Redo Branch archival of thread 1 sequence 182
ARCH: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2
ARCH: Standby redo logfile selected for thread 1 sequence 182 for destination LOG_ARCHIVE_DEST_2
Archived Log entry 411 added for thread 1 sequence 182 ID 0x958da9ee dest 1:
ARCH: Archiving is disabled due to current logfile archival
Primary will check for some target standby to have received alls redo
Final check for a synchronized target standby. Check will be made once.
Sun Aug 13 09:54:59 2017
Process (ospid 4297) is suspended due to switchover to physical standby operation.
LOG_ARCHIVE_DEST_2 is a potential Physical Standby switchover target
Active, synchronized target has been identified
Target has also received all redo
Backup controlfile written to trace file /opt/app/oracle/diag/rdbms/jyzhao/jyzhao1/trace/jyzhao1_ora_13741.trc
Clearing standby activation ID 2509089262 (0x958da9ee)
The primary database controlfile was created using the
'MAXLOGFILES 192' clause.
There is space for up to 188 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 52428800;
Archivelog for thread 1 sequence 182 required for standby recovery
Switchover: Primary controlfile converted to standby controlfile succesfully.
Switchover: Complete - Database shutdown required
USER (ospid: 13741): terminating the instance
Sun Aug 13 09:55:00 2017
ORA-1092 : opitsk aborting process
Instance terminated by USER, pid = 13741
Completed: alter database commit to switchover to standby with session shutdown
Shutting down instance (abort)
License high water mark = 11
Sun Aug 13 09:55:01 2017
Instance shutdown complete
主要注意到正常应该有“End-Of-Redo Branch archival”字样,并且最终成功切换到standby,最后数据库是关闭的。
2.备库切换为主库
操作之前,可以看alert日志,也可以使用SQL查询是否可以切换:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database;
2.备库切换为主库:
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
如果因为活动会话无法关闭可以考虑加上with session shutdown选项。
查看备库alert日志:
Sun Aug 13 09:58:30 2017
alter database commit to switchover to primary
ALTER DATABASE SWITCHOVER TO PRIMARY (jyzhao1)
Maximum wait for role transition is 15 minutes.
Switchover: Media recovery is still active
Role Change: Canceling MRP - no more redo to apply
Sun Aug 13 09:58:31 2017
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /opt/app/oracle/diag/rdbms/mynas/jyzhao1/trace/jyzhao1_mrp0_7745.trc:
ORA-16037: user requested cancel of managed recovery operation
Sun Aug 13 09:58:31 2017
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (jyzhao1)
Role Change: Canceled MRP
All dispatchers and shared servers shutdown
CLOSE: killing server sessions.
CLOSE: all sessions shutdown successfully.
Sun Aug 13 09:58:34 2017
SMON: disabling cache recovery
Backup controlfile written to trace file /opt/app/oracle/diag/rdbms/mynas/jyzhao1/trace/jyzhao1_ora_7669.trc
SwitchOver after complete recovery through change 4550130
Online log +DATA/mynas/onlinelog/group_1.266.951608731: Thread 1 Group 1 was previously cleared
Online log +FRA/mynas/onlinelog/group_1.257.951608737: Thread 1 Group 1 was previously cleared
Online log +DATA/mynas/onlinelog/group_2.267.951608745: Thread 1 Group 2 was previously cleared
Online log +FRA/mynas/onlinelog/group_2.258.951608751: Thread 1 Group 2 was previously cleared
Online log +DATA/mynas/onlinelog/group_3.268.951608757: Thread 2 Group 3 was previously cleared
Online log +FRA/mynas/onlinelog/group_3.259.951608763: Thread 2 Group 3 was previously cleared
Online log +DATA/mynas/onlinelog/group_4.269.951608769: Thread 2 Group 4 was previously cleared
Online log +FRA/mynas/onlinelog/group_4.260.951608775: Thread 2 Group 4 was previously cleared
Standby became primary SCN: 4550128
AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
Switchover: Complete - Database mounted as primary
Completed: alter database commit to switchover to primary
Sun Aug 13 09:59:07 2017
ARC1: Becoming the 'no SRL' ARCH
最后注意到备库成功切换到主库,启动到mount状态。
3.新主库open,新备库启动并开启MRP
上面已经完成了切换,这一步只是把新主库open,新备库启动并开启MRP:
--NEW PRIMARY:
ALTER DATABASE OPEN;
--NEW STANDBY:
STARTUP
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
4.确定同步正常,启动其他节点
确定同步正常,启动其他节点:
PRIMARY NODE2'Instance
[grid@jyrac2 ~]$ srvctl start instance -d jyzhao -i jyzhao2
STANDBY NODE2'Instance
[grid@jystdrac2 ~]$ srvctl start instance -d mynas -i jyzhao2
至此,完成Oracle 11g标准化switchover切换操作。
我这里发现一个小问题,就是切换后发现无法实时同步,最终发现是备库的配置还是ARCH:
log_archive_dest_2 string SERVICE=jyzhao ARCH VALID_FOR=
(ONLINE_LOGFILES,PRIMARY_ROLE)
DB_UNIQUE_NAME=jyzhao
修改备库的配置,去掉ARCH,也就是使用LGWR传输即可:
alter SYSTEM SET log_archive_dest_2 = 'SERVICE=jyzhao VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=jyzhao' sid='*' SCOPE=BOTH;
关于使用这两种方式传输日志的区别可参考《ARCH和LGWR进程同步DG日志的区别》。