ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

时间:2024-03-30 16:10:08

    近期的Oracle数据库巡检中,一地市巡检日志发现ASM使用率查询的语句报错,如图:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

ASM使用率的查看语句如下:

prompt **************************** 监控asm状态磁盘使用率,asm磁盘使用率大于85%异常 **************************;
SELECT group_number,
       name,
       state,
       total_mb,
       free_mb,
       (1 - trunc(free_mb / total_mb, 2)) * 100 || '%' as pct_used
  FROM V$ASM_DISKGROUP;

正常情况下的返回结果如图:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

登录服务器端,然后分别使用grid和oracle以sysdba身份登录数据库查询ASM使用率,执行上边查看ASM使用率的语句,发现报错依旧,然后使用grid执行

crs_stat -t查看集群状态,发现crs联系不到crs磁盘组了(2个节点均是相同状况),如图:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

grid登录数据库查看ASM磁盘组的状态,发现CRSDG确实下线了:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

手工将crsdg上线,命令能够执行成功:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

虽然,通过grid以sysasm将crsdg重新上线了,但是grid再次执行crs_stat -t,发现命令还是报错说联系不到crs磁盘组:CRS-0184

root,执行crsctl start crs,则报错说命令无法执行:crsd进程正在running

检查节点zs09的alert_+ASM1.log有如下报错:

Tue Jul 04 01:24:52 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
Tue Jul 04 01:24:52 2017
NOTE: process _b000_+asm1 (88495) initiating offline of disk 0.4004286220 (CRSDG_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (88495) initiating offline of disk 1.4004286219 (CRSDG_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm1 (88495) initiating offline of disk 2.4004286221 (CRSDG_0002) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 13 for pid 27, osid 88495
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 3 done.
NOTE: initiating PST update: grp = 3, dsk = 0/0xeeac8f0c, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xeeac8f0b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xeeac8f0d, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 14 for pid 27, osid 88495
ERROR: no read quorum in group: required 2, found 0 disks
Tue Jul 04 01:24:52 2017
NOTE: cache dismounting (not clean) group 3/0xAE3C7FF5 (CRSDG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 88497, image: [email protected] (B001)
WARNING: Offline for disk CRSDG_0000 in mode 0x7f failed.
WARNING: Offline for disk CRSDG_0001 in mode 0x7f failed.
WARNING: Offline for disk CRSDG_0002 in mode 0x7f failed.
Tue Jul 04 01:24:52 2017
NOTE: halting all I/Os to diskgroup 3 (CRSDG)
Tue Jul 04 01:24:52 2017
NOTE: LGWR doing non-clean dismount of group 3 (CRSDG)
NOTE: LGWR sync ABA=5.5 last written ABA 5.5
Tue Jul 04 01:24:52 2017
kjbdomdet send to inst 2
detach from dom 3, sending detach message to inst 2
Tue Jul 04 01:24:52 2017
NOTE: No asm libraries found in the system
Tue Jul 04 01:24:52 2017
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 4)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 3 invalid = TRUE 
 6 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
freeing rdom 3
Tue Jul 04 01:24:52 2017
WARNING: dirty detached from domain 3
NOTE: cache dismounted group 3/0xAE3C7FF5 (CRSDG) 
SQL> alter diskgroup CRSDG dismount force /* ASM SERVER:2923200501 */ 
Tue Jul 04 01:24:52 2017
NOTE: cache deleting context for group CRSDG 3/0xae3c7ff5
GMON dismounting group 3 at 15 for pid 28, osid 88497
NOTE: Disk CRSDG_0000 in mode 0x7f marked for de-assignment
NOTE: Disk CRSDG_0001 in mode 0x7f marked for de-assignment
NOTE: Disk CRSDG_0002 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 3
ASM Health Checker found 1 new failures
Tue Jul 04 01:25:22 2017
SUCCESS: diskgroup CRSDG was dismounted
SUCCESS: alter diskgroup CRSDG dismount force /* ASM SERVER:2923200501 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRSDG
Tue Jul 04 01:25:22 2017
NOTE: diskgroup resource ora.CRSDG.dg is offline
Tue Jul 04 01:25:22 2017
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Jul 04 04:11:22 2017
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 4 logical extent 0 offset 425984 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 4 logical extent 1 offset 425984 is not allocated; I/O request failed
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM1/trace/+ASM1_ora_164334.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Jul 04 04:11:22 2017
SQL> alter diskgroup CRSDG check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "CRSDG" does not exist or is not mounted
ERROR: alter diskgroup CRSDG check /* proxy */

检查节点zs09的alertzs09.log有如下报错:

2017-07-04 04:11:22.352: 
[crsd(164279)]CRS-1006:The OCR location +crsdg is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:44.061: 
[/u01/oracle/app/grid/home/bin/orarootagent.bin(164441)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:4392} in /u01/oracle/app/grid/home/log/zs09/agent/crsd/orarootagent_root/orarootagent_root.log.
2017-07-04 04:12:44.061: 
[/u01/oracle/app/grid/home/bin/oraagent.bin(164609)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:22254} in /u01/oracle/app/grid/home/log/zs09/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2017-07-04 04:12:44.061: 
[/u01/oracle/app/grid/home/bin/oraagent.bin(164437)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:6} in /u01/oracle/app/grid/home/log/zs09/agent/crsd/oraagent_grid/oraagent_grid.log.
2017-07-04 04:12:44.063: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:45.386: 
[crsd(113304)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:45.392: 
[crsd(113304)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:46.084: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:47.402: 
[crsd(113321)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:47.408: 
[crsd(113321)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:48.103: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:49.421: 
[crsd(113338)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:49.427: 
[crsd(113338)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:50.123: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:51.441: 
[crsd(113392)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:51.447: 
[crsd(113392)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:52.142: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:53.462: 
[crsd(113405)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:53.468: 
[crsd(113405)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:54.162: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:55.482: 
[crsd(113418)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:55.488: 
[crsd(113418)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:56.182: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:57.491: 
[crsd(113431)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:57.497: 
[crsd(113431)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:58.202: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:12:59.530: 
[crsd(113453)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:12:59.536: 
[crsd(113453)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:13:00.221: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:13:01.538: 
[crsd(113466)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:13:01.544: 
[crsd(113466)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:13:02.241: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:13:03.557: 
[crsd(113480)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:13:03.563: 
[crsd(113480)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs09/crsd/crsd.log.
2017-07-04 04:13:04.261: 
[ohasd(163621)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs09'.
2017-07-04 04:13:04.261: 
[ohasd(163621)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2017-07-04 04:13:04.261: 
[ohasd(163621)]CRS-2769:Unable to failover resource 'ora.crsd'.

检查节点zs09的oraagent_grid.log,有如下报错:

2017-07-04 01:25:22.485: [    AGFW][329238272]{1:25901:1741} Agent received the message: RESOURCE_STOP[ora.CRSDG.dg zs09 1] ID 4099:828692
2017-07-04 01:25:22.485: [    AGFW][329238272]{1:25901:1741} Preparing STOP command for: ora.CRSDG.dg zs09 1
2017-07-04 01:25:22.485: [    AGFW][329238272]{1:25901:1741} ora.CRSDG.dg zs09 1 state changed from: ONLINE to: STOPPING
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] (:CLSN00108:) clsn_agent::stop {
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] DgpAgent::stop: enter { 
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] DgpAgent::stop: tha exit }
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] DgpAgent::stopSingle status:2 }
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [stop] (:CLSN00108:) clsn_agent::stop }
2017-07-04 01:25:22.486: [    AGFW][308602624]{1:25901:1741} Command: stop for resource: ora.CRSDG.dg zs09 1 completed with status: SUCCESS
2017-07-04 01:25:22.486: [    AGFW][329238272]{1:25901:1741} Agent sending reply for: RESOURCE_STOP[ora.CRSDG.dg zs09 1] ID 4099:828692
2017-07-04 01:25:22.486: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2017-07-04 01:25:22.509: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] DgpAgent::runCheck: asm stat asmRet 0
2017-07-04 01:25:22.509: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] DgpAgent::getConnxn connected
2017-07-04 01:25:22.512: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] DgpAgent::queryDgStatus excp no data found
2017-07-04 01:25:22.512: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2017-07-04 01:25:22.512: [ora.CRSDG.dg][308602624]{1:25901:1741} [check] DgpAgent::queryDgStatus dgName CRSDG ret 1
2017-07-04 01:25:22.512: [    AGFW][329238272]{1:25901:1741} ora.CRSDG.dg zs09 1 state changed from: STOPPING to: OFFLINE
2017-07-04 01:25:22.512: [    AGFW][329238272]{1:25901:1741} Agent sending last reply for: RESOURCE_STOP[ora.CRSDG.dg zs09 1] ID 4099:828692
2017-07-04 01:25:22.516: [ USRTHRD][301483776]{1:25903:1741} Processing the event CRS_RESOURCE_STATE_CHANGE

.....

2017-07-04 04:12:44.060: [ CRSCOMM][407422720] IpcC: IPC client connection 6c to member 0 has been removed
2017-07-04 04:12:44.060: [CLSFRAME][407422720] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2017-07-04 04:12:44.060: [CLSFRAME][407422720] Disconnected from CRSD:zs09 process: {Relative|Node:0|Process:0|Type:1}
2017-07-04 04:12:44.061: [   AGENT][329238272]{0:1:6} {0:1:6} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.

检查节点zs10的alert_+ASM2.log有如下报错:

Thu Jun 29 01:37:44 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
Thu Jun 29 01:37:44 2017
NOTE: process _b000_+asm2 (47520) initiating offline of disk 0.3916020174 (CRSDG_0000) with mask 0x7e in group 3
NOTE: process _b000_+asm2 (47520) initiating offline of disk 1.3916020175 (CRSDG_0001) with mask 0x7e in group 3
NOTE: process _b000_+asm2 (47520) initiating offline of disk 2.3916020176 (CRSDG_0002) with mask 0x7e in group 3
NOTE: checking PST: grp = 3
GMON checking disk modes for group 3 at 17 for pid 27, osid 47520
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 3 done.
NOTE: initiating PST update: grp = 3, dsk = 0/0xe969b9ce, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 1/0xe969b9cf, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 3, dsk = 2/0xe969b9d0, mask = 0x6a, op = clear
GMON updating disk modes for group 3 at 18 for pid 27, osid 47520
ERROR: no read quorum in group: required 2, found 0 disks
Thu Jun 29 01:37:44 2017
NOTE: cache dismounting (not clean) group 3/0xAE394938 (CRSDG) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 47522, image: [email protected] (B001)
Thu Jun 29 01:37:44 2017
NOTE: halting all I/Os to diskgroup 3 (CRSDG)
Thu Jun 29 01:37:44 2017
NOTE: LGWR doing non-clean dismount of group 3 (CRSDG)
NOTE: LGWR sync ABA=7.61 last written ABA 7.61
WARNING: Offline for disk CRSDG_0000 in mode 0x7f failed.
WARNING: Offline for disk CRSDG_0001 in mode 0x7f failed.
WARNING: Offline for disk CRSDG_0002 in mode 0x7f failed.
Thu Jun 29 01:37:44 2017
kjbdomdet send to inst 1
detach from dom 3, sending detach message to inst 1
Thu Jun 29 01:37:44 2017
NOTE: No asm libraries found in the system
Thu Jun 29 01:37:44 2017
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 3 invalid = TRUE 
 2 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Thu Jun 29 01:37:45 2017
WARNING: dirty detached from domain 3
NOTE: cache dismounted group 3/0xAE394938 (CRSDG) 
SQL> alter diskgroup CRSDG dismount force /* ASM SERVER:2922989880 */ 
Thu Jun 29 01:37:45 2017
NOTE: cache deleting context for group CRSDG 3/0xae394938
GMON dismounting group 3 at 19 for pid 28, osid 47522
NOTE: Disk CRSDG_0000 in mode 0x7f marked for de-assignment
NOTE: Disk CRSDG_0001 in mode 0x7f marked for de-assignment
NOTE: Disk CRSDG_0002 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 3
ASM Health Checker found 1 new failures
Thu Jun 29 01:38:15 2017
SUCCESS: diskgroup CRSDG was dismounted
SUCCESS: alter diskgroup CRSDG dismount force /* ASM SERVER:2922989880 */
Thu Jun 29 01:38:15 2017
NOTE: diskgroup resource ora.CRSDG.dg is offline
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRSDG
Thu Jun 29 01:38:15 2017
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Jun 29 03:54:58 2017
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 4 logical extent 0 offset 425984 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 4 logical extent 1 offset 425984 is not allocated; I/O request failed
Errors in file /u01/oracle/app/grid/base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_6099.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Thu Jun 29 03:54:58 2017
SQL> alter diskgroup CRSDG check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "CRSDG" does not exist or is not mounted
ERROR: alter diskgroup CRSDG check /* proxy */
Thu Jun 29 03:56:33 2017
NOTE: client exited [6087]
Thu Jun 29 03:56:34 2017
NOTE: [[email protected] (TNS V1-V3) 66505] opening OCR file
Thu Jun 29 03:56:36 2017
NOTE: [[email protected] (TNS V1-V3) 66522] opening OCR file
Thu Jun 29 03:56:38 2017
NOTE: [[email protected] (TNS V1-V3) 66536] opening OCR file
Thu Jun 29 03:56:40 2017
NOTE: [[email protected] (TNS V1-V3) 66549] opening OCR file
Thu Jun 29 03:56:42 2017
NOTE: [[email protected] (TNS V1-V3) 66562] opening OCR file
Thu Jun 29 03:56:44 2017
NOTE: [[email protected] (TNS V1-V3) 66575] opening OCR file
Thu Jun 29 03:56:46 2017
NOTE: [[email protected] (TNS V1-V3) 66588] opening OCR file
Thu Jun 29 03:56:48 2017
NOTE: [[email protected] (TNS V1-V3) 66605] opening OCR file
Thu Jun 29 03:56:50 2017
NOTE: [[email protected] (TNS V1-V3) 66659] opening OCR file
Thu Jun 29 03:56:52 2017
NOTE: [[email protected] (TNS V1-V3) 66681] opening OCR file
Tue Jul 04 01:25:27 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
Tue Jul 04 01:25:27 2017
 Received dirty detach msg from inst 1 for dom 3
Tue Jul 04 01:25:27 2017
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 4)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 3 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
freeing rdom 3
Dirty Detach Reconfiguration complete
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 4.
Fri Jul 07 02:15:03 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
Fri Jul 07 02:17:52 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
Sat Jul 08 01:21:20 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
Sat Jul 08 01:23:40 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
Wed Jul 12 01:33:39 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 19 secs for write IO to PST disk 0 in group 4.
Wed Jul 12 01:40:20 2017
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 4.

检查节点zs10的alertzs10.log有如下报错:

2017-06-29 03:54:58.913: 
[crsd(6087)]CRS-1006:The OCR location +crsdg is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:33.239: 
[/u01/oracle/app/grid/home/bin/orarootagent.bin(6206)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:13129} in /u01/oracle/app/grid/home/log/zs10/agent/crsd/orarootagent_root/orarootagent_root.log.
2017-06-29 03:56:33.240: 
[/u01/oracle/app/grid/home/bin/scriptagent.bin(6348)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:7} in /u01/oracle/app/grid/home/log/zs10/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2017-06-29 03:56:33.240: 
[/u01/oracle/app/grid/home/bin/oraagent.bin(6199)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:7} in /u01/oracle/app/grid/home/log/zs10/agent/crsd/oraagent_grid/oraagent_grid.log.
2017-06-29 03:56:33.240: 
[/u01/oracle/app/grid/home/bin/oraagent.bin(6546)]CRS-5822:Agent '/u01/oracle/app/grid/home/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:11:16245} in /u01/oracle/app/grid/home/log/zs10/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2017-06-29 03:56:33.246: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:34.682: 
[crsd(66505)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:34.687: 
[crsd(66505)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:35.266: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:36.584: 
[crsd(66522)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:36.589: 
[crsd(66522)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:37.285: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:38.597: 
[crsd(66536)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:38.602: 
[crsd(66536)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:39.305: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:40.620: 
[crsd(66549)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:40.625: 
[crsd(66549)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:41.324: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:42.638: 
[crsd(66562)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:42.643: 
[crsd(66562)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:43.343: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:44.654: 
[crsd(66575)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:44.659: 
[crsd(66575)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:45.362: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:46.678: 
[crsd(66588)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:46.682: 
[crsd(66588)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:47.382: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:48.699: 
[crsd(66605)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:48.704: 
[crsd(66605)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:49.403: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:50.716: 
[crsd(66659)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:50.721: 
[crsd(66659)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:51.421: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:52.736: 
[crsd(66681)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:52.741: 
[crsd(66681)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/oracle/app/grid/home/log/zs10/crsd/crsd.log.
2017-06-29 03:56:53.441: 
[ohasd(4966)]CRS-2765:Resource 'ora.crsd' has failed on server 'zs10'.
2017-06-29 03:56:53.441: 
[ohasd(4966)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2017-06-29 03:56:53.441: 
[ohasd(4966)]CRS-2769:Unable to failover resource 'ora.crsd'.
[client(65804)]CRS-10001:21-Jul-17 15:49 ACFS-9204: false
[client(66031)]CRS-10001:21-Jul-17 15:49 ACFS-9204: false

检查节点zs10的oraagent_grid.log,有如下报错:

2017-06-29 03:56:33.239: [ CRSCOMM][1777547008] IpcC: IPC client connection 6c to member 0 has been removed
2017-06-29 03:56:33.239: [CLSFRAME][1777547008] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2017-06-29 03:56:33.239: [CLSFRAME][1777547008] Disconnected from CRSD:zs10 process: {Relative|Node:0|Process:0|Type:1}
2017-06-29 03:56:33.240: [   AGENT][1767040768]{0:1:7} {0:1:7} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2017-06-29 03:56:33.240: [    AGFW][1767040768]{0:1:7} Agent is exiting with exit code: 1

检查2个节点的/var/log/messages*日志,发现2个节点均没有日志记录,但是syslogd进程正常,只有日志切换却没有日志内容。

经查看集群相关日志可以确定,由于存储磁盘出现IO问题(或光线闪断、或IO延迟),导致集群CRS异常宕机。但是,比较奇怪的是,虽然CSR掉线了,ASM实例

和DB实例却好好的,还可以正常使用。查询oracle support发现一篇文章1581864.1 提到ASM CRS仲裁盘访问超时与隐藏参数_asm_hbeatiowait有关系,而ASM的

隐藏参数_asm_hbeatiowait由于操作系统多路径Multipath配置的polling_interval有关,具体的故障原因是操作系统盘的判断访问超时远大于数据库ASM仲裁盘访问超

时,导致ORACLE RAC判定ASM中仲裁盘无法访问从而将仲裁盘强制Offline。解决的思路是:首先,确定操作系统polling_interval参数与数据库ASM隐藏参数值_asm_hbeatiowait,将_asm_hbeatiowait的值调整到比polling_interval值大即可。

处理措施:

1、查看数据库RAC ASM的_asm_hbeatiowait值(默认是15秒):

SQL> SELECT   ksppinm, ksppstvl, ksppdesc
   FROM   x$ksppi x, x$ksppcv y
  WHERE   x.indx = y.indx AND  ksppinm = '_asm_hbeatiowait'
SQL> 
KSPPINM     KSPPSTVL KSPPDESC
-------------------- ---------- ------------------------------------------------------------
_asm_hbeatiowait     15 number of secs to wait for PST Async Hbeat IO return
SQL> 

2、查看操作存储盘访问超时时间(RHEL6.8默认是30秒)

[[email protected] ~]# cat /sys/block/sdb/device/timeout 
30
[[email protected] ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.8 (Santiago)
[[email protected] ~]# 

3、将_asm_hbeatiowait 的值调整为45秒(该参数是静态参数,需要重启集群)

SQL> alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';
System altered.
SQL> 

4、重启集群并重启服务器

无论是root还是grid重启crs服务都没能成功,于是决定重启服务器,root先使用crsctl stop crs -f关闭集群服务,然后执行rboot,还算顺利,服务器重启

完成后集群恢复正常,操作系统/var/log/messages也有新内容产生了:

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用

ORACLE RAC 11.2.0.4 for RHEL6.8 集群CRS异常导致集群命令无法使用