ORA-07445 [PC:0x103E2AFA0] when creating a database with dbca on Oracle RAC under AIX

Date: 2022-04-02 08:17:17
I was installing Oracle RAC on AIX 7100-02-03-1334; both grid and oracle had already been installed. During database creation with dbca, however, the database crashed. Below is the alert.log from the creation attempt. The database reported ORA-07445, and the dbca log shows that the failure occurred in the Create database step. I could not find a matching document on MOS, so I tried other approaches.

/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/alert_rmbtodb1.log

MMNL started with pid=26, OS id=7733452
Exception [type: SIGILL, Illegal opcode] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0, {empty}] [flags: 0x0, count: 1]
Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc  (incident=105793):
ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []
Incident details in: /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105793/rmbtodb1_asmb_6357148_i105793.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Thu Dec 11 11:19:18 2014
LCK0 started with pid=27, OS id=10420304
Starting background process RSMN
Thu Dec 11 11:19:18 2014
RSMN started with pid=28, OS id=9306256
ORACLE_BASE from environment = /oraapp/oracle
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x496568BB8] [PC:0x10029B4D0, {empty}] [flags: 0x8, count: 3]
Errors in file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_asmb_6357148.trc  (incident=105794):
ORA-07445: exception encountered: core dump [PC:0x10029B4D0] [SIGSEGV] [ADDR:0x496568BB8] [PC:0x10029B4D0] [Address not mapped to object] []
ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] [ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []
Incident details in: /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Dec 11 11:19:21 2014
Sweep [inc][105794]: completed
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Sweep [inc][105793]: completed
Sweep [inc2][93794]: completed
Sweep [inc2][105794]: completed
PMON (ospid: 16318602): terminating the instance due to error 486
System state dump requested by (instance=1, osid=16318602 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace/rmbtodb1_diag_14352568.trc
Dumping diagnostic data in directory=[cdmp_20141211111922], requested by (instance=1, osid=16318602 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 16318602

oracle@urmbtodb1:/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/trace>1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc <
"/oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc" 2832 lines, 161159 characters
Dump file /oraapp/oracle/diag/rdbms/rmbtodb/rmbtodb1/incident/incdir_105794/rmbtodb1_asmb_6357148_i105794.trc
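With an alert log this long, it can help to pull out only the ORA-07445 lines together with their signal and program counter. The following is a minimal Python sketch and was not part of the original troubleshooting. The function name `find_ora7445` and the regular expression are assumptions inferred from the message layout in the excerpt above, not an official Oracle log format.

```python
import re

# Pattern inferred from the alert log excerpt (an assumption):
# ORA-07445: exception encountered: core dump [PC:<addr>] [<signal>] ...
ORA7445_RE = re.compile(
    r"ORA-07445: exception encountered: core dump "
    r"\[PC:(0x[0-9A-Fa-f]+)\] \[(SIG[A-Z]+)\]"
)


def find_ora7445(log_text):
    """Return (signal, pc) tuples for each ORA-07445 line, in order of appearance."""
    return [(m.group(2), m.group(1)) for m in ORA7445_RE.finditer(log_text)]


# Two lines copied from the alert log excerpt.
sample = (
    "ORA-07445: exception encountered: core dump [PC:0x103E2AFA0] [SIGILL] "
    "[ADDR:0x103E2AFA0] [PC:0x103E2AFA0] [Illegal opcode] []\n"
    "ORA-07445: exception encountered: core dump [PC:0x10029B4D0] [SIGSEGV] "
    "[ADDR:0x496568BB8] [PC:0x10029B4D0] [Address not mapped to object] []\n"
)

for signal, pc in find_ora7445(sample):
    print(signal, pc)
```

In practice the text would be read from the alert log file instead of the inline `sample` string.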
My first suspicion was that the oracle user had no write permission on the ASM disks. I tried creating an spfile on ASM as oracle, and it succeeded. I also checked the oracle executable under CRS_HOME and ORACLE_HOME and found no permission problems.

1. First I tried to create the database manually on node 1. I wrote a pfile and started the instance in nomount; it crashed immediately after reaching nomount.
2. I then tried creating the database with dbca on node 2. The errors were as follows:

/oraapp/oracle/cfgtoollogs/dbca/rmbtodb/trace.log

[Thread-178] [ 2014-12-11 12:47:49.813 CST ] [PostDBCreationStep.executeImpl:889]  Starting Database HA Resource
[Thread-178] [ 2014-12-11 12:48:16.318 CST ] [CRSNative.internalStartResource:389]  Failed to start resource: Name: ora.rmbtodb.db, node: null, filter: null,  msg CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.rmbtodb.db' on 'urmbtodb1' failed
CRS-2632: There are no more servers to try to place resource 'ora.rmbtodb.db' on that would satisfy its placement policy
[Thread-178] [ 2014-12-11 12:48:16.319 CST ] [PostDBCreationStep.executeImpl:897]  Exception while Starting with HA Database Resource
PRCR-1079 : Failed to start resource ora.rmbtodb.db
CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.rmbtodb.db' on 'urmbtodb1' failed
CRS-2632: There are no more servers to try to place resource 'ora.rmbtodb.db' on that would satisfy its placement policy
ora.rmbtodb.db failed to start on rmbtodb1, but the database itself could be created successfully on node 2.
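The dbca trace repeats the same few error codes several times, so listing each distinct code once, in order of first appearance, makes the failure chain easier to read. This is only an illustrative Python sketch; the helper name `distinct_error_codes` and the choice of prefixes (ORA, CRS, PRCR) are assumptions based on the codes that appear in this particular trace.

```python
import re

# Prefixes chosen from the codes seen in this trace (an assumption,
# not an exhaustive list of Oracle message prefixes).
CODE_RE = re.compile(r"\b(?:ORA|CRS|PRCR)-\d+\b")


def distinct_error_codes(trace_text):
    """Return each error code once, in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(CODE_RE.findall(trace_text)))


# Lines taken from the dbca trace.log excerpt.
sample_trace = """\
PRCR-1079 : Failed to start resource ora.rmbtodb.db
CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
CRS-2674: Start of 'ora.rmbtodb.db' on 'urmbtodb1' failed
CRS-2632: There are no more servers to try to place resource 'ora.rmbtodb.db' on that would satisfy its placement policy
CRS-5017: The resource action "ora.rmbtodb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
"""

print(distinct_error_codes(sample_trace))
```

The lowercase resource name `ora.rmbtodb.db` is not matched, because the pattern requires an uppercase prefix followed by a hyphen and digits.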
A closer look at oraagent_oracle.log:

/oraapp/grid/gridhome/log/urmbtodb1/agent/crsd/oraagent_oracle/oraagent_oracle.log

2014-12-10 22:48:11.505: [ USRTHRD][1800] {2:52141:473} Value of LOCAL_LISTENER is
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} ORA-01405: fetched column value is NULL
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} Value of LISTENER_NETWORKS is
2014-12-10 22:48:11.549: [ USRTHRD][1800] {2:52141:473} sqlStmt = ALTER SYSTEM SET LOCAL_LISTENER=' (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=200.31.155.225)(PORT=1521))))' SCOPE=MEMORY SID='rmbtodb1' /* db agent *//* {2:52141:473} */
2014-12-10 22:48:13.011: [ USRTHRD][1800] {2:52141:473} ORA-03113: end-of-file on communication channel
Process ID: 14287060
Session ID: 126 Serial number: 1

The database crashed while LOCAL_LISTENER was being set. At this point the cause was fairly clear: it had to be a network problem. The AIX administrator confirmed that the NIC bonding mode on node 1 had been changed earlier.

grid@urmbtodb1:/home/grid>oifcfg getif -global
en10  192.168.4.0  global  cluster_interconnect
en9  200.31.155.0  global  public

The public and private network settings showed nothing abnormal. I tried resetting the public network anyway.

Delete the en9 entry:

grid@urmbtodb1:/home/grid>oifcfg -delif -global en9
grid@urmbtodb1:/home/grid>oifcfg getif -global
en10  192.168.4.0  global  cluster_interconnect

Set the public network again:

grid@urmbtodb1:/home/grid>oifcfg -setif -global en9/200.31.155.0:public
grid@urmbtodb1:/home/grid>oifcfg getif -global
en10  192.168.4.0  global  cluster_interconnect
en9  200.31.155.0  global  public

After that I restarted crs and ran dbca on node 1 again. The earlier problem did not recur.
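The `oifcfg getif -global` output shown earlier can also be checked programmatically, for example to confirm that both a public and a cluster_interconnect interface are registered after an entry is deleted and re-added. The Python sketch below only parses output text that has already been captured. The function names `parse_getif` and `missing_types` are hypothetical, and the four-column layout (interface, subnet, scope, type) is assumed from the output in this post.

```python
def parse_getif(output):
    """Parse `oifcfg getif -global` output text into a list of dicts.

    Assumes four whitespace-separated columns per line, as in the
    output shown in this post: interface, subnet, scope, type.
    """
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        name, subnet, scope, if_type = fields
        entries.append(
            {"name": name, "subnet": subnet, "scope": scope, "type": if_type}
        )
    return entries


def missing_types(entries, required=("public", "cluster_interconnect")):
    """Return the required interface types that have no registered entry."""
    present = {entry["type"] for entry in entries}
    return [t for t in required if t not in present]


# Output before the change and after deleting en9, copied from the post.
before = (
    "en10  192.168.4.0  global  cluster_interconnect\n"
    "en9  200.31.155.0  global  public\n"
)
after_delete = "en10  192.168.4.0  global  cluster_interconnect\n"

print(missing_types(parse_getif(before)))
print(missing_types(parse_getif(after_delete)))
```

A check like this only confirms that the entries are registered. It would not have flagged the original problem here, where the listing looked normal before the entry was re-created.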