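1. Problem:
On node 2, crsctl stop crs -f could not stop the RAC services: all 9 d.bin-related processes were still running.
Version: Oracle 11.2.0.4 for Solaris
2. Analysis:
The alert.log file under /abcapp/oragrid/11.2.0/log/abc208 contains the following: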
[/abcapp/oragrid/11.2.0/bin/scriptagent.bin(10605)]CRS-5818:Aborted command 'clean' for resource 'ora.oc4j'. Details at (:CRSAGF00113:) {2:26009:18659} in /abcapp/oragrid/11.2.0/log/abc208/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log.
2017-08-30 23:28:10.192:
[crsd(62374)]CRS-2757:Command 'Clean' timed out waiting for response from the resource 'ora.oc4j'. Details at (:CRSPE00111:) {2:26009:18659} in /abcapp/oragrid/11.2.0/log/abc208/crsd/crsd.log.
/abcapp/oragrid/11.2.0/log/abc208/crsd/crsd.log reports the following errors:
2017-08-30 23:48:10.228: [UiServer][47]{2:26009:18672} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2680: Clean of 'ora.oc4j' on 'abc208' failed]
MSGTYPE:
TextMessage[1]
OBJID:
TextMessage[ora.oc4j]
WAIT:
TextMessage[0]
]
2017-08-30 23:48:10.228: [ CRSPE][46]{2:26009:18672} Sequencer for [ora.oc4j 1 1] has completed with error: CRS-0216: Could not stop resource 'ora.oc4j'.
2017-08-30 23:48:10.230: [UiServer][47]{2:26009:18673} Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2503: Resource 'ora.oc4j' is in UNKNOWN state and must be stopped first]
MSGTYPE:
TextMessage[1]
OBJID:
TextMessage[ora.oc4j]
WAIT:
TextMessage[0]
]
/abcapp/oragrid/11.2.0/log/abc208/agent/crsd/scriptagent_oragrid/scriptagent_oragrid.log shows the following:
2017-08-30 22:37:10.040: [ora.oc4j][46]{1:63945:12686} [check] Executing action script: /abcapp/oragrid/11.2.0/bin/oc4jctl[check]
2017-08-30 22:37:49.597: [ AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601515
2017-08-30 22:38:10.044: [ AGENT][58]{1:63945:12686} {1:63945:12686} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.oc4j 1 1
2017-08-30 22:38:10.044: [ora.oc4j][58]{1:63945:12686} [check] Killing action script: check
2017-08-30 22:38:10.044: [ AGFW][58]{1:63945:12686} Command: check for resource: ora.oc4j 1 1 completed with status: TIMEDOUT
2017-08-30 22:38:10.072: [ AGFW][46]{1:63945:12686} Received unknown resource status code: 255
2017-08-30 22:38:49.600: [ AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601539
2017-08-30 22:39:10.047: [ora.oc4j][46]{1:63945:12686} [check] Executing action script: /abcapp/oragrid/11.2.0/bin/oc4jctl[check]
2017-08-30 22:39:49.603: [ AGFW][9]{1:63945:12686} Agent received the message: AGENT_HB[Engine] ID 12293:21601561
2017-08-30 22:40:10.049: [ AGENT][58]{1:63945:12686} {1:63945:12686} Created alert : (:CRSAGF00113:) : Aborting the command: check for resource: ora.oc4j 1 1
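The agent log excerpt above shows the check action for ora.oc4j timing out once per cycle. As a rough way to see how often that happened, the timeouts can be counted with grep. This is only a sketch: count_check_timeouts is a hypothetical helper name, and the matched strings are copied from the log lines quoted above, so they may need adjusting for other versions.

```shell
#!/bin/sh
# Sketch: count how many times the check action for a resource ended
# with status TIMEDOUT in an agent log.
# count_check_timeouts is a hypothetical helper, not an Oracle tool.
# The search strings are taken from the log excerpt in this post.
count_check_timeouts() {
    log_file=$1   # e.g. the scriptagent_oragrid.log path quoted above
    resource=$2   # e.g. ora.oc4j
    # -F treats the pattern as a fixed string, so the dot in the
    # resource name is not interpreted as a regex wildcard.
    grep -F "Command: check for resource: $resource " "$log_file" |
        grep -c -F "completed with status: TIMEDOUT"
}
```

For example, count_check_timeouts scriptagent_oragrid.log ora.oc4j prints a single number; a steadily growing count suggests the action script itself is hanging rather than failing once.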
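The logs above clearly show that the oc4j service could not be stopped, which blocked the services that had to stop after it. oc4j runs as a JVM process, so in theory killing the java process owned by the grid user should be enough.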
-bash-4.1$ kill -9 10789
-bash-4.1$ ps -ef |grep 10789
oragrid 10789 1 0 May 29 ? 847:17 /abcapp/oragrid/11.2.0/jdk/bin/sparcv9/java -server -Xcheck:jni -Xms128M -Xmx
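I ran the kill many times and nothing happened.
This shows that the problem was caused by a hung java process that could not be killed. A check showed that node 1 was not running the oc4j service and had no corresponding java process under the grid user, so it was not affected.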
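The manual kill and ps check above can be wrapped in a small helper that sends SIGKILL and then reports whether the PID is still there. This is a minimal sketch, not an Oracle-provided tool: still_alive_after_kill is a hypothetical name, and the 5-second default wait is an arbitrary assumption. It relies on kill -0, which sends no signal and only tests whether the PID exists. Note that kill -0 also succeeds for a zombie that has not yet been reaped, so "still present" is a hint to look closer, not a diagnosis.

```shell
#!/bin/sh
# Sketch: send SIGKILL to a PID, then poll to see whether it disappears.
# still_alive_after_kill is a hypothetical helper name.
still_alive_after_kill() {
    pid=$1
    wait_secs=${2:-5}   # assumed default; adjust as needed
    kill -9 "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        # kill -0 delivers no signal; it fails once the PID no longer exists
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "gone"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "still present"
    return 1
}
```

In the case described here, still_alive_after_kill 10789 would be expected to report "still present", matching what ps showed. A process that outlives SIGKILL is usually blocked inside the kernel, which is consistent with an OS reboot being the only thing that cleared it.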
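3. Solution:
Reboot the OS on node 2 with init 6. If nothing happens after running it, kill the crsd process; the OS can then reboot.
After the OS came back up, the crs services and the database instance started normally. I then started the oc4j service with crsctl start res ora.oc4j, and finally the crs restart on node 1 went through without any trouble.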