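My luck has been running low lately. Yesterday I started installing a 10g RAC and ran into a ton of problems. After sorting out the raw device issue, running root.sh on the second node failed as follows: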
[root@rac2 ~]# /u01/app/oracle/product/crs/root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
rac2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
.....
Waiting for the Oracle CRSD and EVMD to start
Timed out waiting for the CRS stack to start.
The crsd.log on node 2 showed the following:
[root@rac2 crsd]# cat crsd.log |more
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2010-11-28 20:11:12.645: [ default][1116368][ENTER]0
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2010-11-28 20:11:12.645: [ default][1116368]0CRS Daemon Starting
2010-11-28 20:11:12.690: [ CRSMAIN][1116368]0Checking the OCR device
2010-11-28 20:11:12.994: [ CRSMAIN][1116368]0Connecting to the CSS Daemon
2010-11-28 20:11:13.636: [ COMMCRS][60492688]clsc_connect: (0x8b937e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2010-11-28 20:11:13.637: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9
2010-11-28 20:11:13.640: [ CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-28 20:11:17.062: [ COMMCRS][60492688]clsc_connect: (0x8c283e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2010-11-28 20:11:17.062: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9
2010-11-28 20:11:17.063: [ CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-28 20:11:18.361: [ COMMCRS][60492688]clsc_connect: (0x8b94c30) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2010-11-28 20:11:18.361: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9
2010-11-28 20:11:18.361: [ CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-28 20:11:19.642: [ COMMCRS][60492688]clsc_connect: (0x8c28840) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
2010-11-28 20:11:19.642: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9
2010-11-28 20:11:19.642: [ CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-28 20:11:26.540: [ CRSD][1116368]0Daemon Version: 10.2.0.1.0 Active Version: 10.1.0.2.0
2010-11-28 20:11:26.540: [ CRSD][1116368]0Active Version is less than Software Version
2010-11-28 20:11:26.557: [ CRSD][1116368]0Registered in CSS group crs_version
2010-11-28 20:11:26.557: [ CRSMAIN][1116368]0Initializing OCR
2010-11-28 20:11:26.617: [ CRSD][104029072]0Monitoring the crs_version group for AV change notification
2010-11-28 20:11:26.617: [ CRSD][104029072]0Doing grpstat on crs_version group
2010-11-28 20:11:26.617: [ CRSD][104029072]0Returned from grpstat with event 1
2010-11-28 20:11:26.617: [ CRSD][104029072]0Doing grpstat on crs_version group
2010-11-28 20:11:26.827: [ OCRRAW][1116368]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,188263131) total id sets (1), 1st set (1669906634,188263131), 2nd set (0,0) my votes (1), total votes (2)
2010-11-28 20:11:26.828: [ OCRRAW][1116368]proprioo: for disk 1 (/dev/raw/raw2), id match (1), my id set (1669906634,188263131) total id sets (1), 1st set (1669906634,188263131), 2nd set (0,0) my votes (1), total votes (2)
2010-11-28 20:11:28.715: [ CRSD][1116368]0ENV Logging level for Module: allcomp 0
2010-11-28 20:11:29.563: [ CRSD][1116368]0ENV Logging level for Module: default 0
2010-11-28 20:11:29.622: [ CRSD][1116368]0ENV Logging level for Module: COMMCRS 0
2010-11-28 20:11:30.671: [ CRSD][1116368]0ENV Logging level for Module: COMMNS 0
2010-11-28 20:11:31.620: [ CRSD][104029072]0Returned from grpstat with event 1
2010-11-28 20:11:31.620: [ CRSD][104029072]0Doing grpstat on crs_version group
2010-11-28 20:11:31.620: [ CRSD][104029072]0Returned from grpstat with event 1
2010-11-28 20:11:31.620: [ CRSD][104029072]0Doing grpstat on crs_version group
2010-11-28 20:11:31.620: [ CRSD][104029072]0Returned from grpstat with event 8
2010-11-28 20:11:31.620: [ CRSD][104029072]0Recieved GRPPRIV event
2010-11-28 20:11:31.632: [ CRSD][104029072]0AV got from version group: 10.2.0.1.0
2010-11-28 20:11:31.632: [ CRSD][104029072]0Stopped monitoring the version group
2010-11-28 20:11:31.632: [ CRSD][104029072]0New Active Version:10.2.0.1.0
2010-11-28 20:11:31.632: [ CRSD][104029072]0Active Version changed to 10.2.0.1.0
2010-11-28 20:11:32.105: [ CRSD][1116368]0ENV Logging level for Module: CRSUI 0
...
2010-11-28 20:11:47.616: [ CRSD][1116368]0ENV Logging level for Module: OCRMAS 0
2010-11-28 20:11:47.616: [ CRSMAIN][1116368]0Filename is /u01/app/oracle/product/crs/crs/init/rac2.pid
[ clsdmt][104029072]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))
2010-11-28 20:11:48.124: [ CRSOCR][1116368]0OCR api procr_open_key failed for key SYSTEM.crs.updflag. OCR error code = 4 OCR error msg: PROC-4: The cluster registry key to be operated on does not exist.
2010-11-28 20:11:49.284: [ CRSOCR][1116368]0OCR api procr_delete_key failed for key SYSTEM.crs.updflag. OCR error code = 0 OCR error msg:
2010-11-28 20:11:49.294: [ CRSMAIN][1116368]0Using Authorizer location: /u01/app/oracle/product/crs/crs/auth/
2010-11-28 20:11:49.518: [ CRSMAIN][1116368]0Initializing RTI
2010-11-28 20:11:49.519: [CRSTIMER][2823719824]0Timer Thread Starting.
2010-11-28 20:11:49.524: [ CRSRES][1116368]0Parameter SECURITY = 1, running in USER Mode
2010-11-28 20:11:49.524: [ CRSMAIN][1116368]0Initializing EVMMgr
2010-11-28 20:11:49.636: [ COMMCRS][2813229968]clsc_connect: (0x918fc48) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:50.151: [ COMMCRS][2813229968]clsc_connect: (0x90fed98) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:50.444: [ COMMCRS][2813229968]clsc_connect: (0x918fe78) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:51.198: [ COMMCRS][2813229968]clsc_connect: (0x918ffb8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:51.702: [ COMMCRS][2813229968]clsc_connect: (0x918f278) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:52.961: [ COMMCRS][2813229968]clsc_connect: (0x918f5f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:53.474: [ COMMCRS][2813229968]clsc_connect: (0x918fd88) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
2010-11-28 20:11:54.726: [ COMMCRS][2813229968]clsc_connect: (0x918fd88) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))
[root@rac1 cssd]# cat ocssd.log |more
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
[ CSSD]2010-11-28 20:07:53.219 >USER: Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ CSSD]2010-11-28 20:07:53.219 >USER: CSS daemon log for node rac1, number 1, in cluster crs
[ CSSD]2010-11-28 20:07:53.257 [1277920] >TRACE: clssscmain: local-only set to false
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))
[ CSSD]2010-11-28 20:07:53.369 [1277920] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster
[ CSSD]2010-11-28 20:07:53.450 [1277920] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster
[ CSSD]2010-11-28 20:07:53.525 [38079376] >TRACE: clssnm_skgxnmon: skgxn init failed, rc 1
[ CSSD]2010-11-28 20:07:53.525 [1277920] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2010-11-28 20:07:53.584 [1277920] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw3)
[ CSSD]2010-11-28 20:07:53.602 [1277920] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1//dev/raw/raw4)
[ CSSD]2010-11-28 20:07:53.633 [1277920] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2//dev/raw/raw5)
[ CSSD]2010-11-28 20:07:55.640 [65649552] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1//dev/raw/raw4)
[ CSSD]2010-11-28 20:07:55.719 [38079376] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw3)
[ CSSD]2010-11-28 20:07:55.724 [76139408] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2//dev/raw/raw5)
[ CSSD]2010-11-28 20:07:55.821 [1277920] >TRACE: clssscSclsFatal: read value of disable
[ CSSD]2010-11-28 20:07:55.822 [1277920] >TRACE: clssscSclsFatal: read value of disable
[ CSSD]2010-11-28 20:07:55.825 [114346896] >TRACE: clssnmFatalThread: spawned
[ CSSD]2010-11-28 20:07:55.825 [3086044048] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2010-11-28 20:07:56.024 [3086044048] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2010-11-28 20:07:56.025 [3086044048] >TRACE: clssnmClusterListener: Probing node(2)
[ CSSD]2010-11-28 20:07:56.102 [3086044048] >TRACE: clsc_send_msg: (0x8c250e0) NS err (12571, 12560), transport (530, 111, 0)
[ CSSD]2010-11-28 20:07:56.102 [3086044048] >ERROR: clssnmInitialMsg: send failed, con (0x8c25528), rc 3
[ CSSD]2010-11-28 20:07:56.121 [3075554192] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2010-11-28 20:07:56.122 [3075554192] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))
[ CSSD]2010-11-28 20:07:56.211 [3032476560] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE: clssnmDoSyncUpdate: Initiating sync 1
[root@rac2 client]# cat css.log |more
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2010-11-28 20:10:00.188: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:02.359: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:05.369: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:08.821: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:10.073: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:11.613: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
2010-11-28 20:10:12.765: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
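The logs above repeat the same few messages many times, so it can help to count them per IPC key instead of reading every line. The following is a minimal sketch that assumes the "no listener at (ADDRESS=...(KEY=...))" wording shown in the excerpts; the function name summarize_no_listener is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count "no listener at" messages per IPC key.
# Reads log text on stdin and prints "<count> <key>", highest count first.
summarize_no_listener() {
    grep 'no listener at' |
        sed -n 's/.*(KEY=\([^)]*\)).*/\1/p' |   # keep only the KEY value
        sort | uniq -c | sort -rn |             # count identical keys
        awk '{print $1, $2}'                    # normalize spacing
}
```

It could be used as `summarize_no_listener < crsd.log`. A key such as OCSSD_LL_rac2_crs showing up there suggests that the local CSS daemon was not yet listening when crsd tried to connect.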
After trying to start CRS, the status check reported the following errors:
[root@rac1 bin]# ./crsctl check crs
CSS appears healthy
Cannot communicate with CRS
Cannot communicate with EVM
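When the check is repeated across nodes, it may be handy to reduce that output to just the failing components. This is a small sketch that relies only on the "Cannot communicate with ..." wording shown above (10.2.0.1); other releases may word it differently, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the components that `crsctl check crs`
# reports as unreachable. Reads the command output on stdin and
# prints one component name per line.
unreachable_components() {
    sed -n 's/^Cannot communicate with \([A-Za-z]*\).*/\1/p'
}
```

For example: `./crsctl check crs | unreachable_components`. An empty result means no "Cannot communicate" line was printed.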
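Analysis of possible causes:
1. Firewall
Oracle Metalink describes a similar case that was caused by the firewall. My firewall, however, was disabled when the system was installed.
The symptom in that case: pinging the private IP works, but a traceroute to the private IP shows the following error: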
# traceroute 192.168.0.2
traceroute to 192.168.0.2 (192.168.0.2), 30 hops max, 46 byte packets
1 rac2prv (192.168.0.2) 0.201 ms !<10> 0.198 ms !<10> 0.109 ms !<10>
If that is the case, turning off the firewall fixes it:
# service iptables stop
# chkconfig iptables off
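In traceroute output, `!<N>` marks an ICMP unreachable reply with code N, and code 10 means "host administratively prohibited", which is the kind of reply an iptables REJECT rule with icmp-host-prohibited sends. A minimal sketch for spotting it, with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if traceroute output on stdin
# contains a "!<10>" annotation, i.e. ICMP unreachable code 10.
has_admin_prohibited() {
    grep -q '!<10>'
}
```

For example: `traceroute 192.168.0.2 | has_admin_prohibited && echo "interconnect traffic is being rejected"`. The address is the private IP from the output above; substitute your own interconnect address.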
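2. Permissions on the raw devices
I compared the permissions on the raw devices and found nothing wrong. They were configured according to the official Oracle documentation, so the raw devices are unlikely to be the problem here.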
[root@rac2 ~]# cd /dev/raw/
[root@rac2 raw]# ll
total 0
crw-r----- 1 root oinstall 162, 1 Nov 28 19:14 raw1
crw-r----- 1 root oinstall 162, 2 Nov 28 19:14 raw2
crw-r--r-- 1 oracle oinstall 162, 3 Nov 28 20:15 raw3
crw-r--r-- 1 oracle oinstall 162, 4 Nov 28 20:15 raw4
crw-r--r-- 1 oracle oinstall 162, 5 Nov 28 20:15 raw5
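Comparing a listing like this by eye on every node is error-prone, so here is a rough sketch that checks `ls -l` lines against an expected owner, group and mode string. The expected values are whatever you pass in; the ones used in the example are simply copied from the listing above, not an authoritative recommendation. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare `ls -l` lines (stdin) with expected values.
# Usage: check_raw_perms OWNER GROUP MODE_STRING < ls-output
# Prints "OK name" or "MISMATCH name (got owner:group mode)" per entry.
check_raw_perms() {
    awk -v o="$1" -v g="$2" -v m="$3" '
        NF >= 9 && $1 ~ /^[bc-]/ {
            name = $NF
            mode = substr($1, 1, 10)   # drop any trailing "." or "+" marker
            if ($3 == o && $4 == g && mode == m) print "OK", name
            else print "MISMATCH", name, "(got " $3 ":" $4 " " mode ")"
        }'
}
```

For example: `ls -l /dev/raw | grep 'raw[12]$' | check_raw_perms root oinstall crw-r-----`.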
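3. Permissions on the related directories
CRS needs to write information to certain files. If the directories holding them have permission problems and the files cannot be written, this error can also occur. I found several examples of this online; after the permissions were reset, CRS started normally.
The directories involved: /var/tmp/.oracle, /tmp/.oracle and $CRS_HOME/log/sid/
Oracle writes socket and log information into these directories. If it cannot write there, CRS will not start.
Checking whether this is the cause is simple: empty the two .oracle directories, then start CRS. If files are created in them, the permissions are fine.
Note: stop CRS first. If CRS is running, forcibly deleting these two directories may bring CRS down.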
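Before emptying anything, a non-destructive first look is to test whether the directories exist and are writable. The sketch below assumes it is run as the account that owns the CRS processes (as root it will almost always report "WRITABLE"); the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: report, for each directory given, whether it
# exists and whether the current user can write to it.
check_writable_dirs() {
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "MISSING $d"
        elif [ -w "$d" ]; then
            echo "WRITABLE $d"
        else
            echo "NOT-WRITABLE $d"
        fi
    done
}
```

For example: `check_writable_dirs /var/tmp/.oracle /tmp/.oracle`.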
I tried emptying these two directories and then re-ran root.sh. The steps were:
1. Stop CRS with crsctl stop crs.
2. Delete the /etc/init.* files: rm -f /etc/init.*
3. Kill the related processes:
ps -ef|grep css
ps -ef|grep crs
ps -ef|grep evm
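Using the IDs reported by ps, end each process with kill -9 id.
If the files are not deleted in step 2, these processes cannot be killed.
4. Delete the /etc/oracle/scls_scr/rac1/oracle/cssfatal file on every machine.
If this file is not deleted, root.sh reports an error when it runs.
Reference:
Solution for RAC root.sh: Oracle CRS stack is already configured and will be running under init(1M)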
http://blog.****.net/tianlesoftware/archive/2010/02/21/5314804.aspx
5. Wipe the two OCR raw devices:
[root@rac1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=195
195+0 records in
195+0 records out
204472320 bytes (204 MB) copied, 23.5725 seconds, 8.7 MB/s
[root@rac1 bin]# dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=195
195+0 records in
195+0 records out
204472320 bytes (204 MB) copied, 28.1755 seconds, 7.3 MB/s
6. Re-run the /u01/app/oracle/product/crs/root.sh script.
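For step 3 above, the three ps/grep commands can be folded into one filter that prints only the PIDs, so they can be reviewed before anything is killed. This is a sketch using the same css/crs/evm patterns as the commands in step 3; it matches on the whole ps line, so unrelated processes whose command line happens to contain those strings would be listed too. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (second column) of every line mentioning css, crs or evm, skipping
# the grep processes themselves.
clusterware_pids() {
    awk '/css|crs|evm/ && !/grep/ { print $2 }'
}
```

For example: `ps -ef | clusterware_pids`. Check the list by hand, then pass each PID to kill -9 as in step 3.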
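After all of the above, the same error was still there. Still stuck...
This system previously had an Oracle 11gR2 RAC installation attempt on it. That install failed, and after deleting the related files I went straight to installing 10g RAC. My guess is that something was not cleaned up completely. Clusterware can be quite mysterious. In the end I rebuilt the system from scratch and then installed 10g RAC.
The cases I found online were working RAC environments where CRS would not start after a reboot, and resetting the permissions on the related directories fixed it. Mine happened during installation. What a hassle. In a production environment this would have been real trouble.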
Original article: http://blog.****.net/tianlesoftware/article/details/6048651