Oracle RAC series: Red Hat 5.4 RAC, no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

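Date: 2020-11-28 08:19:32

    My luck has been running low lately. Yesterday I started installing a 10g RAC and ran into a whole series of problems. After I had sorted out the raw device issue, running root.sh on the second node failed as follows: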

 

[root@rac2 ~]# /u01/app/oracle/product/crs/root.sh

WARNING: directory '/u01/app/oracle/product' is not owned by root

WARNING: directory '/u01/app/oracle' is not owned by root

WARNING: directory '/u01/app' is not owned by root

WARNING: directory '/u01' is not owned by root

Checking to see if Oracle CRS stack is already configured

 

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

WARNING: directory '/u01/app/oracle/product' is not owned by root

WARNING: directory '/u01/app/oracle' is not owned by root

WARNING: directory '/u01/app' is not owned by root

WARNING: directory '/u01' is not owned by root

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

assigning default hostname rac1 for node 1.

assigning default hostname rac2 for node 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 1: rac1 rac1-priv rac1

node 2: rac2 rac2-priv rac2

clscfg: Arguments check out successfully.

 

NO KEYS WERE WRITTEN. Supply -force parameter to override.

-force is destructive and will destroy any previous cluster

configuration.

Oracle Cluster Registry for cluster has already been initialized

Startup will be queued to init within 90 seconds.

Adding daemons to inittab

Expecting the CRS daemons to be up within 600 seconds.

CSS is active on these nodes.

        rac1

        rac2

CSS is active on all nodes.

Waiting for the Oracle CRSD and EVMD to start

Waiting for the Oracle CRSD and EVMD to start

.....

Waiting for the Oracle CRSD and EVMD to start

Timed out waiting for the CRS stack to start.

 

 

The crsd.log on node 2 shows the following:

[root@rac2 crsd]# cat crsd.log |more

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle.  All rights reserved.

2010-11-28 20:11:12.645: [ default][1116368][ENTER]0

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2004, Oracle.  All rights reserved

2010-11-28 20:11:12.645: [ default][1116368]0CRS Daemon Starting

2010-11-28 20:11:12.690: [ CRSMAIN][1116368]0Checking the OCR device

2010-11-28 20:11:12.994: [ CRSMAIN][1116368]0Connecting to the CSS Daemon

2010-11-28 20:11:13.636: [ COMMCRS][60492688]clsc_connect: (0x8b937e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

 

2010-11-28 20:11:13.637: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9

2010-11-28 20:11:13.640: [  CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2010-11-28 20:11:17.062: [ COMMCRS][60492688]clsc_connect: (0x8c283e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

2010-11-28 20:11:17.062: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9

2010-11-28 20:11:17.063: [  CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2010-11-28 20:11:18.361: [ COMMCRS][60492688]clsc_connect: (0x8b94c30) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

2010-11-28 20:11:18.361: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9

2010-11-28 20:11:18.361: [  CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2010-11-28 20:11:19.642: [ COMMCRS][60492688]clsc_connect: (0x8c28840) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))

2010-11-28 20:11:19.642: [ CSSCLNT][1116368]clsssInitNative: connect failed, rc 9

2010-11-28 20:11:19.642: [  CRSRTI][1116368]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2010-11-28 20:11:26.540: [    CRSD][1116368]0Daemon Version: 10.2.0.1.0 Active Version: 10.1.0.2.0

2010-11-28 20:11:26.540: [    CRSD][1116368]0Active Version is less than Software Version

2010-11-28 20:11:26.557: [    CRSD][1116368]0Registered in CSS group crs_version

2010-11-28 20:11:26.557: [ CRSMAIN][1116368]0Initializing OCR

2010-11-28 20:11:26.617: [    CRSD][104029072]0Monitoring the crs_version group for AV change notification

2010-11-28 20:11:26.617: [    CRSD][104029072]0Doing grpstat on crs_version group

2010-11-28 20:11:26.617: [    CRSD][104029072]0Returned from grpstat with event 1

2010-11-28 20:11:26.617: [    CRSD][104029072]0Doing grpstat on crs_version group

2010-11-28 20:11:26.827: [  OCRRAW][1116368]proprioo: for disk 0 (/dev/raw/raw1), id match (1), my id set (1669906634,188263131) total id sets (1), 1st set (1669906634,188263131), 2nd set (0,0) my votes (1), total votes (2)

2010-11-28 20:11:26.828: [  OCRRAW][1116368]proprioo: for disk 1 (/dev/raw/raw2), id match (1), my id set (1669906634,188263131) total id sets (1), 1st set (1669906634,188263131), 2nd set (0,0) my votes (1), total votes (2)

2010-11-28 20:11:28.715: [    CRSD][1116368]0ENV Logging level for Module: allcomp  0

2010-11-28 20:11:29.563: [    CRSD][1116368]0ENV Logging level for Module: default  0

2010-11-28 20:11:29.622: [    CRSD][1116368]0ENV Logging level for Module: COMMCRS  0

2010-11-28 20:11:30.671: [    CRSD][1116368]0ENV Logging level for Module: COMMNS  0

2010-11-28 20:11:31.620: [    CRSD][104029072]0Returned from grpstat with event 1

2010-11-28 20:11:31.620: [    CRSD][104029072]0Doing grpstat on crs_version group

2010-11-28 20:11:31.620: [    CRSD][104029072]0Returned from grpstat with event 1

2010-11-28 20:11:31.620: [    CRSD][104029072]0Doing grpstat on crs_version group

2010-11-28 20:11:31.620: [    CRSD][104029072]0Returned from grpstat with event 8

2010-11-28 20:11:31.620: [    CRSD][104029072]0Recieved GRPPRIV event

2010-11-28 20:11:31.632: [    CRSD][104029072]0AV got from version group: 10.2.0.1.0

2010-11-28 20:11:31.632: [    CRSD][104029072]0Stopped monitoring the version group

2010-11-28 20:11:31.632: [    CRSD][104029072]0New Active Version:10.2.0.1.0

2010-11-28 20:11:31.632: [    CRSD][104029072]0Active Version changed to 10.2.0.1.0

2010-11-28 20:11:32.105: [    CRSD][1116368]0ENV Logging level for Module: CRSUI  0

...

2010-11-28 20:11:47.616: [    CRSD][1116368]0ENV Logging level for Module: OCRMAS  0

2010-11-28 20:11:47.616: [ CRSMAIN][1116368]0Filename is /u01/app/oracle/product/crs/crs/init/rac2.pid

[  clsdmt][104029072]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac2DBG_CRSD))

2010-11-28 20:11:48.124: [  CRSOCR][1116368]0OCR api procr_open_key failed for key SYSTEM.crs.updflag. OCR error code = 4 OCR error msg: PROC-4: The cluster registry key to be operated on does not exist.

2010-11-28 20:11:49.284: [  CRSOCR][1116368]0OCR api procr_delete_key failed for key SYSTEM.crs.updflag. OCR error code = 0 OCR error msg:

2010-11-28 20:11:49.294: [ CRSMAIN][1116368]0Using Authorizer location: /u01/app/oracle/product/crs/crs/auth/

2010-11-28 20:11:49.518: [ CRSMAIN][1116368]0Initializing RTI

2010-11-28 20:11:49.519: [CRSTIMER][2823719824]0Timer Thread Starting.

2010-11-28 20:11:49.524: [  CRSRES][1116368]0Parameter SECURITY = 1, running in USER Mode

2010-11-28 20:11:49.524: [ CRSMAIN][1116368]0Initializing EVMMgr

2010-11-28 20:11:49.636: [ COMMCRS][2813229968]clsc_connect: (0x918fc48) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:50.151: [ COMMCRS][2813229968]clsc_connect: (0x90fed98) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:50.444: [ COMMCRS][2813229968]clsc_connect: (0x918fe78) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:51.198: [ COMMCRS][2813229968]clsc_connect: (0x918ffb8) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:51.702: [ COMMCRS][2813229968]clsc_connect: (0x918f278) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:52.961: [ COMMCRS][2813229968]clsc_connect: (0x918f5f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:53.474: [ COMMCRS][2813229968]clsc_connect: (0x918fd88) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

2010-11-28 20:11:54.726: [ COMMCRS][2813229968]clsc_connect: (0x918fd88) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

 

[root@rac1 cssd]# cat ocssd.log |more

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle.  All rights reserved.

[    CSSD]2010-11-28 20:07:53.219 >USER:    Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle.  All rights reserved.

[    CSSD]2010-11-28 20:07:53.219 >USER:    CSS daemon log for node rac1, number 1, in cluster crs

[    CSSD]2010-11-28 20:07:53.257 [1277920] >TRACE:   clssscmain: local-only set to false

[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_CSSD))

[    CSSD]2010-11-28 20:07:53.369 [1277920] >TRACE:   clssnmReadNodeInfo: added node 1 (rac1) to cluster

[    CSSD]2010-11-28 20:07:53.450 [1277920] >TRACE:   clssnmReadNodeInfo: added node 2 (rac2) to cluster

[    CSSD]2010-11-28 20:07:53.525 [38079376] >TRACE:   clssnm_skgxnmon: skgxn init failed, rc 1

[    CSSD]2010-11-28 20:07:53.525 [1277920] >TRACE:   clssnm_skgxnonline: Using vacuous skgxn monitor

[    CSSD]2010-11-28 20:07:53.584 [1277920] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw3)

[    CSSD]2010-11-28 20:07:53.602 [1277920] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (1//dev/raw/raw4)

[    CSSD]2010-11-28 20:07:53.633 [1277920] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (2//dev/raw/raw5)

[    CSSD]2010-11-28 20:07:55.640 [65649552] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (1//dev/raw/raw4)

[    CSSD]2010-11-28 20:07:55.719 [38079376] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw3)

[    CSSD]2010-11-28 20:07:55.724 [76139408] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (2//dev/raw/raw5)

[    CSSD]2010-11-28 20:07:55.821 [1277920] >TRACE:   clssscSclsFatal: read value of disable

[    CSSD]2010-11-28 20:07:55.822 [1277920] >TRACE:   clssscSclsFatal: read value of disable

[    CSSD]2010-11-28 20:07:55.825 [114346896] >TRACE:   clssnmFatalThread: spawned

[    CSSD]2010-11-28 20:07:55.825 [3086044048] >TRACE:   clssnmconnect: connecting to node 1, flags 0x0001, connector 1

[    CSSD]2010-11-28 20:07:56.024 [3086044048] >TRACE:   clssnmconnect: connecting to node 0, flags 0x0000, connector 1

[    CSSD]2010-11-28 20:07:56.025 [3086044048] >TRACE:   clssnmClusterListener: Probing node(2)

[    CSSD]2010-11-28 20:07:56.102 [3086044048] >TRACE:   clsc_send_msg: (0x8c250e0) NS err (12571, 12560), transport (530, 111, 0)

[    CSSD]2010-11-28 20:07:56.102 [3086044048] >ERROR:   clssnmInitialMsg: send failed, con (0x8c25528), rc 3

[    CSSD]2010-11-28 20:07:56.121 [3075554192] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))

[    CSSD]2010-11-28 20:07:56.122 [3075554192] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

[    CSSD]2010-11-28 20:07:56.211 [3032476560] >TRACE:   clssnmPollingThread: Connection complete

[    CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE:   clssnmRcfgMgrThread: Connection complete

[    CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE:   clssnmRcfgMgrThread: Local Join

[    CSSD]2010-11-28 20:07:56.211 [3011496848] >TRACE:   clssnmDoSyncUpdate: Initiating sync 1

 

[root@rac2 client]# cat css.log |more

Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle.  All rights reserved.

2010-11-28 20:10:00.188: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:02.359: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:05.369: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:08.821: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:10.073: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:11.613: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9

2010-11-28 20:10:12.765: [ CSSCLNT][1501280]clsssInitNative: connect failed, rc 9
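The crsd.log excerpt repeats the same "no listener" message against two different IPC keys, first OCSSD_LL_rac2_crs and later SYSTEM.evm.acceptor.auth. A small helper along the following lines can tally the messages by key, which makes it easier to see which endpoint is still unreachable. This is only a sketch: tally_no_listener is a made-up name, not an Oracle tool, and the pattern assumes the message format shown in the excerpts.

```shell
#!/bin/sh
# Tally "no listener" messages in a CRS log by the IPC KEY they refer to.
# tally_no_listener is a hypothetical helper name, not part of Clusterware.
tally_no_listener() {
    # $1: path to a log file such as crsd.log
    grep -o 'no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=[^)]*))' "$1" |
        sed 's/.*KEY=\([^)]*\)).*/\1/' |
        sort | uniq -c | sort -rn
}
```

Run against the crsd.log on node 2, it prints one line per key with the number of failed connection attempts in front.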

 

 

Starting CRS reports the following errors:

[root@rac1 bin]# ./crsctl check crs

CSS appears healthy

Cannot communicate with CRS

Cannot communicate with EVM

 

 

 

Analysis of the problem:

 

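1. Firewall

 

       Oracle Metalink describes a similar case that was caused by the firewall. My firewall, however, was turned off when the operating system was installed.

 

Symptom: pinging the private IP works, but running traceroute against the private IP produces the following error: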


# traceroute 192.168.0.2
traceroute to  192.168.0.2 (192.168.0.2), 30 hops max, 46 byte packets
1  rac2prv (192.168.0.2)   0.201 ms !<10>   0.198 ms !<10>   0.109 ms !<10>

 

If that is the case, turning off the firewall is enough:

# service iptables stop
# chkconfig iptables off
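The symptom above, replies annotated with !<10>, can be checked mechanically. traceroute marks ICMP "unreachable" replies with a "!" annotation after the round-trip time (!H, !N, !X, or !<num> for other codes), and code 10 is the "host administratively prohibited" code that a rejecting firewall typically sends. The sketch below only inspects traceroute output for such annotations; hops_flagged is a hypothetical helper name, and the pattern is an assumption based on the output format shown above.

```shell
#!/bin/sh
# Read traceroute output on stdin and exit 0 if any reply carries a "!"
# annotation (for example !<10>, !X or !H), which marks an ICMP unreachable.
# hops_flagged is a hypothetical helper name.
hops_flagged() {
    grep -Eq 'ms +![A-Z<]'
}
```

Typical use on a node would be something like `traceroute 192.168.0.2 | hops_flagged && service iptables status`, so that the firewall state is only printed when the private interconnect actually shows rejected probes.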

 

 

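2. Raw device permissions

       I compared the permissions on the raw devices and found nothing wrong. The raw devices were configured according to the official Oracle documentation, so they are unlikely to be the problem here.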

 

[root@rac2 ~]# cd /dev/raw/

[root@rac2 raw]# ll

total 0

crw-r----- 1 root   oinstall 162, 1 Nov 28 19:14 raw1

crw-r----- 1 root   oinstall 162, 2 Nov 28 19:14 raw2

crw-r--r-- 1 oracle oinstall 162, 3 Nov 28 20:15 raw3

crw-r--r-- 1 oracle oinstall 162, 4 Nov 28 20:15 raw4

crw-r--r-- 1 oracle oinstall 162, 5 Nov 28 20:15 raw5
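Comparing the listing by eye across two nodes is error-prone, so the same check can be scripted. The sketch below compares a path's owner, group and octal mode with expected values; the expected values are simply read off the listing above (root:oinstall 640 for raw1 and raw2, oracle:oinstall 644 for raw3 to raw5). check_perm is a hypothetical helper name, and GNU stat is assumed.

```shell
#!/bin/sh
# check_perm PATH OWNER:GROUP MODE
# Exit 0 if PATH has the expected owner, group and octal mode; otherwise
# print the difference and exit 1. Relies on GNU stat (coreutils).
# check_perm is a hypothetical helper name.
check_perm() {
    actual=$(stat -c '%U:%G %a' "$1") || return 2
    if [ "$actual" = "$2 $3" ]; then
        return 0
    fi
    echo "$1: expected $2 $3, found $actual"
    return 1
}
```

On each node this could be called as `check_perm /dev/raw/raw1 root:oinstall 640` and `check_perm /dev/raw/raw3 oracle:oinstall 644`, and so on for the remaining devices.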

 

 

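3. Permissions on related directories

       CRS needs to write information to certain files. If the directories that hold them have permission problems and the files cannot be written, the same symptom can appear. I found several examples of this online; in each case CRS started normally once the permissions were reset.

       The relevant directories are /var/tmp/.oracle, /tmp/.oracle and $CRS_HOME/log/sid/.

       Oracle writes socket and log information into these directories. If it cannot write there, CRS will not start.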

      

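       Checking whether this is what keeps CRS from starting is simple: empty the two .oracle directories, then start CRS again. If new files are created, the permissions are fine.

       Note that CRS must be stopped first. If CRS is running, forcibly deleting these two directories may bring it down.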
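Before emptying anything, a non-destructive first pass is to ask whether each directory exists and whether the current account can write to it. The sketch below does only that; probe_dir is a hypothetical helper name. It should be run as the account the daemon in question runs under, because root passes the write test almost everywhere.

```shell
#!/bin/sh
# For each directory given, report whether it exists and whether the current
# user may create files in it. Nothing is modified.
# probe_dir is a hypothetical helper name.
probe_dir() {
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "$d: missing"
        elif [ -w "$d" ] && [ -x "$d" ]; then
            echo "$d: writable"
        else
            echo "$d: NOT writable"
        fi
    done
}
```

For the directories named here, the call would be `probe_dir /var/tmp/.oracle /tmp/.oracle "$CRS_HOME/log"`.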

      

 

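I tried emptying these two directories and then re-ran root.sh. The steps were:

       1. Stop CRS with the crsctl stop crs command.

       2. Delete the /etc/init.* files: rm -f /etc/init.*

       3. Kill the related processes:

              ps -ef|grep css

              ps -ef|grep crs

              ps -ef|grep evm

       Using the process IDs reported by ps, end each process with kill -9 id.

       If the files are not deleted in step 2, these processes cannot be killed.

       4. On every machine, delete the /etc/oracle/scls_scr/rac1/oracle/cssfatal file.

              If this file is not deleted, running the root.sh script reports an error.

       Reference:

How to fix "Oracle CRS stack is already configured and will be running under init(1M)" when running RAC root.sh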

http://blog.****.net/tianlesoftware/archive/2010/02/21/5314804.aspx

 

5. Wipe the two OCR raw devices

[root@rac1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=195

195+0 records in

195+0 records out

204472320 bytes (204 MB) copied, 23.5725 seconds, 8.7 MB/s

[root@rac1 bin]# dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=195

195+0 records in

195+0 records out

204472320 bytes (204 MB) copied, 28.1755 seconds, 7.3 MB/s

 

6. Re-run the /u01/app/oracle/product/crs/root.sh script.
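Because several of these steps are destructive (the dd commands erase the OCR), it can help to print the whole sequence before running any of it. The following is a dry-run sketch of the steps as listed, not a vetted cleanup procedure. The commands and paths are copied from this article; step, cleanup_plan, NODE and DRY_RUN are names invented for the illustration. Step 2 is reproduced exactly as written, so it is worth checking what the glob matches on a given system before running it for real. Killing the leftover processes is left to the operator, so that step only lists them.

```shell
#!/bin/sh
# Dry-run wrapper around the cleanup steps from the article.
# DRY_RUN=1 (the default) only prints each command; any other value runs it.
CRS_HOME=${CRS_HOME:-/u01/app/oracle/product/crs}  # CRS home used in the article
NODE=${NODE:-rac1}            # directory name under /etc/oracle/scls_scr (assumed per node)
DRY_RUN=${DRY_RUN:-1}

# Print a command and, unless in dry-run mode, execute it through sh.
step() {
    echo "+ $1"
    if [ "$DRY_RUN" != 1 ]; then
        sh -c "$1"
    fi
}

cleanup_plan() {
    step "$CRS_HOME/bin/crsctl stop crs"                       # step 1
    step "rm -f /etc/init.*"                                   # step 2, as written
    step "ps -ef | grep -E 'css|crs|evm' | grep -v grep"       # step 3: list only
    step "rm -f /etc/oracle/scls_scr/$NODE/oracle/cssfatal"    # step 4
    step "dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=195"    # step 5
    step "dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=195"
    step "$CRS_HOME/root.sh"                                   # step 6
}
```

Calling `cleanup_plan` with the defaults prints seven lines prefixed with "+", which can be reviewed on each node before setting DRY_RUN=0 as root.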

 

 

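After doing all of the above, I still got the same error. What a tragedy...

 

       This system had previously had an Oracle 11gR2 RAC installation attempt. That installation failed, so I deleted the related files and went straight to installing 10g RAC. My guess is that something was not cleaned up completely; Clusterware can be very odd that way. In the end I rebuilt the operating system and then installed 10g RAC.

 

       The people who posted online had a normally working RAC environment in which CRS would not start after a reboot. After hitting this error, they reset the permissions on the relevant directories and CRS started normally. Mine happened during installation, which made for a lot of wasted effort. In a production environment it would have been a real problem.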



Original article: http://blog.****.net/tianlesoftware/article/details/6048651