This document describes several issues encountered when installing Oracle RAC 11.2.0.4 on cloud servers running CentOS 7, together with the explanations and workarounds:
1. When installing the Grid clusterware software, a patch must be applied on both nodes before running root.sh
Install of Clusterware fails while running root.sh on OL7 - ohasd fails to start (Doc ID 1959008.1)
Grid 11.2.0.4 installation fails when running root.sh on OL7; this affects both Oracle Clusterware and Oracle Restart installations.
rootcrs.log/roothas.log confirms that ohasd/crsd failed to start
CAUSE
There is a known issue where OL7 expects to use systemd rather than initd for running processes and restarting them and root.sh does not handle this currently.
This was reported in the following Unpublished Bug
Bug 18370031 - RC SCRIPTS (/ETC/RC.D/RC.* , /ETC/INIT.D/* ) ON OL7 FOR CLUSTERWARE
SOLUTION
Because Oracle Linux 7 (and Red Hat 7) use systemd rather than initd for starting/restarting processes and run them as services, the current software install of both 11.2.0.4 & 12.1.0.1 will not succeed because the ohasd process does not start properly.
In OL7 it needs to be set up as a service, and the patch fix for Bug 18370031 needs to be applied for this BEFORE you run root.sh when prompted.
For 11.2.0.4, apply patch 18370031.
This is also mentioned in the 11gR2 Release Notes: https://docs.oracle.com/cd/E11882_01/relnotes.112/e23558/toc.htm#CJAJEBGG
During the Oracle Grid Infrastructure installation, you must apply patch 18370031 before configuring the software that is installed.
The timing of applying the patch is important and is described in detail in the Note 1951613.1 on My Oracle Support. This patch ensures that
the clusterware stack is configured to use systemd for clusterware processes, as Oracle Linux 7 uses systemd for all services.
Patch 18370031 must be applied before running root.sh.
Concretely: download the patch and unzip it, then apply it on both nodes; only once the patch is in place on both nodes should the root.sh scripts be run, one node after the other. The patch is applied with:
$ /u01/app/11.2.0/grid/OPatch/opatch apply -local
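For reference, a minimal sketch of the full patch sequence on each node, run as the Grid software owner; the staging directory and zip name pattern are assumptions, and /u01/app/11.2.0/grid is the Grid home used in this environment:
$ unzip p18370031_112040_<platform>.zip -d /tmp/18370031
$ cd /tmp/18370031/18370031
$ /u01/app/11.2.0/grid/OPatch/opatch apply -local
$ /u01/app/11.2.0/grid/OPatch/opatch lsinventory
Only after opatch lsinventory lists the patch on both nodes should root.sh be run, on node 1 first and then on node 2.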
2. The Oracle database software installation hits the 'agent nmhs' error from makefile ins_emagent.mk
Error in invoking target 'agent nmhs' of make file ins_emagent.mk while installing Oracle 11.2.0.4 on Linux (Doc ID 2299494.1)
SYMPTOMS
When installing Oracle Database 11.2.0.4 software on some Linux x86-64 releases, such as SUSE12SP1, SUSE12SP2 or RHEL7, the following error is reported during the link stage:
SOLUTION
Edit $ORACLE_HOME/sysman/lib/ins_emagent.mk and search for the line
$(MK_EMAGENT_NMECTL)
Then replace that line with
$(MK_EMAGENT_NMECTL) -lnnz11
That is: after a space, a hyphen, the letter l, "nnz", and the digits 11. In this installation the error actually occurred on only one of the two nodes. Then click the "Retry" button to continue.
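If scripting the change is preferred over editing by hand, a sketch like the following can be used on the affected node (an assumption, not from the MOS note; GNU sed on Linux, with $ORACLE_HOME pointing at the database home and a backup taken first):
$ cp $ORACLE_HOME/sysman/lib/ins_emagent.mk $ORACLE_HOME/sysman/lib/ins_emagent.mk.bak
$ sed -i 's/\$(MK_EMAGENT_NMECTL)$/$(MK_EMAGENT_NMECTL) -lnnz11/' $ORACLE_HOME/sysman/lib/ins_emagent.mk
$ grep -- '-lnnz11' $ORACLE_HOME/sysman/lib/ins_emagent.mk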
----* For the bug below Oracle has no dedicated MOS note; it is listed here for reference only.
Patch 19692824
During installation of Oracle Database or Oracle RAC on OL7, the following linking error may be encountered:
Error in invoking target 'agent nmhs' of makefile '<ORACLE_HOME>/sysman/lib/ins_emagent.mk'. See '<installation log>' for details.
If this error is encountered, the user should select Continue. Then, after the installation has completed, the user must download Patch 19692824 from My Oracle Support and apply it per the instructions included in the patch README.
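For reference, a sketch of what applying that patch after the install typically looks like; the staging path and zip name pattern are assumptions, and any database or listener running from this home should be stopped first, as the patch README for one-off patches normally requires:
$ unzip p19692824_<version>_<platform>.zip -d /tmp/19692824
$ cd /tmp/19692824/19692824
$ $ORACLE_HOME/OPatch/opatch apply -local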
3. Because the customer environment is a public cloud, the private HAIP addresses were not reachable between the nodes, so during the GRID installation the grid stack on the second node could not start
CRS-5018:(:CLSN00037:) Running the root.sh script of the GRID installation on node 2 failed. The logs show that the cluster errors out while starting ASM, i.e. the ASM instance cannot be started successfully.
Since HAIP connectivity between the nodes must work before Oracle ASM can start, the investigation focused on the HAIP information.
References:
Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1640865.1)
Bug 11077756 - allow root script to continue upon HAIP failure
Issue: a startup failure of HAIP fails the root script; the fix of the bug allows the root script to continue so that the HAIP issue can be worked on later.
Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and above
Note: the consequence is that HAIP will be disabled. Once the cause is identified and solution is implemented, HAIP needs to be enabled when there's an outage window. To enable, as root on ALL nodes:
# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs
Checking the cluster private-network information shows that HAIP on node 2 is normal:
# ./crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=ONLINE on rac2
Looking at the routing table, the 169.254.0.0 HAIP route on node 1 is bound to the public interface eth0 rather than the private interconnect eth1, so the private network on node 1 is in an abnormal state.
On node 1:
# netstat -rn
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
10.10.10.0      0.0.0.0         255.255.255.0   U       0 0        0 eth1
10.118.7.0      0.0.0.0         255.255.255.0   U       0 0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U       0 0        0 eth0
0.0.0.0         10.118.7.1      0.0.0.0         UG      0 0        0 eth0
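To cross-check which interface the 169.254.x.x HAIP address is actually plumbed on, and whether the peer's HAIP answers, checks along the following lines can be used (a sketch: eth1 is this environment's private NIC, and 169.254.x.x stands for the HAIP address reported on the other node):
# $GRID_HOME/bin/oifcfg getif
# ip addr show eth1 | grep 169.254
# ping -c 3 -I eth1 169.254.x.x
# traceroute -i eth1 169.254.x.x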
Source article: https://blog.csdn.net/evils798/article/details/27248263
Action taken: configure a gateway for the private network, restart the network service, and reinstall (alternatively, the approach from the link above can be tried: add an interface-specific route for the HAIP subnet on the private NIC, as sketched below).
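A minimal sketch of that interface-specific route, assuming eth1 is the private interconnect NIC; the route-eth1 file and its syntax follow the standard CentOS 7 network-scripts convention and are not taken verbatim from the original post:
# cat /etc/sysconfig/network-scripts/route-eth1
169.254.0.0/16 dev eth1
# systemctl restart network
# netstat -rn | grep 169.254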
After reconfiguring the private-network gateway, RAC was reinstalled and root.sh failed again. This time the 169.254.x.x Oracle HAIP address was plumbed on the private NIC eth1, but node 1 still could not ping the HAIP address of node 2.
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)
Also, logging in as the grid user with sqlplus / as sysasm and issuing SQL> startup raised the same error as above; the problem is still that the private HAIP is unreachable.
Case5. HAIP is up on all nodes and route info is presented but HAIP is not pingable
Symptom:
HAIP is present on both nodes and the route information is also present, but neither node can ping or traceroute the other node's HAIP.
······
Solution:
For Openstack Cloud implementation, engage system admin to create another neutron port to map link-local traffic. For other environment, engage SysAdmin/NetworkAdmin to review routing/network setup.
The suggested solution is to have the network engineers adjust the network, but it is difficult for the cloud vendor to open up connectivity between the HAIP (link-local) addresses.
For this installation, HAIP was therefore disabled in order to complete the installation in the cloud environment.
Disabling HAIP
Reference links:
http://blog.itpub.net/23135684/viewspace-752721/
https://blog.csdn.net/ctypyb2002/article/details/90705436
https://blog.51cto.com/snowhill/2045748
After root.sh had been run on node 2 and the cluster reported the error, run the following commands as root on node 2, restart CRS, and then rerun the root.sh script:
# crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
# crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)',STOP_DEPENDENCIES='hard(intermediate:ora.cssd)'" -init
Then run the same modification commands as root on node 1.
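The CRS restart in between can look like the following, run as root on the node concerned (a sketch; the -f flag and the status check are assumptions, not part of the original notes):
# crsctl stop crs -f
# crsctl start crs
# crsctl stat res -t -init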
After that, the GRID installation proceeded smoothly and the Oracle clusterware software finished installing normally.
Errors when creating the database with DBCA
CRS-2672 and CRS-5017 errors were reported; the symptom was that after DBCA created the database, only the DB instance on one node could start normally, while the instance on the other node kept failing to start.
This is similar to the error hit by root.sh during the Grid installation. It turned out that once HAIP is unreachable and has been disabled, the following steps are still required.
Both the ASM instances and the DB instances need to be modified; here each node's private IP address is used.
ASM:
SQL> alter system set cluster_interconnects = '10.10.10.3' scope=spfile sid='+ASM1';
SQL> alter system set cluster_interconnects = '10.10.10.4' scope=spfile sid='+ASM2';
DB:
SQL> alter system set cluster_interconnects = '10.10.10.3' scope=spfile sid='orcl1';
SQL> alter system set cluster_interconnects = '10.10.10.4' scope=spfile sid='orcl2';
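After the restart, the setting can be cross-checked from either instance; a minimal sketch using the standard gv$parameter view:
SQL> select inst_id, name, value from gv$parameter where name = 'cluster_interconnects';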
The actual sequence was: create the database with DBCA; once the DB on one node was OK, modify the DB and ASM parameters as above, then restart the clusterware and the database, after which the instance on the second node also started normally. With that, the installation succeeded.
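A sketch of such a restart, assuming the database is named orcl (srvctl as the oracle user, crsctl as root; the exact commands are an assumption, the notes above only say the cluster and the DB were restarted):
$ srvctl stop database -d orcl
# crsctl stop crs        (on each node)
# crsctl start crs
$ srvctl start database -d orcl
$ srvctl status database -d orcl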
Suggestion: after disabling HAIP and completing the GRID installation, modify the ASM parameter first; whether doing this before DBCA would avoid the problem has not been verified.