This article collects my study notes on the 11gR2 RAC and Clusterware processes: what each process does, the order in which the processes are started, and where their logs are located.
Most of the theory comes from the official Oracle documentation and from the web; the references I studied and quoted are listed at the end.
1. RAC database processes
Process listing
Below is an actual process list, ordered by PID; the PID order reflects the order in which the processes were actually started. The RAC-related processes are started by the root, grid, and oracle users, and cover Clusterware, ASM, the database, and some RAC-specific processes.
Note: for readability, $GRID_HOME stands in for the actual directory.
#ps aux
root 1208 1 0 22:12 ? 00:00:00 /bin/sh /etc/init.d/init.ohasdrun
grid 2342 1 0 22:13 ? 00:00:14 $GRID_HOME/bin/oraagent.bin
grid 2354 1 0 22:13 ? 00:00:00 $GRID_HOME/bin/mdnsd.bin
grid 2364 1 0 22:13 ? 00:00:02 $GRID_HOME/bin/gpnpd.bin
grid 2374 1 0 22:13 ? 00:00:10 $GRID_HOME/bin/gipcd.bin
root 2376 1 0 22:13 ? 00:00:17 $GRID_HOME/bin/orarootagent.bin
root 2390 1 2 22:13 ? 00:00:47 $GRID_HOME/bin/osysmond.bin
root 2408 1 0 22:13 ? 00:00:15 $GRID_HOME/bin/cssdmonitor
root 2421 1 0 22:13 ? 00:00:11 $GRID_HOME/bin/cssdagent
grid 2433 1 2 22:13 ? 00:00:59 $GRID_HOME/bin/ocssd.bin
root 2550 1 1 22:13 ? 00:00:22 $GRID_HOME/bin/octssd.bin reboot
grid 2572 1 0 22:13 ? 00:00:08 $GRID_HOME/bin/evmd.bin
grid 2645 1 0 22:14 ? 00:00:00 asm_pmon_+ASM1
grid 2647 1 0 22:14 ? 00:00:00 asm_psp0_+ASM1
grid 2649 1 5 22:15 ? 00:02:03 asm_vktm_+ASM1
grid 2653 1 0 22:15 ? 00:00:00 asm_gen0_+ASM1
grid 2655 1 0 22:15 ? 00:00:02 asm_diag_+ASM1
grid 2657 1 0 22:15 ? 00:00:00 asm_ping_+ASM1
grid 2659 1 0 22:15 ? 00:00:08 asm_dia0_+ASM1
grid 2661 1 0 22:15 ? 00:00:05 asm_lmon_+ASM1
grid 2663 1 0 22:15 ? 00:00:04 asm_lmd0_+ASM1
grid 2665 1 1 22:15 ? 00:00:26 asm_lms0_+ASM1
grid 2669 1 0 22:15 ? 00:00:00 asm_lmhb_+ASM1
grid 2671 1 0 22:15 ? 00:00:00 asm_mman_+ASM1
grid 2673 1 0 22:15 ? 00:00:00 asm_dbw0_+ASM1
grid 2675 1 0 22:15 ? 00:00:00 asm_lgwr_+ASM1
grid 2677 1 0 22:15 ? 00:00:00 asm_ckpt_+ASM1
grid 2679 1 0 22:15 ? 00:00:00 asm_smon_+ASM1
grid 2681 1 0 22:15 ? 00:00:01 asm_rbal_+ASM1
grid 2683 1 0 22:15 ? 00:00:00 asm_gmon_+ASM1
grid 2685 1 0 22:15 ? 00:00:00 asm_mmon_+ASM1
grid 2687 1 0 22:15 ? 00:00:00 asm_mmnl_+ASM1
grid 2694 1 0 22:15 ? 00:00:00 asm_lck0_+ASM1
grid 2696 1 0 22:15 ? 00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root 2701 1 1 22:15 ? 00:00:25 $GRID_HOME/bin/crsd.bin reboot
grid 2716 1 0 22:15 ? 00:00:00 oracle+ASM1_ocr(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 2719 1 0 22:15 ? 00:00:00 asm_asmb_+ASM1
grid 2721 1 0 22:15 ? 00:00:00 oracle+ASM1_asmb_+asm1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 2784 2572 0 22:15 ? 00:00:00 $GRID_HOME/bin/evmlogger.bin -o $GRID_HOME/evm/log/evmlogger.info -l $GRID_HOME/evm/log/evmlogger.log
grid 2825 1 0 22:15 ? 00:00:07 $GRID_HOME/bin/oraagent.bin
root 2833 1 1 22:15 ? 00:00:23 $GRID_HOME/bin/orarootagent.bin
grid 2867 1 0 22:15 ? 00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 2871 1 0 22:15 ? 00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
grid 2923 1 0 22:16 ? 00:00:00 $GRID_HOME/bin/tnslsnr LISTENER -inherit
grid 3021 1 0 22:19 ? 00:00:00 $GRID_HOME/opmn/bin/ons -d
grid 3022 3021 0 22:19 ? 00:00:00 $GRID_HOME/opmn/bin/ons -d
grid 3070 1 0 22:19 ? 00:00:00 $GRID_HOME/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 3158 1 0 22:20 ? 00:00:00 oracle+ASM1_asmb_tan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 3045 1 0 22:19 ? 00:00:12 $GRID_HOME/bin/oraagent.bin
oracle 3101 1 0 22:20 ? 00:00:00 ora_pmon_tan1
oracle 3103 1 0 22:20 ? 00:00:00 ora_psp0_tan1
oracle 3105 1 6 22:20 ? 00:01:45 ora_vktm_tan1
oracle 3110 1 0 22:20 ? 00:00:00 ora_gen0_tan1
oracle 3112 1 0 22:20 ? 00:00:01 ora_diag_tan1
oracle 3114 1 0 22:20 ? 00:00:00 ora_dbrm_tan1
oracle 3116 1 0 22:20 ? 00:00:00 ora_ping_tan1
oracle 3118 1 0 22:20 ? 00:00:00 ora_acms_tan1
oracle 3120 1 0 22:20 ? 00:00:07 ora_dia0_tan1
oracle 3122 1 0 22:20 ? 00:00:06 ora_lmon_tan1
oracle 3124 1 0 22:20 ? 00:00:05 ora_lmd0_tan1
oracle 3126 1 3 22:20 ? 00:00:55 ora_lms0_tan1
oracle 3130 1 0 22:20 ? 00:00:00 ora_rms0_tan1
oracle 3132 1 0 22:20 ? 00:00:00 ora_lmhb_tan1
oracle 3134 1 0 22:20 ? 00:00:00 ora_mman_tan1
oracle 3136 1 0 22:20 ? 00:00:00 ora_dbw0_tan1
oracle 3138 1 0 22:20 ? 00:00:00 ora_lgwr_tan1
oracle 3140 1 0 22:20 ? 00:00:00 ora_ckpt_tan1
oracle 3142 1 0 22:20 ? 00:00:01 ora_smon_tan1
oracle 3144 1 0 22:20 ? 00:00:00 ora_reco_tan1
oracle 3146 1 0 22:20 ? 00:00:00 ora_rbal_tan1
oracle 3148 1 0 22:20 ? 00:00:00 ora_asmb_tan1
oracle 3150 1 0 22:20 ? 00:00:02 ora_mmon_tan1
oracle 3152 1 0 22:20 ? 00:00:00 ora_mmnl_tan1
oracle 3154 1 0 22:20 ? 00:00:00 ora_d000_tan1
oracle 3156 1 0 22:20 ? 00:00:00 ora_s000_tan1
oracle 3160 1 0 22:20 ? 00:00:00 ora_mark_tan1
oracle 3173 1 0 22:21 ? 00:00:03 ora_lck0_tan1
oracle 3175 1 0 22:21 ? 00:00:00 ora_rsmn_tan1
oracle 3178 1 0 22:21 ? 00:00:02 oracletan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 3208 1 0 22:22 ? 00:00:00 ora_gtx0_tan1
oracle 3210 1 0 22:22 ? 00:00:00 ora_rcbg_tan1
oracle 3213 1 0 22:22 ? 00:00:00 ora_qmnc_tan1
oracle 3217 1 0 22:22 ? 00:00:00 ora_q000_tan1
oracle 3220 1 0 22:22 ? 00:00:00 ora_q001_tan1
oracle 3227 1 0 22:22 ? 00:00:00 oracletan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 3249 1 0 22:22 ? 00:00:01 ora_cjq0_tan1
oracle 3252 1 0 22:22 ? 00:00:03 ora_vkrm_tan1
oracle 3308 1 0 22:27 ? 00:00:00 ora_smco_tan1
root 3368 1 1 22:31 ? 00:00:18 $GRID_HOME/bin/ologgerd -m node2 -r -d $GRID_HOME/crf/db/node1
oracle 3446 1 0 22:37 ? 00:00:01 ora_gcr0_tan1
oracle 3571 1 0 22:47 ? 00:00:00 ora_w000_tan1
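The start order above can be recovered from any saved `ps` snapshot: filter out the Grid Infrastructure daemons and sort numerically on the PID column. A minimal sketch, using a hypothetical, abbreviated snapshot (on a real node you would pipe `ps aux` in directly):

```shell
# Hypothetical, abbreviated `ps aux` snapshot: user, PID, command.
ps_snapshot='root 2390 /u01/app/11.2.0/grid/bin/osysmond.bin
grid 2433 /u01/app/11.2.0/grid/bin/ocssd.bin
grid 2354 /u01/app/11.2.0/grid/bin/mdnsd.bin
root 2701 /u01/app/11.2.0/grid/bin/crsd.bin
grid 2364 /u01/app/11.2.0/grid/bin/gpnpd.bin'

# Keep only the clusterware daemons and sort numerically on the PID
# column: the result lists them in start order (mdnsd first, crsd last).
printf '%s\n' "$ps_snapshot" |
  grep -E 'ocssd|crsd|mdnsd|gpnpd|osysmond' |
  sort -k2,2n
```

The same pipeline works against the full listing above and reproduces its ordering.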
Clusterware architecture diagrams
Below are two architecture diagrams of the Clusterware startup sequence:
And here is a more detailed breakdown (reposted):
Level 1: OHASD spawns:
cssdagent - Agent responsible for spawning CSSD.
orarootagent - Agent responsible for managing all root owned ohasd resources.
oraagent - Agent responsible for managing all oracle owned ohasd resources.
cssdmonitor - Monitors CSSD and node health (along with the cssdagent).
Level 2: OHASD rootagent spawns:
CSSD (ora.cssd) - Cluster Synchronization Services
CRSD (ora.crsd) - Primary daemon responsible for managing cluster resources.
CTSSD (ora.ctssd) - Cluster Time Synchronization Services Daemon
Diskmon (ora.diskmon)
ACFS (ASM Cluster File System) Drivers
Level 2: OHASD oraagent spawns:
MDNSD (ora.mdnsd) - Used for DNS lookup
GIPCD (ora.gipcd) - Used for inter-process and inter-node communication
GPNPD (ora.gpnpd) - Grid Plug & Play Profile Daemon
EVMD (ora.evmd) - Event Monitor Daemon
ASM (ora.asm) - Resource for monitoring ASM instances
Level 3: CRSD spawns:
orarootagent - Agent responsible for managing all root owned crsd resources.
oraagent - Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
Network resource - To monitor the public network
SCAN VIP(s) - Single Client Access Name Virtual IPs
Node VIPs - One per node
ACFS Registry - For mounting ASM Cluster File System
GNS VIP (optional) - VIP for GNS
Level 4: CRSD oraagent spawns:
ASM Resource - ASM Instance(s) resource
Diskgroup - Used for managing/monitoring ASM diskgroups.
DB Resource - Used for monitoring and managing the DB and instances
SCAN Listener - Listener for single client access name, listening on SCAN VIP
Listener - Node listener listening on the Node VIP
Services - Used for monitoring and managing services
ONS - Oracle Notification Service
eONS - Enhanced Oracle Notification Service
GSD - For 9i backward compatibility
GNS (optional) - Grid Naming Service - Performs name resolution
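In 11.2 the resources in these levels can be listed with `crsctl stat res -t`. As a sketch, the fragment below (hypothetical resource names and states, reduced to two columns) is filtered with awk to show only the ONLINE resources:

```shell
# Hypothetical, simplified fragment of `crsctl stat res -t`-style
# output: resource name, then state.
crs_output='ora.LISTENER.lsnr ONLINE
ora.asm ONLINE
ora.gsd OFFLINE
ora.net1.network ONLINE
ora.scan1.vip ONLINE'

# Print only the resources whose state is ONLINE (ora.gsd, offline by
# default in 11.2, is filtered out).
printf '%s\n' "$crs_output" | awk '$2 == "ONLINE" { print $1 }'
```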
RAC processes in detail
LMSn:
Implements the GCS (Global Cache Service) and is responsible for shipping data blocks between instances; it is the main Cache Fusion process. The number of these processes is controlled by the parameter GCS_SERVER_PROCESSES, which defaults to 2 and takes values from 0 to 9.
From the Oracle documentation: Processes that manage remote messages. Oracle RAC provides for up to 10 Global Cache Service Processes. The number of LMSn varies depending on the amount of messaging traffic among nodes in the cluster.
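As a quick sanity check, the running LMS count can be compared against GCS_SERVER_PROCESSES by counting the `ora_lms*` background processes. A sketch against a hypothetical process fragment (on a real node, pipe `ps -ef` in instead):

```shell
# Hypothetical fragment of a process listing for instance tan1.
sample='oracle 3126 ora_lms0_tan1
oracle 3128 ora_lms1_tan1
oracle 3136 ora_dbw0_tan1'

# Two ora_lms* processes => GCS_SERVER_PROCESSES at its default of 2.
printf '%s\n' "$sample" | grep -c 'ora_lms'
```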
LMD:
Implements the GES (Global Enqueue Service). This process coordinates the order in which multiple instances access data blocks, guaranteeing consistent access to the data.
Together with the GCS service of the LMSn processes and the GRD, it makes up Cache Fusion, the core functionality of RAC.
From the Oracle documentation: The resource agent process that manages requests for resources to control access to blocks. The LMD process also handles deadlock detection and remote resource requests. Remote resource requests are requests originating from another instance.
LCK
This process synchronizes access to non-Cache Fusion resources; each instance has one LCK process.
It implements cross-instance Library Cache locking: besides the traditional single-instance Library Cache lock, in RAC the LCK0 process places a shared-mode IV (Invalidation) instance lock on the objects in the local Library Cache.
LMON
The corresponding service is CGS (Cluster Group Services). The LMON processes of the instances communicate with each other periodically to check the health of every node in the cluster; when a node fails, LMON performs cluster reconfiguration, GRD recovery, and related operations.
LMON can cooperate with the Clusterware layer underneath it or work on its own. When LMON detects an instance-level split brain, it notifies the Clusterware layer and expects Clusterware to resolve the split brain, but RAC does not assume that Clusterware will necessarily succeed; LMON therefore does not wait indefinitely for the Clusterware layer's result. If the wait times out, LMON automatically triggers IMR (Instance Membership Recovery, also called Instance Membership Reconfiguration). The IMR capability provided by LMON can be viewed as the split-brain and I/O-fencing mechanism Oracle implements at the database layer.
LMON relies mainly on two heartbeat mechanisms for health monitoring:
1. Network heartbeat between nodes: nodes send ping packets at fixed intervals to check each other's status.
2. Disk heartbeat through the control file (controlfile heartbeat): the CKPT process on each node updates one block of the control file every 3 seconds; this block is called the Checkpoint Progress Record. Because the control file is shared, the instances can check whether the others are updating it on time and thereby judge their status.
SQL> select inst_id, cphbt from x$kcccp;
   INST_ID      CPHBT
---------- ----------
         1  825262660
         1  825210052
From the Oracle documentation: The background LMON process monitors the entire cluster to manage global resources. LMON manages instance deaths and the associated recovery for any failed instance. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services.
DIAG
The DIAG process monitors the health of the instance and, when the instance hits a runtime error, collects diagnostic data and records it in the alert.log.
GSD
This process receives commands from client tools such as srvctl and provides a management interface for users.
In the listing above GSD was not started, so it does not appear among the background processes.
ora.node1.gsd application OFFLINE OFFLINE
It is needed only when CRS or GI has to manage a 9i database: the gsd process serves management requests from 9i RAC clients. In 10g and later this service can be disabled; see [ID 429966.1].
The function of GSD (10g and above) is to service requests from 9i RAC management clients, and therefore when there are no 9i databases present, there is nothing for GSD to do. Consequently, there will be no impact on a RAC cluster if GSD is offline and 9i is not used.
If gsd fails to start for whatever reason, the best thing is to work with Oracle Support to analyze and fix the issue. Until that time, gsd can be temporarily disabled.
In 11.2 GSD is disabled by default and the service will show as target:offline, status:offline.
From the Oracle documentation: A component that receives requests from SRVCTL to execute administrative job tasks, such as startup or shutdown. The command is executed locally on each node, and the results are returned to SRVCTL. GSD is installed on the nodes by default.
Clusterware processes
Again using the earlier listing as the example:
root 2019 1 TS 19 Sep04 ? 00:01:07 $GRID_HOME/bin/ohasd.bin reboot
grid 2342 1 TS 19 Sep04 ? 00:00:54 $GRID_HOME/bin/oraagent.bin
grid 2354 1 TS 19 Sep04 ? 00:00:00 $GRID_HOME/bin/mdnsd.bin
grid 2364 1 TS 19 Sep04 ? 00:00:05 $GRID_HOME/bin/gpnpd.bin
grid 2374 1 TS 19 Sep04 ? 00:00:30 $GRID_HOME/bin/gipcd.bin
root 2390 1 RR 139 Sep04 ? 00:02:31 $GRID_HOME/bin/osysmond.bin
root 2376 1 TS 19 Sep04 ? 00:01:02 $GRID_HOME/bin/orarootagent.bin
root 2408 1 RR 139 Sep04 ? 00:00:46 $GRID_HOME/bin/cssdmonitor
root 2421 1 RR 139 Sep04 ? 00:00:38 $GRID_HOME/bin/cssdagent
grid 2433 1 RR 139 Sep04 ? 00:02:54 $GRID_HOME/bin/ocssd.bin
root 2550 1 TS 19 Sep04 ? 00:01:04 $GRID_HOME/bin/octssd.bin reboot
grid 2572 1 TS 19 Sep04 ? 00:00:24 $GRID_HOME/bin/evmd.bin
root 2701 1 TS 19 Sep04 ? 00:01:14 $GRID_HOME/bin/crsd.bin reboot
grid 2825 1 TS 19 Sep04 ? 00:00:32 $GRID_HOME/bin/oraagent.bin
root 2833 1 TS 19 Sep04 ? 00:01:27 $GRID_HOME/bin/orarootagent.bin
grid 3021 1 TS 19 Sep04 ? 00:00:00 /u01/grid/11.2.0/gridhome/opmn/bin/ons -d
grid 3022 3021 TS 19 Sep04 ? 00:00:07 /u01/grid/11.2.0/gridhome/opmn/bin/ons -d
oracle 3045 1 TS 19 Sep04 ? 00:00:58 $GRID_HOME/bin/oraagent.bin
root 3368 1 RR 139 Sep04 ? 00:01:53 $GRID_HOME/bin/ologgerd -m node2 -r -d $GRID_HOME
Oracle Root Agent
orarootagent
root 2376 1 TS 19 Sep04 ? 00:01:02 $GRID_HOME/bin/orarootagent.bin
root 2833 1 TS 19 Sep04 ? 00:01:27 $GRID_HOME/bin/orarootagent.bin
A specialized oraagent process that helps crsd manage resources owned by root, such as the network and the Grid virtual IP address.
The above 2 processes are actually threads which look like processes; this is Linux specific.
There are two of these processes: one spawned by ohasd, which manages ora.crsd, ora.ctssd, ora.diskmon, and ora.drivers.acfs; and one spawned by crsd, which manages the GNS, VIP, SCAN VIP, and network resources.
Oracle Agent
grid 2342 1 TS 19 Sep04 ? 00:00:54 $GRID_HOME/bin/oraagent.bin
grid 2825 1 TS 19 Sep04 ? 00:00:32 $GRID_HOME/bin/oraagent.bin
oracle 3045 1 TS 19 Sep04 ? 00:00:58 $GRID_HOME/bin/oraagent.bin
oraagent
Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).
There are three of these processes; their roles are described in detail in a later section.
------------- More on the orarootagent and oraagent processes follows in section 2.
Cluster Synchronization Service (CSS)
Manages and coordinates the relationship between the cluster nodes and handles inter-node communication; whenever a node joins or leaves the cluster, CSS notifies the cluster.
cssdmonitor
Monitors node hangs (via oprocd functionality), monitors OCSSD process hangs (via oclsomon functionality), and monitors vendor clusterware (via vmon functionality). This is a multi-threaded process that runs with elevated priority.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor
cssdagent
Spawned by the OHASD process. Previously (10g) oprocd; responsible for I/O fencing. Killing this process would cause a node reboot. Stops, starts, and checks the status of the ocssd.bin daemon.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent
ocssd
If this process fails, the node reboots. It provides the CSS (Cluster Synchronization Service): through several heartbeat mechanisms, CSS monitors the health of the cluster in real time and provides base cluster services such as split-brain protection.
If a node reboots by itself, check the ocssd log, located at: <CRS_HOME>/log/<host>/cssd.
Manages cluster node membership; runs as the oragrid user. Failure of this process results in node restart.
Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin
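A quick way to confirm the chain above is to check that every daemon in it appears in a process snapshot. A sketch with hypothetical sample data (on a real node, substitute the output of `ps -ef`):

```shell
# Hypothetical process snapshot: user, PID, command.
snapshot='root 1208 /bin/sh /etc/init.d/init.ohasd run
root 2019 ohasd.bin reboot
root 2421 cssdagent
grid 2433 ocssd.bin'

# Every daemon of the documented CSS startup chain should be present;
# a missing one points at where the stack failed to come up.
for d in init.ohasd ohasd.bin cssdagent ocssd.bin; do
  if printf '%s\n' "$snapshot" | grep -q "$d"; then
    echo "$d: running"
  else
    echo "$d: MISSING"
  fi
done
```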
Cluster Time Synchronization Service (CTSS)
octssd
Provides Time Management in a cluster for Oracle Clusterware.
The time synchronization daemon introduced in 11g. If the operating system already synchronizes time through ntpd, this process stays in observer mode and ntpd does the synchronization; if operating system time synchronization is unavailable or fails, CTSS synchronizes the time itself.
Cluster Ready Services (CRS)
crsd (Cluster Ready Services Daemon)
The main program managing high-availability operations within the cluster. CRS manages all cluster resources: databases, services, instances, VIP addresses, listeners, application processes, and so on. It can start, stop, monitor, and fail over cluster resources; in normal operation, crsd.bin monitors the node's resources and automatically restarts or relocates a resource when it fails.
When a resource terminates abnormally, first check the crsd log: <CRS_HOME>/log/<host>/crsd.
The primary Oracle Clusterware process that performs high availability recovery and management operations, such as maintaining OCR. Also manages application resources; runs as the root user (or as a user in the admin group on Mac OS X-based systems) and restarts automatically upon failure.
The above process is responsible for start, stop, monitor, and failover of resources. It maintains the OCR and restarts resources when failures occur.
This is applicable to RAC systems. For Oracle Restart and ASM, ohasd is used.
CHM (Cluster Health Monitor)
CHM collects operating system performance data in a RAC environment and can be used to analyze node eviction problems. It is a new feature in 11.2.0.2.
Process: osysmond.bin
The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster.
This process runs on every node; it monitors the local operating system, collects its performance data, and sends the information it gathers to the ologgerd process.
Process: ologgerd
Receives information from all the nodes in the cluster and persists it in a CHM repository-based database. This service runs on only two nodes in a cluster.
This process runs in a primary-standby (master/replica) fashion: only the primary on the master node actually works, while the process on the other node stands by. It receives the information collected by osysmond on all nodes and stores it in a Berkeley DB (BDB), compressing the raw data before writing it in order to save space.
Other processes
evmd:
The event monitor daemon; it publishes cluster events such as instance startup and shutdown.
ons:
The Oracle Notification Service daemon. It receives cluster events from evmd and forwards them to application subscribers or to the local listener, implementing FAN (Fast Application Notification) so that applications can receive these events and react to them.
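FAN events can also be consumed on the server side through callout scripts: Clusterware executes every executable placed under Grid_home/racg/usrco/ when a FAN event fires, passing the event text as arguments. A minimal sketch of such a callout (the log path is just a placeholder):

```shell
#!/bin/sh
# Minimal FAN server-side callout sketch: append every event this
# script is invoked with to a log file. To use it, install it
# (executable) under $GRID_HOME/racg/usrco/ on each node; the /tmp
# path below is a placeholder, not a recommended location.
FAN_LOGFILE=/tmp/fan_events.log
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$FAN_LOGFILE"
```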
gpnpd
Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
Publishes the bootstrap information needed to build the cluster and synchronizes the gpnp profile across all cluster nodes.
Its log is located at: <GRID_HOME>/log/<host>/gpnpd
gipcd
A support daemon that enables Redundant Interconnect Usage.
This process manages all the private-network (cluster interconnect) NICs in the cluster; the private network information is obtained through gpnpd.
Its log is located at: <GRID_HOME>/log/<host>/gipcd
mdnsd
Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.
This process discovers the cluster nodes and all NIC information via multicast. Make sure the NICs in the cluster support multicast and that inter-node communication is working.
Its log is located at: <GRID_HOME>/log/<host>/mdnsd
Oracle Grid Naming Service (GNS)
gnsd.bin
Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
(Optional; the earlier listing does not include this process.) Grid Naming Service: effectively a sub-domain DNS that works much like DNS and replaces /etc/hosts for host name resolution.
Its log is located at: <GRID_HOME>/log/<host>/gnsd
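The per-daemon log locations above all follow one pattern; in 11.2 the main log file is usually <GRID_HOME>/log/<host>/<daemon>/<daemon>.log. A tiny sketch that just prints the expected paths (GRID_HOME and the hostname are placeholders, and the file name pattern is an assumption to verify on your system):

```shell
# Placeholders; substitute the real Grid home and node name.
GRID_HOME=/u01/app/11.2.0/grid
HOST=node1

# Print the expected log file path for each daemon.
for d in gpnpd gipcd mdnsd gnsd; do
  echo "$GRID_HOME/log/$HOST/$d/$d.log"
done
```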
2. More on the orarootagent and oraagent processes.
Oracle Clusterware 11g Release 2 (11.2) introduces a new agent concept which makes the Oracle Clusterware more robust and performant. These agents are multi-threaded daemons which implement entry points for multiple resource types and which spawn new processes for different users. The agents are highly available and, besides the oraagent, orarootagent and cssdagent/cssdmonitor, there can be an application agent and a script agent.
The two main agents are the oraagent and the orarootagent. Both ohasd and crsd employ one oraagent and one orarootagent each. If the CRS user is different from the ORACLE user, then crsd would utilize two oraagents and one orarootagent.
Oraagent
There are three oraagent processes in total: one spawned by ohasd and two by crsd.
ohasd's oraagent:
Performs start/stop/check/clean actions for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd
crsd's oraagent:
crsd has two oraagent processes: one owned by the grid user and one owned by the oracle user.
Performs start/stop/check/clean actions for ora.asm, ora.eons, ora.LISTENER.lsnr, the SCAN listeners, ora.ons, and for the service, database, and diskgroup resources. Receives eONS events and translates and forwards them to interested clients (eONS will be removed and its functionality included in EVM in 11.2.0.2). Receives CRS state change events, dequeues RLB events, and enqueues HA events for OCI and ODP.NET clients.
orarootagent
ohasd’s orarootagent:
Performs start/stop/check/clean actions forora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, ora.crf (11.2.0.2)
crsd’s orarootagent:
Performs start/stop/check/clean actions forGNS, VIP, SCAN VIP and network resources
Agent Log Files
Log locations of the OHASD/CRSD agents
The log files for the ohasd/crsd agents are located in:
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>.log. For example, for ora.crsd, which is managed by ohasd and owned by root,
the agent log file is named:
Grid_home/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
The same agent log file can contain log messages for more than one resource, if those resources are managed by the same daemon.
If an agent process crashes, a core file will be written to
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>,
and a call stack will be written to
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>OUT.log
The agent log file format is the following:
<timestamp>: [<component>][<thread id>] ...
<timestamp>: [<component>][<thread id>][<entry point>] ...
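Given that format, the timestamp and component can be pulled out of an agent log line with sed. A sketch using a made-up (but format-conforming) sample line:

```shell
# Made-up agent log line following the documented format:
# <timestamp>: [<component>][<thread id>] ...
line='2013-09-04 22:15:01.123: [ora.crsd][3042] {0:0:2} checking resource'

# Timestamp = everything before the first ": ["; component = the
# contents of the first bracketed field.
ts=$(printf '%s' "$line" | sed 's/: \[.*//')
comp=$(printf '%s' "$line" | sed 's/.*: \[\([^]]*\)\].*/\1/')
echo "timestamp=$ts component=$comp"
```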
3. Other cluster terminology
DLM: Distributed Lock Management
In Oracle this was called PCM before 9i; from 9i onward it is called Cache Fusion.
OPS:
Oracle Parallel Server, the pre-9i name for RAC.
PCM:
A pre-9i term; from 9i onward the corresponding concept is Cache Fusion. PCM distinguishes PCM resources from non-PCM resources.
Cache Fusion:
The DLM from Oracle 9i onward; it distinguishes Cache Fusion resources from non-Cache Fusion resources.
Non-Cache Fusion
Holds object definitions, the SQL code kept in the Library Cache, execution plans, and so on.
GRD:
Global Resource Directory: records where every data block is held across the cluster. It lives in each instance's SGA, but each instance holds only part of the GRD; the portions from all instances together form the complete GRD.
PCM Lock:
The information recorded in the GRD of the shadow node.
OHAS:
Oracle High Availability Services. OHAS is the first Grid Infrastructure component started after the server boots. It is configured to be started by init(1) and is responsible for spawning the agent processes.
CRSD:
The main background process of the cluster software; it uses the Oracle Cluster Registry information to manage the resources in the cluster.
4. References:
RAC processes:
《大话Oracle RAC:集群、高可用性、备份与恢复》
http://blog.csdn.net/inthirties/article/details/4875535
Starting GSD: http://hi.baidu.com/mediinfodba/item/1b2889ab399949f05bf191c8
Clusterware processes:
https://blogs.oracle.com/myoraclediary/entry/clusterware_processes_in_11g_rac
http://reneantunez.blogspot.com/2012/10/rac-11gr2-clusterware-startup-sequence.html
http://www.unitask.com/oracledaily/2013/02/28/oracle-crsgi-%E8%BF%9B%E7%A8%8B%E4%BB%8B%E7%BB%8D/
ologgerd and osysmond:
http://aprakash.wordpress.com/2011/03/03/ologgerd-daemon-11gr2/
Cluster Health Monitor (CHM):
http://www.dbaleet.org/understand_rac_cluster_health_monitor_overview_of_chm/
http://www.dbaleet.org/understand_rac_cluster_health_monitor_usage_of_chm/