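11gR2 RAC Processes: Startup Order and Logs

Date: 2022-06-26 19:45:14

These are my study notes on the RAC and Clusterware processes of Oracle 11gR2. They describe what each process does, the order in which the processes start, and where their logs are located.

Most of the explanations come from Oracle's official documentation and from online sources. The references I studied are listed at the end.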

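1. RAC Database Processes

Process Example

Below is a real process list, ordered by PID. The PID order roughly reflects the order in which the processes actually started. The RAC-related processes are started by the root, grid and oracle users. They include Clusterware, ASM and database processes, as well as some processes specific to RAC.

Note: for readability, the actual directory has been replaced with $GRID_HOME.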

# ps -ef

root      1208    1  0 22:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

grid      2342    1  0 22:13 ?        00:00:14 $GRID_HOME/bin/oraagent.bin

grid      2354    1  0 22:13 ?        00:00:00 $GRID_HOME/bin/mdnsd.bin

grid      2364    1  0 22:13 ?        00:00:02 $GRID_HOME/bin/gpnpd.bin

grid      2374    1  0 22:13 ?        00:00:10 $GRID_HOME/bin/gipcd.bin


 

root      2376    1  0 22:13 ?        00:00:17 $GRID_HOME/bin/orarootagent.bin

root      2390    1  2 22:13 ?        00:00:47 $GRID_HOME/bin/osysmond.bin

root      2408    1  0 22:13 ?        00:00:15 $GRID_HOME/bin/cssdmonitor

root      2421    1  0 22:13 ?        00:00:11 $GRID_HOME/bin/cssdagent

grid      2433    1  2 22:13 ?        00:00:59 $GRID_HOME/bin/ocssd.bin

root      2550    1  1 22:13 ?        00:00:22 $GRID_HOME/bin/octssd.bin reboot

grid      2572    1  0 22:13 ?        00:00:08 $GRID_HOME/bin/evmd.bin

grid      2645    1  0 22:14 ?        00:00:00 asm_pmon_+ASM1

grid      2647    1  0 22:14 ?        00:00:00 asm_psp0_+ASM1

grid      2649    1  5 22:15 ?        00:02:03 asm_vktm_+ASM1

grid      2653    1  0 22:15 ?        00:00:00 asm_gen0_+ASM1

grid      2655    1  0 22:15 ?        00:00:02 asm_diag_+ASM1

grid      2657    1  0 22:15 ?        00:00:00 asm_ping_+ASM1

grid      2659    1  0 22:15 ?        00:00:08 asm_dia0_+ASM1

grid      2661    1  0 22:15 ?        00:00:05 asm_lmon_+ASM1

grid      2663    1  0 22:15 ?        00:00:04 asm_lmd0_+ASM1

grid      2665    1  1 22:15 ?        00:00:26 asm_lms0_+ASM1

grid      2669    1  0 22:15 ?        00:00:00 asm_lmhb_+ASM1

grid      2671    1  0 22:15 ?        00:00:00 asm_mman_+ASM1

grid      2673    1  0 22:15 ?        00:00:00 asm_dbw0_+ASM1

grid      2675    1  0 22:15 ?        00:00:00 asm_lgwr_+ASM1

grid      2677    1  0 22:15 ?        00:00:00 asm_ckpt_+ASM1

grid      2679    1  0 22:15 ?        00:00:00 asm_smon_+ASM1

grid      2681    1  0 22:15 ?        00:00:01 asm_rbal_+ASM1

grid      2683    1  0 22:15 ?        00:00:00 asm_gmon_+ASM1

grid      2685    1  0 22:15 ?        00:00:00 asm_mmon_+ASM1

grid      2687    1  0 22:15 ?        00:00:00 asm_mmnl_+ASM1

grid      2694    1  0 22:15 ?        00:00:00 asm_lck0_+ASM1

grid      2696    1  0 22:15 ?        00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

root      2701    1  1 22:15 ?        00:00:25 $GRID_HOME/bin/crsd.bin reboot

grid      2716    1  0 22:15 ?        00:00:00 oracle+ASM1_ocr(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

grid      2719    1  0 22:15 ?        00:00:00 asm_asmb_+ASM1

grid      2721    1  0 22:15 ?        00:00:00 oracle+ASM1_asmb_+asm1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

grid      2784 2572  0 22:15 ?        00:00:00 $GRID_HOME/bin/evmlogger.bin -o $GRID_HOME/evm/log/evmlogger.info -l $GRID_HOME/evm/log/evmlogger.log

grid      2825    1  0 22:15 ?        00:00:07 $GRID_HOME/bin/oraagent.bin

root      2833    1  1 22:15 ?        00:00:23 $GRID_HOME/bin/orarootagent.bin

grid      2867    1  0 22:15 ?        00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

grid      2871    1  0 22:15 ?        00:00:00 oracle+ASM1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

grid      2923    1  0 22:16 ?        00:00:00 $GRID_HOME/bin/tnslsnr LISTENER -inherit

grid      3021    1  0 22:19 ?        00:00:00 $GRID_HOME/opmn/bin/ons -d

grid      3022 3021  0 22:19 ?        00:00:00 $GRID_HOME/opmn/bin/ons -d

grid      3070    1  0 22:19 ?        00:00:00 $GRID_HOME/bin/tnslsnr LISTENER_SCAN1 -inherit

grid      3158    1  0 22:20 ?        00:00:00 oracle+ASM1_asmb_tan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

oracle    3045    1  0 22:19 ?        00:00:12 $GRID_HOME/bin/oraagent.bin

oracle    3101    1  0 22:20 ?        00:00:00 ora_pmon_tan1

oracle    3103    1  0 22:20 ?        00:00:00 ora_psp0_tan1

oracle    3105    1  6 22:20 ?        00:01:45 ora_vktm_tan1

oracle    3110    1  0 22:20 ?        00:00:00 ora_gen0_tan1

oracle    3112    1  0 22:20 ?        00:00:01 ora_diag_tan1

oracle    3114    1  0 22:20 ?        00:00:00 ora_dbrm_tan1

oracle    3116    1  0 22:20 ?        00:00:00 ora_ping_tan1

oracle    3118    1  0 22:20 ?        00:00:00 ora_acms_tan1

oracle    3120    1  0 22:20 ?        00:00:07 ora_dia0_tan1

oracle    3122    1  0 22:20 ?        00:00:06 ora_lmon_tan1

oracle    3124    1  0 22:20 ?        00:00:05 ora_lmd0_tan1

oracle    3126    1  3 22:20 ?        00:00:55 ora_lms0_tan1

oracle    3130    1  0 22:20 ?        00:00:00 ora_rms0_tan1

oracle    3132    1  0 22:20 ?        00:00:00 ora_lmhb_tan1

oracle    3134    1  0 22:20 ?        00:00:00 ora_mman_tan1

oracle    3136    1  0 22:20 ?        00:00:00 ora_dbw0_tan1

oracle    3138    1  0 22:20 ?        00:00:00 ora_lgwr_tan1

oracle    3140    1  0 22:20 ?        00:00:00 ora_ckpt_tan1

oracle    3142    1  0 22:20 ?        00:00:01 ora_smon_tan1

oracle    3144    1  0 22:20 ?        00:00:00 ora_reco_tan1

oracle    3146    1  0 22:20 ?        00:00:00 ora_rbal_tan1

oracle    3148    1  0 22:20 ?        00:00:00 ora_asmb_tan1

oracle    3150    1  0 22:20 ?        00:00:02 ora_mmon_tan1

oracle    3152    1  0 22:20 ?        00:00:00 ora_mmnl_tan1

oracle    3154    1  0 22:20 ?        00:00:00 ora_d000_tan1

oracle    3156    1  0 22:20 ?        00:00:00 ora_s000_tan1

oracle    3160    1  0 22:20 ?        00:00:00 ora_mark_tan1

oracle    3173    1  0 22:21 ?        00:00:03 ora_lck0_tan1

oracle    3175    1  0 22:21 ?        00:00:00 ora_rsmn_tan1

oracle    3178    1  0 22:21 ?        00:00:02 oracletan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

oracle    3208    1  0 22:22 ?        00:00:00 ora_gtx0_tan1

oracle    3210    1  0 22:22 ?        00:00:00 ora_rcbg_tan1

oracle    3213    1  0 22:22 ?        00:00:00 ora_qmnc_tan1

oracle    3217    1  0 22:22 ?        00:00:00 ora_q000_tan1

oracle    3220    1  0 22:22 ?        00:00:00 ora_q001_tan1

oracle    3227    1  0 22:22 ?        00:00:00 oracletan1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

oracle    3249    1  0 22:22 ?        00:00:01 ora_cjq0_tan1

oracle    3252    1  0 22:22 ?        00:00:03 ora_vkrm_tan1

oracle    3308    1  0 22:27 ?        00:00:00 ora_smco_tan1

root      3368    1  1 22:31 ?        00:00:18 $GRID_HOME/bin/ologgerd -m node2 -r -d $GRID_HOME/crf/db/node1

oracle    3446    1  0 22:37 ?        00:00:01 ora_gcr0_tan1

oracle    3571    1  0 22:47 ?        00:00:00 ora_w000_tan1
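The ordering argument above can be checked mechanically. The following Python sketch parses lines in the `ps -ef` column layout shown (UID, PID, PPID, C, STIME, TTY, TIME, CMD), sorts them by PID and groups the programs by owning user. The sample lines are copied from the listing. The function names are hypothetical helpers for this article, not an Oracle tool.

```python
from collections import defaultdict

# Hypothetical sample: lines copied from the listing above, deliberately shuffled.
SAMPLE = """\
grid      2433    1  2 22:13 ?        00:00:59 $GRID_HOME/bin/ocssd.bin
oracle    3101    1  0 22:20 ?        00:00:00 ora_pmon_tan1
grid      2342    1  0 22:13 ?        00:00:14 $GRID_HOME/bin/oraagent.bin
root      2701    1  1 22:15 ?        00:00:25 $GRID_HOME/bin/crsd.bin reboot
grid      2645    1  0 22:14 ?        00:00:00 asm_pmon_+ASM1
root      2408    1  0 22:13 ?        00:00:15 $GRID_HOME/bin/cssdmonitor
"""


def parse_ps_ef(text):
    """Parse `ps -ef` lines (UID PID PPID C STIME TTY TIME CMD) into (pid, user, cmd)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # CMD may contain spaces, so split off at most 7 leading columns.
        fields = line.split(None, 7)
        rows.append((int(fields[1]), fields[0], fields[7]))
    return rows


def startup_order(rows):
    """Sort by PID. This is only a rough proxy for start order, since PIDs can wrap."""
    return sorted(rows)


def by_owner(rows):
    """Group executables by owning OS user, keeping PID order within each group."""
    groups = defaultdict(list)
    for _pid, user, cmd in startup_order(rows):
        groups[user].append(cmd.split()[0])
    return dict(groups)


rows = parse_ps_ef(SAMPLE)
for user, programs in by_owner(rows).items():
    print(user, programs)
```

Sorting by PID is only a heuristic. It holds here because the node had just booted and PIDs had not wrapped around.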

 

ClusterWare Architecture Diagrams

The following two diagrams show the Clusterware startup sequence:

[Figure: Clusterware startup sequence, diagram 1]

[Figure: Clusterware startup sequence, diagram 2]

Below is a more detailed description, reproduced from another source:

[Figure: detailed Clusterware startup sequence]

Level 1: OHASD spawns:

   cssdagent         - Agent responsible for spawning CSSD.

   orarootagent     - Agent responsible for managing all root owned ohasd resources.

   oraagent         - Agent responsible for managing all oracle owned ohasd resources.

   cssdmonitor        - Monitors CSSD and node health (along with the cssdagent).

   

Level 2: OHASD rootagent spawns:

   CSSD (ora.cssd)     - Cluster Synchronization Services

   CRSD (ora.crsd)     - Primary daemon responsible for managing cluster resources.

   CTSSD (ora.ctssd)     - Cluster Time Synchronization Services Daemon

   Diskmon (ora.diskmon)

   ACFS (ASM Cluster File System) Drivers 

 

Level 2: OHASD oraagent spawns:

 

   MDNSD (ora.mdnsd)     - Used for DNS lookup

   GIPCD (ora.gipcd)     - Used for inter-process and inter-node communication

   GPNPD (ora.gpnpd)     - Grid Plug & Play Profile Daemon

   EVMD (ora.evmd)     - Event Monitor Daemon

   ASM (ora.asm)     - Resource for monitoring ASM instances

 

Level 3: CRSD spawns:

 

   orarootagent     - Agent responsible for managing all root owned crsd resources.

   oraagent         - Agent responsible for managing all oracle owned crsd resources.

 

Level 4: CRSD rootagent spawns:

 

   Network resource     - To monitor the public network

   SCAN VIP(s)     - Single Client Access Name Virtual IPs

   Node VIPs         - One per node

   ACFS Registry     - For mounting ASM Cluster File System

   GNS VIP (optional)     - VIP for GNS

 

Level 4: CRSD oraagent spawns:

 

   ASM Resource     - ASM Instance(s) resource

   Diskgroup         - Used for managing/monitoring ASM diskgroups.

   DB Resource     - Used for monitoring and managing the DB and instances

   SCAN Listener     - Listener for single client access name, listening on SCAN VIP

   Listener         - Node listener listening on the Node VIP

   Services         - Used for monitoring and managing services

   ONS         - Oracle Notification Service

   eONS         - Enhanced Oracle Notification Service

   GSD         - For 9i backward compatibility

   GNS (optional)     - Grid Naming Service - Performs name resolution
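The level structure above is a tree, so the startup chain for any component can be derived by walking it. This Python sketch encodes the four levels as a dictionary and returns the spawn path to a given component. The labels are illustrative strings chosen for this sketch (assumed, not Clusterware resource names). CSSD is placed under cssdagent, following the Level 1 description.

```python
# Hypothetical model of the spawn hierarchy in Levels 1-4 above.
# Keys and values are illustrative labels, not names accepted by crsctl.
SPAWNS = {
    "ohasd": ["cssdagent", "cssdmonitor", "ohasd orarootagent", "ohasd oraagent"],
    # Level 1: cssdagent is the agent responsible for spawning CSSD.
    "cssdagent": ["ocssd.bin"],
    "ohasd orarootagent": ["crsd.bin", "octssd.bin", "diskmon", "ACFS drivers"],
    "ohasd oraagent": ["mdnsd.bin", "gipcd.bin", "gpnpd.bin", "evmd.bin", "ASM instance"],
    "crsd.bin": ["crsd orarootagent", "crsd oraagent"],
    "crsd orarootagent": ["network", "SCAN VIP", "node VIP", "ACFS registry", "GNS VIP"],
    "crsd oraagent": ["ASM resource", "diskgroup", "DB resource", "SCAN listener",
                      "listener", "services", "ONS", "eONS", "GSD", "GNS"],
}


def spawn_path(target, node="ohasd"):
    """Return the chain from ohasd down to target (depth-first search), or None."""
    if node == target:
        return [node]
    for child in SPAWNS.get(node, []):
        path = spawn_path(target, child)
        if path:
            return [node] + path
    return None


print(" --> ".join(spawn_path("SCAN listener")))
```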

 

RAC Process Descriptions

LMSn:

         Corresponds to the GCS (Global Cache Service). It transfers data blocks between instances and is the main Cache Fusion process. The number of these processes is controlled by the parameter GCS_SERVER_PROCESSES. The default is 2 and the range is 0-9.



Oracle documentation: Processes that manage remote messages. Oracle RAC provides for up to 10 Global Cache Service Processes. The number of LMSn varies depending on the amount of messaging traffic among nodes in the cluster.

LMD:

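         Corresponds to the GES (Global Enqueue Service). This process coordinates the order in which multiple instances access data blocks, which keeps access to the data consistent.

         Together with the GCS service provided by the LMSn processes and with the GRD, it forms Cache Fusion, the core feature of RAC.

Oracle documentation: The resource agent process that manages requests for resources to control access to blocks. The LMD process also handles deadlock detection and remote resource requests. Remote resource requests are requests originating from another instance.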
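Deadlock detection, which the description above assigns to LMD, is classically done by finding a cycle in a wait-for graph. The Python sketch below shows that generic technique with a depth-first search. It is not Oracle's LMD implementation, and the "instance:session" names are hypothetical.

```python
def find_deadlock(waits_for):
    """Return one cycle in a wait-for graph as a list of waiters, or None.

    waits_for maps each waiter to the holders it is waiting on. This is the
    generic depth-first-search technique, not Oracle's LMD code.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for holder in waits_for.get(node, ()):
            state = color.get(holder, WHITE)
            if state == GREY:  # back edge: the waits form a cycle
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical "instance:session" waiters on two instances.
print(find_deadlock({"inst1:s10": ["inst2:s20"], "inst2:s20": ["inst1:s10"]}))
```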

LCK

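         This process handles synchronized access to Non-Cache Fusion resources. Each instance has one LCK process.

LCK handles inter-instance locks on the Library Cache. Besides the traditional single-instance Library Cache Lock, in RAC the LCK0 process takes an Instance Lock in shared mode IV (Invalidation) on each object in the local instance's Library Cache.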

LMON

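         The corresponding service is CGS (Cluster Group Services). The LMON processes of the instances communicate periodically to check the health of each node in the cluster. When a node fails, LMON handles cluster reconfiguration, GRD recovery and related operations.

LMON can cooperate with the underlying Clusterware or work on its own. When LMON detects an instance-level split brain, it first notifies the underlying Clusterware in the hope that Clusterware will resolve it. However, RAC does not assume that Clusterware can always solve the problem, so LMON does not wait indefinitely for the result from the Clusterware layer. If the wait times out, LMON automatically triggers IMR (Instance Membership Recovery, also called Instance Membership Reconfiguration). The IMR function provided by LMON can be regarded as Oracle's database-layer split-brain and I/O fencing mechanism.

LMON relies mainly on two heartbeat mechanisms for health monitoring:

1.      Network heartbeat between nodes: the nodes periodically send ping packets to check each other's status.

2.      Disk heartbeat through the control file (Controlfile Heartbeat): the CKPT process of each node updates one block of the control file every 3 seconds. This block is called the Checkpoint Progress Record. Because the control file is shared, the instances can check whether the others update it on time and judge their status from that.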

SQL> select inst_id, cphbt from x$kcccp;

   INST_ID      CPHBT
---------- ----------
         1  825262660
         1  825210052
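The controlfile heartbeat check can be pictured as a comparison of two CPHBT samples taken some seconds apart. If a value has not moved during a period in which CKPT should have updated it at least once, the instance may be stalled. This Python sketch assumes the values are keyed by redo thread number. The query above does not select a thread column, so that mapping is an assumption. The sketch does not show how LMON itself performs the check.

```python
CKPT_INTERVAL_S = 3  # from the text above: CKPT updates the record every 3 seconds


def stalled_threads(before, after, elapsed_s):
    """Hypothetical check: threads whose CPHBT did not advance between two samples.

    before/after map an (assumed) redo thread number to its CPHBT value,
    sampled elapsed_s seconds apart. A thread missing from `after` is
    reported as stalled too.
    """
    if elapsed_s <= CKPT_INTERVAL_S:
        raise ValueError("sample interval must exceed the CKPT update interval")
    return sorted(t for t, v in before.items() if after.get(t, v) == v)


# First sample borrowed from the query output above; the second is hypothetical.
print(stalled_threads({1: 825262660, 2: 825210052},
                      {1: 825262700, 2: 825210052}, elapsed_s=10))
```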

Oracle documentation: The background LMON process monitors the entire cluster to manage global resources. LMON manages instance deaths and the associated recovery for any failed instance. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services.

DIAG

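         The DIAG process monitors the health of the instance. When the instance hits a runtime error, DIAG collects diagnostic data and records it in the alert.log.

GSD

         This process receives user commands from client tools such as srvctl and provides users with a management interface.

GSD was not started in the example above, so it does not appear among the background processes: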

ora.node1.gsd  application   OFFLINE   OFFLINE  

 

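GSD is needed only when CRS or GI has to manage 9i databases. The gsd process provides the management interface service for 9i RAC clients. In 10g and later releases the service can be disabled. For details, see [ID 429966.1].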

 

The function of GSD (10g and above) is to service requests for 9i RAC management clients and therefore when there are no 9i databases present, there is nothing for GSD to do. Consequently, there will be no impact on a RAC cluster if GSD is offline and 9i is not used.

If gsd fails to start for whatever reason, then the best thing is to work with Oracle support to analyze and fix the issue. Until that time, gsd can be temporarily disabled.

In 11.2 GSD is disabled by default and the service will show as target:offline, status:offline.

         Oracle documentation: A component that receives requests from SRVCTL to execute administrative job tasks, such as startup or shutdown. The command is executed locally on each node, and the results are returned to SRVCTL. GSD is installed on the nodes by default.

 

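ClusterWare Processes

The same system as in the earlier example is used again. This listing also shows each process's scheduling class (TS = time-sharing, RR = real-time round-robin) and priority: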

root      2019    1 TS   19 Sep04 ?        00:01:07 $GRID_HOME/bin/ohasd.bin reboot

grid      2342    1 TS   19 Sep04 ?        00:00:54 $GRID_HOME/bin/oraagent.bin

grid      2354    1 TS   19 Sep04 ?        00:00:00 $GRID_HOME/bin/mdnsd.bin

grid      2364    1 TS   19 Sep04 ?        00:00:05 $GRID_HOME/bin/gpnpd.bin

grid      2374    1 TS   19 Sep04 ?        00:00:30 $GRID_HOME/bin/gipcd.bin

root      2390    1 RR  139 Sep04 ?        00:02:31 $GRID_HOME/bin/osysmond.bin

root      2376    1 TS   19 Sep04 ?        00:01:02 $GRID_HOME/bin/orarootagent.bin

root      2408    1 RR  139 Sep04 ?        00:00:46 $GRID_HOME/bin/cssdmonitor

root      2421    1 RR  139 Sep04 ?        00:00:38 $GRID_HOME/bin/cssdagent

grid      2433    1 RR  139 Sep04 ?        00:02:54 $GRID_HOME/bin/ocssd.bin

root      2550    1 TS   19 Sep04 ?        00:01:04 $GRID_HOME/bin/octssd.bin reboot

grid      2572    1 TS   19 Sep04 ?        00:00:24 $GRID_HOME/bin/evmd.bin

root      2701    1 TS   19 Sep04 ?        00:01:14 $GRID_HOME/bin/crsd.bin reboot

grid      2825    1 TS   19 Sep04 ?        00:00:32 $GRID_HOME/bin/oraagent.bin

root      2833    1 TS   19 Sep04 ?        00:01:27 $GRID_HOME/bin/orarootagent.bin

grid      3021    1 TS   19 Sep04 ?        00:00:00 /u01/grid/11.2.0/gridhome/opmn/bin/ons -d

grid      3022 3021 TS   19 Sep04 ?        00:00:07 /u01/grid/11.2.0/gridhome/opmn/bin/ons -d

oracle    3045    1 TS   19 Sep04 ?        00:00:58 $GRID_HOME/bin/oraagent.bin

root      3368    1 RR  139 Sep04 ?        00:01:53 $GRID_HOME/bin/ologgerd -m node2 -r -d $GRID_HOME
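In this listing the fourth column is the scheduling class. TS is time-sharing and RR is real-time round-robin. The Python sketch below picks out the processes running in the RR class from lines in that layout. The function name is hypothetical, and the sample lines are copied from the listing.

```python
# Hypothetical sample: lines from the listing above,
# columns UID PID PPID CLS PRI STIME TTY TIME CMD.
SAMPLE_CLS = """\
grid      2342    1 TS   19 Sep04 ?        00:00:54 $GRID_HOME/bin/oraagent.bin
root      2390    1 RR  139 Sep04 ?        00:02:31 $GRID_HOME/bin/osysmond.bin
root      2408    1 RR  139 Sep04 ?        00:00:46 $GRID_HOME/bin/cssdmonitor
grid      2433    1 RR  139 Sep04 ?        00:02:54 $GRID_HOME/bin/ocssd.bin
root      2701    1 TS   19 Sep04 ?        00:01:14 $GRID_HOME/bin/crsd.bin reboot
"""


def realtime_daemons(text):
    """Hypothetical helper: program names of processes in the RR scheduling class."""
    names = []
    for line in text.splitlines():
        fields = line.split(None, 8)  # the 9th field is the full command line
        if len(fields) == 9 and fields[3] == "RR":
            program = fields[8].split()[0]
            names.append(program.rsplit("/", 1)[-1])
    return names


print(realtime_daemons(SAMPLE_CLS))
```

In the full listing, the RR class covers osysmond.bin, cssdmonitor, cssdagent, ocssd.bin and ologgerd. These are the node-health and membership daemons that must keep running when the node is under load.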

 

Oracle Root Agent
orarootagent

root      2376    1 TS   19 Sep04 ?        00:01:02 $GRID_HOME/bin/orarootagent.bin

root      2833    1 TS   19 Sep04 ?        00:01:27 $GRID_HOME/bin/orarootagent.bin

A specialized oraagent process that helps crsd manage resources owned by root, such as the network and the Grid virtual IP address.

The above 2 processes are actually threads which look like processes. This is Linux-specific.

There are two of these processes. One is spawned by ohasd and manages ora.crsd, ora.ctssd, ora.diskmon and ora.drivers.acfs. The other is spawned by crsd and manages GNS, VIP, SCAN VIP and network resources.

Oracle Agent

grid     2342     1 TS   19 Sep04 ?        00:00:54 $GRID_HOME/bin/oraagent.bin

grid     2825     1 TS   19 Sep04 ?        00:00:32 $GRID_HOME/bin/oraagent.bin

oracle   3045     1 TS   19 Sep04 ?        00:00:58 $GRID_HOME/bin/oraagent.bin

oraagent

       Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).

There are three of these processes. Their roles are described later.

For more on the orarootagent and oraagent processes, see section 2.

Cluster Synchronization Service (CSS)

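CSS manages and coordinates the relationships among the nodes in the cluster and is used for inter-node communication. When a node joins or leaves the cluster, CSS notifies the cluster.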

cssdmonitor

Monitors node hangs (via oprocd functionality), monitors OCSSD process hangs (via oclsomon functionality), and monitors vendor clusterware (via vmon functionality). This is a multithreaded process that runs with elevated priority.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor

cssdagent

Spawned by the OHASD process. Previously (10g) oprocd, responsible for I/O fencing. Killing this process causes a node reboot. It starts and stops the ocssd.bin daemon and checks its status.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent

ocssd

If this process fails, the system reboots. This process provides the CSS (Cluster Synchronization Service) service. CSS uses several heartbeat mechanisms to monitor cluster health in real time and provides basic cluster services such as split-brain protection.

If a node has rebooted on its own, check the ocssd log, located in <CRS_HOME>/log/<host>/cssd.

Manages cluster node membership and runs as the oragrid user. Failure of this process results in a node restart.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin

Cluster Time Synchronization Service (CTSS)
octssd

 Provides Time Management in a cluster for Oracle Clusterware.

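This is the time-synchronization process introduced in 11g. If the operating system synchronizes time with ntpd, CTSS stays in observer mode and ntpd keeps the time. If OS-level time synchronization is not available, CTSS synchronizes the time itself.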
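The CTSS behavior just described amounts to a mode choice plus, in active mode, a clock correction. This Python sketch is a simplification under stated assumptions. The presence of OS time sync is passed in as a boolean, the clocks are sampled at the same instant, and one node is treated as the reference. The function and parameter names are hypothetical, and the real daemon's adjustment logic is not modeled.

```python
def ctss_plan(os_time_sync, node_clocks, reference):
    """Hypothetical sketch of the CTSS mode choice described above.

    os_time_sync: True if OS-level time sync (e.g. ntpd) is in place.
    node_clocks:  {node: clock reading in seconds}, assumed sampled at one instant.
    reference:    node whose clock the others follow in active mode (assumed).
    Returns (mode, {node: correction in seconds}).
    """
    if os_time_sync:
        return "observer", {}  # ntpd keeps time; CTSS only observes
    ref = node_clocks[reference]
    corrections = {node: ref - t for node, t in node_clocks.items() if node != reference}
    return "active", corrections


print(ctss_plan(False, {"node1": 100.0, "node2": 98.5}, "node1"))
```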

Cluster Ready Services (CRS)
crsd (Cluster Ready Services Daemon)

crsd is the main program that manages high-availability operations within the cluster. CRS manages all resources in the cluster, including databases, services, instances, VIP addresses, listeners and application processes. The process can start, stop, monitor and fail over cluster resources. Under normal conditions crsd.bin monitors the node's resources, and when a resource fails it automatically restarts or relocates that resource.

When some resources have terminated abnormally, first check the crsd log: <CRS_HOME>/log/<host>/crsd.

The primary Oracle Clusterware process that performs high availability recovery and management operations, such as maintaining OCR. Also manages application resources and runs as root user (or by a user in the admin group on Mac OS X-based systems) and restarts automatically upon failure.

The above process is responsible for start, stop, monitor and failover of resources. It maintains OCR and also restarts the resources when a failure occurs.

This is applicable to RAC systems. For Oracle Restart and ASM, ohasd is used.
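The restart-or-fail-over behavior described for crsd can be sketched as a small decision function. In this Python sketch, restart_attempts is meant to mirror the Clusterware resource attribute RESTART_ATTEMPTS. The function itself and its decision order are a simplification assumed for illustration, not crsd's actual logic. Not every resource type can fail over to another node.

```python
def on_resource_failure(restarts_so_far, restart_attempts, candidate_nodes):
    """Hypothetical sketch of crsd's reaction when a monitored resource fails.

    restart_attempts mirrors the resource attribute RESTART_ATTEMPTS;
    candidate_nodes lists other nodes that could host the resource.
    Returns (action, target_node).
    """
    if restarts_so_far < restart_attempts:
        return "restart", None  # try again on the local node
    if candidate_nodes:
        return "failover", candidate_nodes[0]
    return "offline", None  # nowhere left to run it


print(on_resource_failure(1, 1, ["node2"]))
```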

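CHM (Cluster Health Monitor)

CHM collects operating system performance data in a RAC environment and can be used to analyze node eviction problems. It is a new feature in 11.2.0.2.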

[Figure: Cluster Health Monitor (CHM)]

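Process: osysmond.bin

       The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster.

This process runs on every node. It monitors and collects performance data from the local operating system and sends the information collected on its node to the ologgerd process.

Process: ologgerd

 Receives information from all the nodes in the cluster and persists it in a CHM repository-based database. This service runs on only two nodes in a cluster.

As the Oracle description above states, this process runs on only two nodes, in primary-standby mode. Only the primary on the master node actually does the work, and the other acts as a standby. The process receives the information collected by osysmond on all nodes and stores it in a Berkeley DB (BDB). Before storing the raw data, it compresses the data to save space.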
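The osysmond-to-ologgerd data flow can be illustrated with a toy collector and repository. In this Python sketch a dictionary stands in for the Berkeley DB repository and zlib stands in for the compression step. Both substitutions, and the metric names, are assumptions for illustration only.

```python
import json
import zlib


class LoggerRepository:
    """Hypothetical stand-in for the master ologgerd and its repository."""

    def __init__(self):
        self._store = {}  # (node, timestamp) -> compressed JSON bytes (replaces BDB)

    def receive(self, node, ts, metrics):
        # Compress the raw sample before storing it, as described above.
        raw = json.dumps(metrics, sort_keys=True).encode("utf-8")
        self._store[(node, ts)] = zlib.compress(raw)

    def read(self, node, ts):
        return json.loads(zlib.decompress(self._store[(node, ts)]).decode("utf-8"))


def osysmond_sample(node, ts, cpu_pct, free_mem_mb):
    """Hypothetical per-node sample, shaped as (node, timestamp, metrics)."""
    return node, ts, {"cpu_pct": cpu_pct, "free_mem_mb": free_mem_mb}


master = LoggerRepository()
for sample in (osysmond_sample("node1", 0, 12.5, 2048),
               osysmond_sample("node2", 0, 40.0, 1024)):
    master.receive(*sample)
print(master.read("node2", 0))
```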

Other Processes
evmd:

The event monitor process. It publishes cluster events, such as instance startup and shutdown.

 

ons:

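Oracle Notification Service daemon. It receives the cluster events sent by evmd and forwards them to subscribing applications or to the local listener. This enables FAN (Fast Application Notification): applications receive these events and can act on them.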
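The evmd-to-ons path follows a publish/subscribe pattern. This Python sketch models it with two hypothetical classes. evmd publishes an instance event, and ONS forwards it to every subscriber of that event type. Real FAN event formats and the ONS wire protocol are not modeled.

```python
from collections import defaultdict


class Ons:
    """Hypothetical minimal ONS: forwards each event to its type's subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        callbacks = self._subscribers.get(event_type, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)  # number of subscribers notified


class Evmd:
    """Hypothetical event publisher that hands cluster events to ONS."""

    def __init__(self, ons):
        self._ons = ons

    def instance_event(self, instance, status):
        return self._ons.publish("instance", {"instance": instance, "status": status})


received = []
ons = Ons()
ons.subscribe("instance", received.append)  # e.g. an application's FAN handler
Evmd(ons).instance_event("tan1", "DOWN")
print(received)
```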

gpnpd

 Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.

 Publishes the bootstrap information needed to form the cluster, and synchronizes the GPnP profile among all nodes of the cluster.

   Its log is located in: <GRID_HOME>/log/<host>/gpnpd

gipcd

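A support daemon that enables Redundant Interconnect Usage.

This process manages all private network (cluster interconnect) interfaces in the cluster. The private network information is obtained through gpnpd.

   Its log is located in: <GRID_HOME>/log/<host>/gipcd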

mdnsd

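 Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.

This process uses multicast to discover the nodes in the cluster and all of their network interfaces. Make sure that the network interfaces in the cluster support multicast and that inter-node communication works properly.

   Its log is located in: <GRID_HOME>/log/<host>/mdnsd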

Oracle Grid Naming Service (GNS)
gnsd.bin

Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.

(Optional. The example above does not have this process.) Grid Naming Service acts like a sub-DNS. It works much like DNS and replaces /etc/hosts for host name resolution.

   Its log is located in: <GRID_HOME>/log/<host>/gnsd
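The per-daemon log directories named in this article follow one pattern: <GRID_HOME>/log/<host>/<subdirectory>. This Python sketch builds them, assuming that <CRS_HOME> and <GRID_HOME> refer to the same Grid Infrastructure home, as in 11gR2. The mapping table and the function name are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical table: daemon -> log subdirectory, as given in this article.
DAEMON_LOG_DIRS = {
    "ocssd": "cssd",
    "crsd": "crsd",
    "gpnpd": "gpnpd",
    "gipcd": "gipcd",
    "mdnsd": "mdnsd",
    "gnsd": "gnsd",
}


def daemon_log_dir(grid_home, host, daemon):
    """Hypothetical helper: <grid_home>/log/<host>/<subdir> for a daemon listed above."""
    return PurePosixPath(grid_home, "log", host, DAEMON_LOG_DIRS[daemon])


print(daemon_log_dir("/u01/grid/11.2.0/gridhome", "node1", "ocssd"))
```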

2. More on the orarootagent and oraagent Processes

Oracle Clusterware 11g Release 2 (11.2) introduces a new agent concept which makes the Oracle Clusterware more robust and performant. These agents are multi-threaded daemons which implement entry points for multiple resource types and which spawn new processes for different users. The agents are highly available and besides the oraagent, orarootagent and cssdagent/cssdmonitor, there can be an application agent and a script agent.

The two main agents are the oraagent and the orarootagent. Both ohasd and crsd employ one oraagent and one orarootagent each. If the CRS user is different from the ORACLE user, then crsd would utilize two oraagents and one orarootagent.

Oraagent

         There are three oraagent processes in total. One is spawned by ohasd and the other two by crsd.

ohasd’s oraagent:

Performs start/stop/check/clean actions for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd

crsd’s oraagent:

  crsd's oraagent runs as two processes, one owned by the grid user and one owned by the oracle user.

Performs start/stop/check/clean actions for ora.asm, ora.eons, ora.LISTENER.lsnr, SCAN listeners and ora.ons.

Performs start/stop/check/clean actions for service, database and diskgroup resources.

Receives eONS events, and translates and forwards them to interested clients (eONS will be removed and its functionality included in EVM in 11.2.0.2).

Receives CRS state change events, dequeues RLB events and enqueues HA events for OCI and ODP.NET clients.

orarootagent

ohasd’s orarootagent:

Performs start/stop/check/clean actions for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, ora.crf (11.2.0.2)

crsd’s orarootagent:

Performs start/stop/check/clean actions for GNS, VIP, SCAN VIP and network resources
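The four lists above form a lookup table from each agent to the resources it manages. This Python sketch inverts that table so you can ask which agent manages a given resource. The resource strings are copied from the lists. The descriptive entries (SCAN listeners, service, database, diskgroup, GNS, VIP, SCAN VIP, network) are kept as plain labels. The table and the function name are hypothetical.

```python
# Hypothetical lookup table: (daemon, agent) -> resources, copied from the lists above.
AGENT_RESOURCES = {
    ("ohasd", "oraagent"): ["ora.asm", "ora.evmd", "ora.gipcd", "ora.gpnpd", "ora.mdnsd"],
    ("ohasd", "orarootagent"): ["ora.crsd", "ora.ctssd", "ora.diskmon",
                                "ora.drivers.acfs", "ora.crf"],
    ("crsd", "oraagent"): ["ora.asm", "ora.eons", "ora.LISTENER.lsnr", "SCAN listeners",
                           "ora.ons", "service", "database", "diskgroup"],
    ("crsd", "orarootagent"): ["GNS", "VIP", "SCAN VIP", "network"],
}


def agents_for(resource):
    """Return every (daemon, agent) pair whose list contains the resource."""
    return sorted(key for key, resources in AGENT_RESOURCES.items() if resource in resources)


print(agents_for("ora.asm"))
```

ora.asm appears in two lists because, per the text, both ohasd's and crsd's oraagent act on it.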

Agent Log Files

Log locations of the OHASD/CRSD agents

The log files for the ohasd/crsd agents are located in :

 

Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>.log. For example, for ora.crsd, which is managed by ohasd and owned by root,

the agent log file is named :

Grid_home/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

The same agent log file can have log messages for more than one resource, if those resources are managed by the same daemon.

If an agent process crashes, a core file will be written to

Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>,

And a call stack will be written to

Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>OUT.log
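The path rules above can be expressed as a small function. This Python sketch builds the agent log file, the core-file directory and the call-stack file for a given daemon, agent and owner. The function name is hypothetical, and "Grid_home" is kept as the literal placeholder used in the text.

```python
from pathlib import PurePosixPath


def agent_log_paths(grid_home, hostname, daemon, agent, owner):
    """Hypothetical helper: agent log locations following the rules above.

    daemon is "ohasd" or "crsd"; agent is e.g. "orarootagent"; owner is e.g. "root".
    """
    if daemon not in ("ohasd", "crsd"):
        raise ValueError("daemon must be 'ohasd' or 'crsd'")
    name = f"{agent}_{owner}"
    base = PurePosixPath(grid_home, "log", hostname, "agent", daemon, name)
    return {
        "log": base / f"{name}.log",            # regular agent log
        "core_dir": base,                       # core file written here on a crash
        "call_stack": base / f"{name}OUT.log",  # call stack written here on a crash
    }


# ora.crsd is managed by ohasd's orarootagent, which runs as root.
paths = agent_log_paths("Grid_home", "node1", "ohasd", "orarootagent", "root")
print(paths["log"])
```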

The agent log file format is the following:

<timestamp>:[<component>][<threadid>]… 

<timestamp>:[<component>][<threadid>][<entry point>]…
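The two line formats above can be matched with one regular expression in which the entry-point field is optional. This Python sketch makes three assumptions: the timestamp is followed by a colon and optional spaces, the bracketed fields contain no nested brackets, and a message that itself starts with "[" would be misread as an entry point. The sample line is hypothetical, not copied from a real log.

```python
import re

# <timestamp>:[<component>][<threadid>] with an optional [<entry point>] field.
AGENT_LOG_RE = re.compile(
    r"^(?P<ts>.+?):\s*"                    # timestamp: shortest prefix before ':' + '['
    r"\[\s*(?P<component>[^\]]*?)\s*\]"    # component, padding stripped
    r"\[(?P<thread>[^\]]*)\]"              # thread id
    r"(?:\[(?P<entry>[^\]]*)\])?"          # optional entry point
    r"\s*(?P<msg>.*)$"                     # rest of the line
)


def parse_agent_line(line):
    """Return the named fields of an agent log line, or None if it does not match."""
    match = AGENT_LOG_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line in the documented layout.
print(parse_agent_line("2013-02-28 10:11:12.345: [    AGFW][1088932160] Agent received the message"))
```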

3. Other Cluster Terms

DLM:

         Distributed Lock Management. In Oracle releases before 9i it was called PCM. From 9i on it is called Cache Fusion.

OPS:

         Oracle Parallel Server, the pre-9i name for RAC.

PCM:

         The pre-9i term, which corresponds to Cache Fusion in 9i and later. PCM is divided into PCM resources and Non-PCM resources.

Cache Fusion:

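         The DLM of Oracle 9i and later. It is divided into Cache Fusion resources and Non-Cache Fusion resources.

Non-Cache Fusion

         Holds object definitions, the SQL code stored in the Library Cache, execution plans and so on.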

GRD:

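         Global Resource Directory. It records how each data block is distributed across the cluster. The GRD resides in the SGA of each instance, but each instance's SGA holds only part of it. The GRDs of all instances together make up the complete GRD.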
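A directory that is split across instances can be pictured as follows: each block's entry is assigned to a master instance by hashing. This Python sketch uses a CRC32 of the file and block numbers. That hashing scheme is an assumption for illustration, the second instance name "tan2" is hypothetical, and Oracle's actual mastering (including dynamic remastering) is not modeled.

```python
import zlib


def master_instance(file_no, block_no, instances):
    """Hypothetical: choose the instance that masters a block's directory entry."""
    key = f"{file_no}.{block_no}".encode("ascii")
    return instances[zlib.crc32(key) % len(instances)]


def partition_grd(blocks, instances):
    """Split directory entries for (file_no, block_no) pairs among the instances."""
    grd = {inst: [] for inst in instances}
    for file_no, block_no in blocks:
        grd[master_instance(file_no, block_no, instances)].append((file_no, block_no))
    return grd


parts = partition_grd([(4, n) for n in range(1, 101)], ["tan1", "tan2"])
print({inst: len(entries) for inst, entries in parts.items()})
```

Each instance holds only its own share, and only the union of all shares covers every block. This matches the "partial GRD per SGA" description above.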

PCM Lock:

         The information recorded in the GRD of the shadow node.

OHAS:

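         High Availability Services. OHAS is the first Grid Infrastructure component started after the server boots. It is configured to be started by init(1) and is responsible for spawning the agent processes.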

CRSD:

 

The main Clusterware background process. It uses the Oracle Cluster Registry (OCR) to manage the resources in the cluster.

4. References

RAC processes:

《大话Oracle RAC：集群 高可用性 备份与恢复》 (Oracle RAC: Clustering, High Availability, Backup and Recovery)

http://blog.csdn.net/inthirties/article/details/4875535

Starting GSD: http://hi.baidu.com/mediinfodba/item/1b2889ab399949f05bf191c8

Clusterware processes:

https://blogs.oracle.com/myoraclediary/entry/clusterware_processes_in_11g_rac

http://reneantunez.blogspot.com/2012/10/rac-11gr2-clusterware-startup-sequence.html

http://www.unitask.com/oracledaily/2013/02/28/oracle-crsgi-%E8%BF%9B%E7%A8%8B%E4%BB%8B%E7%BB%8D/

http://balakumarnair.wordpress.com/2011/01/26/clusterware-startup-sequence-start-cluster-vs-start-crs/

ologgerd and osysmond:

http://aprakash.wordpress.com/2011/03/03/ologgerd-daemon-11gr2/

Cluster Health Monitor (CHM):

http://www.dbaleet.org/understand_rac_cluster_health_monitor_overview_of_chm/

http://www.dbaleet.org/understand_rac_cluster_health_monitor_usage_of_chm/