DRBD + Corosync + Pacemaker for MySQL High Availability (Part 1)

Posted: 2021-09-04 03:20:02

1. Lab description:

node1 hostname: node1.abc.com, IP 192.168.1.10; node2 hostname: node2.abc.com, IP 192.168.1.20; VIP: 192.168.1.50; OS: Red Hat Enterprise Linux 5.4, kernel 2.6.18-164.el5.

Packages used in this lab. The DRBD code was only merged into the mainline kernel as of 2.6.33, so on this kernel we have to install both the kernel module and the management utilities:

drbd83-8.3.8-1.el5.centos.i386.rpm         # DRBD management utilities
kmod-drbd83-8.3.8-1.el5.centos.i686.rpm    # DRBD kernel module
cluster-glue-1.0.6-1.6.el5.i386.rpm        # cluster glue components (support for additional nodes in the cluster)
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm   # cluster glue libraries
corosync-1.2.7-1.1.el5.i386.rpm            # corosync main package
corosynclib-1.2.7-1.1.el5.i386.rpm         # corosync libraries
heartbeat-3.0.3-2.3.el5.i386.rpm           # heartbeat resource agents
heartbeat-libs-3.0.3-2.3.el5.i386.rpm      # heartbeat libraries
ldirectord-1.0.1-1.el5.i386.rpm            # probes the backend real servers in an HA cluster (not used in this setup)
libesmtp-1.0.4-5.el5.i386.rpm
openais-1.1.3-1.6.el5.i386.rpm             # extends pacemaker functionality
openaislib-1.1.3-1.6.el5.i386.rpm          # openais libraries
pacemaker-1.1.5-1.1.el5.i386.rpm           # pacemaker main package
pacemaker-libs-1.1.5-1.1.el5.i386.rpm      # pacemaker libraries
pacemaker-cts-1.1.5-1.1.el5.i386.rpm
perl-TimeDate-1.16-5.el5.noarch.rpm
resource-agents-1.0.4-1.1.el5.i386.rpm     # resource agents
mysql-5.5.15-linux2.6-i686.tar.gz          # MySQL binary tarball

Download location for these packages: http://down.51cto.com/data/402802

2. Lab procedure:

1. Synchronize the time:
node1
[root@node1 ~]# hwclock -s
node2
[root@node2 ~]# hwclock -s

2. Edit the hosts file:
[root@node1 ~]# vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.1.10    node1.abc.com   node1
192.168.1.20    node2.abc.com   node2

Copy the hosts file to node2:
[root@node1 ~]# scp /etc/hosts 192.168.1.20:/etc/hosts
The authenticity of host '192.168.1.20 (192.168.1.20)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # overwrite the existing hosts file
Warning: Permanently added '192.168.1.20' (RSA) to the list of known hosts.
root@192.168.1.20's password:   # enter node2's root password
hosts

3. Generate keys on both nodes so they can communicate without passwords:
node1
[root@node1 ~]# ssh-keygen -t rsa   # generate an RSA key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):   # accept the default, press Enter
Enter passphrase (empty for no passphrase):   # accept the default, press Enter
Enter same passphrase again:   # accept the default, press Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
61:b1:4a:c8:88:19:31:5d:cb:8f:91:0c:fe:38:bd:c3 root@node1.abc.com

[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node2.abc.com   # copy the public key to node2
The authenticity of host 'node2.abc.com (192.168.1.20)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # type yes
Warning: Permanently added 'node2.abc.com' (RSA) to the list of known hosts.
root@node2.abc.com's password:   # node2's root password

node2
[root@node2 ~]# ssh-keygen -t rsa   # generate an RSA key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):   # accept the default, press Enter
Enter passphrase (empty for no passphrase):   # accept the default, press Enter
Enter same passphrase again:   # accept the default, press Enter
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
3f:0b:27:14:8a:ba:b1:c6:4d:02:2b:22:86:a3:46:0a root@node2.abc.com

[root@node2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@node1.abc.com   # copy the public key to node1
The authenticity of host 'node1.abc.com (192.168.1.10)' can't be established.
RSA key fingerprint is d4:f1:06:3b:a0:81:fd:85:65:20:9e:a1:ee:46:a6:8b.
Are you sure you want to continue connecting (yes/no)? yes   # type yes
Warning: Permanently added 'node1.abc.com,192.168.1.10' (RSA) to the list of known hosts.
root@node1.abc.com's password:   # node1's root password

At this point the two nodes can communicate with each other without passwords.
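It is worth a quick check that the key exchange works in both directions before continuing. A minimal check (the remote command is only illustrative, any command will do) should print the peer's hostname without prompting for a password:

[root@node1 ~]# ssh node2.abc.com 'uname -n'   # should print node2.abc.com with no password prompt
[root@node2 ~]# ssh node1.abc.com 'uname -n'   # should print node1.abc.com with no password prompt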
4. Configure the yum client:
node1
[root@node1 ~]# mkdir /mnt/cdrom
[root@node1 ~]# mount /dev/cdrom /mnt/cdrom/
[root@node1 ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo
[rhel-server]
name=Red Hat Enterprise Linux server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-cluster]
name=Red Hat Enterprise Linux cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-clusterstorage]
name=Red Hat Enterprise Linux clusterstorage
baseurl=file:///mnt/cdrom/ClusterStorage
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release
[rhel-vt]
name=Red Hat Enterprise Linux vt
baseurl=file:///mnt/cdrom/VT
enabled=1
gpgcheck=1
gpgkey=file:///mnt/cdrom/RPM-GPG-KEY-redhat-release

Copy the yum client file to node2's /etc/yum.repos.d/ directory:
[root@node1 ~]# scp /etc/yum.repos.d/rhel-debuginfo.repo node2.abc.com:/etc/yum.repos.d/

node2
[root@node2 ~]# mkdir /mnt/cdrom
[root@node2 ~]# mount /dev/cdrom /mnt/cdrom/

5. Upload the downloaded rpm packages to each node and install DRBD:
node1
[root@node1 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck
node2
[root@node2 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck

6. Create a partition of the same size and type on each node to back the DRBD device (sda4):

node1
[root@node1 ~]# fdisk /dev/sda
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n   # add a new partition
Command action
   e   extended
   p   primary partition (1-4)
p    # primary partition
Selected partition 4
First cylinder (1580-2610, default 1580):    # accept the default, press Enter
Using default value 1580
Last cylinder or +size or +sizeM or +sizeK (1580-2610, default 2610): +1G   # size 1G

Command (m for help): w   # save and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[root@node1 ~]# partprobe /dev/sda   # make the kernel re-read the partition table
[root@node1 ~]# cat /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10482412 sda2
   8     3    2096482 sda3
   8     4     987997 sda4

node2
[root@node2 ~]# fdisk /dev/sda
The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n   # add a new partition
Command action
   e   extended
   p   primary partition (1-4)
p   # primary partition
Selected partition 4
First cylinder (1580-2610, default 1580):   # accept the default, press Enter
Using default value 1580
Last cylinder or +size or +sizeM or +sizeK (1580-2610, default 2610): +1G   # size 1G

Command (m for help): w   # save and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[root@node2 ~]# partprobe /dev/sda   # make the kernel re-read the partition table
[root@node2 ~]# cat /proc/partitions
major minor  #blocks  name

   8     0   20971520 sda
   8     1     104391 sda1
   8     2   10482412 sda2
   8     3    2096482 sda3
   8     4     987997 sda4
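With the backing partitions in place on both nodes, it can also be worth confirming that the drbd kernel module shipped in kmod-drbd83 actually loads before writing any DRBD configuration. This is an optional check, not part of the original transcript; `service drbd start` later loads the module automatically:

[root@node1 ~]# modprobe drbd
[root@node1 ~]# lsmod | grep drbd   # should list the drbd module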
7. Configure DRBD:
node1
Copy the sample configuration file drbd.conf:
[root@node1 ~]# cp /usr/share/doc/drbd83-8.3.8/drbd.conf /etc/
Back up global_common.conf:
[root@node1 ~]# cd /etc/drbd.d/
[root@node1 drbd.d]# cp global_common.conf global_common.conf.bak
Edit global_common.conf:
[root@node1 drbd.d]# vim global_common.conf

global {
        usage-count no;    # do not report usage statistics
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }

        startup {
                wfc-timeout 120;        # timeout while waiting for the peer to connect
                degr-wfc-timeout 100;   # timeout while waiting for a degraded peer to connect
        }

        disk {
                on-io-error detach;   # on an I/O error, detach the backing device from DRBD
        }

        net {
                cram-hmac-alg "sha1";        # use the sha1 algorithm for peer authentication
                shared-secret "mydrbdlab";   # shared secret, must be identical on both nodes
        }

        syncer {
                rate 100M;   # data synchronization rate
        }
}

Define the mysql resource:
[root@node1 drbd.d]# vim mysql.res
resource mysql {
        on node1.abc.com {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.1.10:7898;
                meta-disk internal;
        }
        on node2.abc.com {
                device /dev/drbd0;
                disk /dev/sda4;
                address 192.168.1.20:7898;
                meta-disk internal;
        }
}

8. Copy global_common.conf, mysql.res and drbd.conf to node2:
[root@node1 drbd.d]# scp global_common.conf node2.abc.com:/etc/drbd.d/
[root@node1 drbd.d]# scp mysql.res node2.abc.com:/etc/drbd.d/
[root@node1 drbd.d]# scp /etc/drbd.conf node2.abc.com:/etc/

9. Initialize the mysql resource on node1 and node2 and start the service:
node1
[root@node1 drbd.d]# drbdadm create-md mysql
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
[root@node1 drbd.d]# service drbd start

node2
[root@node2 drbd.d]# drbdadm create-md mysql
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
[root@node2 drbd.d]# service drbd start

Check the DRBD status:
node1
[root@node1 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Secondary Inconsistent/Inconsistent C r----
node2
[root@node2 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Secondary Inconsistent/Inconsistent C r----

Both nodes are currently in the Secondary role.

Now promote node1 to primary; on node1 run:
node1
[root@node1 drbd.d]# drbdadm -- --overwrite-data-of-peer primary mysql

Check again:
node1
[root@node1 drbd.d]# drbd-overview
 0:mysql SyncSource Primary/Secondary UpToDate/Inconsistent C r----
   [========>...........] sync'ed: 45.9% (538072/987928)K delay_probe: 40
node2
[root@node2 drbd.d]# drbd-overview
 0:mysql Connected Secondary/Primary UpToDate/UpToDate C r----

node1 is now the primary node and node2 is the backup node.
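While the initial synchronization is running, its progress can also be followed directly through /proc/drbd. This is an optional check, not part of the original transcript:

[root@node1 drbd.d]# watch -n 1 'cat /proc/drbd'   # shows the sync percentage and estimated time remaining

Wait until both sides report UpToDate/UpToDate before continuing with the next step.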
10. Create a file system and a mount point (on the primary node only, i.e. node1):

[root@node1 ~]# mkfs -t ext3 /dev/drbd0
[root@node1 ~]# mkdir /mysqldata
[root@node1 ~]# mount /dev/drbd0 /mysqldata/
[root@node1 ~]# cd /mysqldata
[root@node1 mysqldata]# touch node1   # create a file named node1
[root@node1 mysqldata]# ll
total 16
drwx------ 2 root root 16384 Mar 14 19:10 lost+found
-rw-r--r-- 1 root root     0 Mar 14 19:19 node1

11. Unmount the DRBD device:
node1
[root@node1 mysqldata]# cd
[root@node1 ~]# umount /mysqldata
Demote node1 to the secondary role:
[root@node1 ~]# drbdadm secondary mysql
[root@node1 ~]# drbd-overview
 0:mysql Connected Secondary/Secondary UpToDate/UpToDate C r----

12. Promote node2 to the primary role:
node2
[root@node2 drbd.d]# cd
[root@node2 ~]# drbdadm primary mysql
[root@node2 ~]# drbd-overview
 0:mysql Connected Primary/Secondary UpToDate/UpToDate C r----
[root@node2 ~]# mkdir /mysqldata
[root@node2 ~]# mount /dev/drbd0 /mysqldata
[root@node2 ~]# cd /mysqldata
[root@node2 mysqldata]# ll
total 16
drwx------ 2 root root 16384 Mar 14 19:10 lost+found
-rw-r--r-- 1 root root     0 Mar 14 19:19 node1
Unmount:
[root@node2 mysqldata]# cd
[root@node2 ~]# umount /mysqldata
At this point /dev/drbd0 is being replicated correctly and DRBD is fully installed.

13. Install and configure MySQL:
node1
Add the user and group:
[root@node1 ~]# groupadd -r mysql
[root@node1 ~]# useradd -g mysql -r mysql
Only the primary node can read, write and mount the device, so make node1 the primary and node2 the secondary again:

node2
[root@node2 ~]# drbdadm secondary mysql
node1
[root@node1 ~]# drbdadm primary mysql

Mount the DRBD device:
[root@node1 ~]# mount /dev/drbd0 /mysqldata
[root@node1 ~]# mkdir /mysqldata/data
The data directory will hold the MySQL data, so change its owner and group:
[root@node1 ~]# chown -R mysql.mysql /mysqldata/data/

Install MySQL:
[root@node1 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/
[root@node1 ~]# cd /usr/local/
[root@node1 local]# ln -sv mysql-5.5.15-linux2.6-i686 mysql
[root@node1 local]# cd mysql
[root@node1 mysql]# chown -R mysql:mysql .   # change ownership of everything in the current directory
Initialize the MySQL database:
[root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mysqldata/data
[root@node1 mysql]# chown -R root .
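If the initialization succeeded, the system databases now live on the DRBD-backed data directory. A quick look should show roughly the following (expected contents for a stock MySQL 5.5 initialization, shown for illustration only):

[root@node1 mysql]# ls /mysqldata/data
mysql  performance_schema  test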
Provide the main configuration file for MySQL:
[root@node1 mysql]# cp support-files/my-large.cnf /etc/my.cnf
Edit my.cnf:
[root@node1 mysql]# vim /etc/my.cnf
Line 39:  thread_concurrency = 2
Add the following line to specify where the MySQL data files are stored:
datadir = /mysqldata/data

Provide a SysV service script for MySQL so it can be managed with the service command:
[root@node1 mysql]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld

The configuration file and SysV service script on node2 are identical, so simply copy them over:
[root@node1 mysql]# scp /etc/my.cnf node2.abc.com:/etc/
[root@node1 mysql]# scp /etc/rc.d/init.d/mysqld node2.abc.com:/etc/rc.d/init.d/

Add it to the service list:
[root@node1 mysql]# chkconfig --add mysqld
Make sure it does not start automatically at boot, because the CRM will control it:
[root@node1 mysql]# chkconfig mysqld off
Start the service:
[root@node1 mysql]# service mysqld start

Check whether the data directory now contains files:
[root@node1 mysql]# ll /mysqldata/data/
total 29756
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:17 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:17 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      107 Mar 14 20:17 mysql-bin.000003
-rw-rw---- 1 mysql mysql       57 Mar 14 20:17 mysql-bin.index
-rw-rw---- 1 mysql root      1699 Mar 14 20:17 node1.abc.com.err
-rw-rw---- 1 mysql mysql        5 Mar 14 20:17 node1.abc.com.pid
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test

After testing, stop the service:
[root@node1 mysql]# service mysqld stop

To make this MySQL installation conform to the usual filesystem layout and export its development components to the system, a few more steps are needed:
Add MySQL's man pages to the man command's search path:
[root@node1 mysql]# vim /etc/man.config
Line 48:  MANPATH /usr/local/mysql/man

Export MySQL's header files to the system include path /usr/include; a simple symlink is enough:
[root@node1 mysql]# ln -sv /usr/local/mysql/include/ /usr/include/mysql

Export MySQL's libraries to the system library search path (any file under /etc/ld.so.conf.d/ with a .conf suffix will do):
[root@node1 mysql]# echo '/usr/local/mysql/lib/' > /etc/ld.so.conf.d/mysql.conf

Then have the system reload its library cache:
[root@node1 mysql]# ldconfig

Modify the PATH environment variable so that every user can run the MySQL commands directly:
[root@node1 mysql]# vim /etc/profile
PATH=$PATH:/usr/local/mysql/bin

Re-read the environment variables:
[root@node1 mysql]# . /etc/profile
[root@node1 mysql]# echo $PATH
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/mysql/bin

Unmount the DRBD device:
[root@node1 mysql]# umount /mysqldata
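At this point the MySQL client tools should be reachable from any shell. A quick check (illustrative, not from the original transcript; assumes no other mysql client is installed earlier in the PATH):

[root@node1 ~]# which mysql   # should point at /usr/local/mysql/bin/mysql
[root@node1 ~]# mysql -V      # prints the client version; works even while mysqld is stopped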
node2
Add the user and group:
[root@node2 ~]# groupadd -r mysql
[root@node2 ~]# useradd -g mysql -r mysql
Only the primary node can read, write and mount the device, so now make node2 the primary and node1 the secondary:
On node1:
[root@node1 mysql]# drbdadm secondary mysql
On node2:
[root@node2 ~]# drbdadm primary mysql

Mount the DRBD device:
[root@node2 ~]# mount /dev/drbd0 /mysqldata

Check:
[root@node2 ~]# ll /mysqldata/data/
total 29752
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:20 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:20 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      126 Mar 14 20:20 mysql-bin.000003
-rw-rw---- 1 mysql mysql       57 Mar 14 20:17 mysql-bin.index
-rw-rw---- 1 mysql root      2116 Mar 14 20:20 node1.abc.com.err
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test

Install MySQL:
[root@node2 ~]# tar -zxvf mysql-5.5.15-linux2.6-i686.tar.gz -C /usr/local/
[root@node2 ~]# cd /usr/local/
[root@node2 local]# ln -sv mysql-5.5.15-linux2.6-i686 mysql
[root@node2 local]# cd mysql

Do NOT initialize the database here, because it was already initialized on node1:
[root@node2 mysql]# chown -R root:mysql .

The MySQL configuration file and SysV service script were already copied over from node1, so there is nothing more to add.
Add it to the service list:
[root@node2 mysql]# chkconfig --add mysqld

Make sure it does not start automatically at boot, because the CRM will control it:
[root@node2 mysql]# chkconfig mysqld off
Then start the service and test it (make sure the MySQL service on node1 is stopped first):
[root@node2 mysql]# service mysqld start
After testing, stop the service. First check whether the data directory contains files:
[root@node2 mysql]# ll /mysqldata/data/
total 29764
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:48 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Mar 14 20:17 ib_logfile1
-rw-rw---- 1 mysql mysql 18874368 Mar 14 20:20 ibdata1
drwx------ 2 mysql root      4096 Mar 14 19:52 mysql
-rw-rw---- 1 mysql mysql    27017 Mar 14 20:16 mysql-bin.000001
-rw-rw---- 1 mysql mysql   996460 Mar 14 20:16 mysql-bin.000002
-rw-rw---- 1 mysql mysql      126 Mar 14 20:20 mysql-bin.000003
-rw-rw---- 1 mysql mysql      107 Mar 14 20:48 mysql-bin.000004
-rw-rw---- 1 mysql mysql       76 Mar 14 20:48 mysql-bin.index
-rw-rw---- 1 mysql root      2116 Mar 14 20:20 node1.abc.com.err
-rw-rw---- 1 mysql root       937 Mar 14 20:48 node2.abc.com.err
-rw-rw---- 1 mysql mysql        5 Mar 14 20:48 node2.abc.com.pid
drwx------ 2 mysql mysql     4096 Mar 14 20:16 performance_schema
drwx------ 2 mysql root      4096 Mar 14 19:51 test

[root@node2 mysql]# service mysqld stop

To make the installation conform to the usual filesystem layout and export its development components, repeat the same steps that were performed on node1; they are not repeated here.
Unmount the device:
[root@node2 mysql]# umount /dev/drbd0

14. Install and configure corosync + pacemaker:
node1
[root@node1 ~]# yum localinstall *.rpm -y --nogpgcheck   # ldirectord does not need to be installed here
node2
[root@node2 ~]# yum localinstall *.rpm -y --nogpgcheck   # ldirectord does not need to be installed here

Configure each node accordingly:

node1
[root@node1 ~]# cd /etc/corosync/
[root@node1 corosync]# cp corosync.conf.example corosync.conf
[root@node1 corosync]# vim corosync.conf

# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0   # the only line that has to be changed
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        to_stderr: no    # whether to send output to standard error
        to_logfile: yes
        to_syslog: yes   # also log to syslog (it is advisable to turn one of the two log targets off, as logging to both hurts performance)
        logfile: /var/log/cluster/corosync.log   # the cluster directory has to be created manually
        debug: off       # turn on when troubleshooting
        timestamp: on    # record timestamps in the log
        # the following subsection belongs to openais
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

Append the following. The sections above only configure the messaging layer; since pacemaker runs on top of it, add:
service {
        ver: 0
        name: pacemaker
        use_mgmtd: yes
}
Although openais itself is not used directly, some of its sub-options are, so also add:
aisexec {
        user: root
        group: root
}
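Note that bindnetaddr is the network address of the interface corosync should bind to, not a host address. It can be read off the node's routing table; for example (assuming the cluster traffic runs over eth0, which is an assumption about this lab, and the output is only illustrative):

[root@node1 corosync]# ip route show | grep 192.168.1
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.10

The network part, 192.168.1.0, is what goes into bindnetaddr.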
Create the cluster log directory:
[root@node1 corosync]# mkdir /var/log/cluster
To keep unauthorized hosts from joining the cluster, node authentication is required, so generate an authkey:

[root@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example
-r-------- 1 root root  128 Mar 14 21:13 authkey
-rw-r--r-- 1 root root  563 Mar 14 21:08 corosync.conf
-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d
[root@node1 corosync]# ssh node2.abc.com 'mkdir /var/log/cluster'

Copy the files from node1 to node2 (remember to use -p to preserve permissions):
[root@node1 corosync]# scp -p authkey corosync.conf node2.abc.com:/etc/corosync/

Start the corosync service on node1 and node2:
node1
[root@node1 corosync]# service corosync start
node2
[root@node2 corosync]# service corosync start
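Once corosync is running on both nodes, the membership ring can also be checked directly (an optional check, not part of the original transcript); each node should report ring 0 as active with no faults:

[root@node1 corosync]# corosync-cfgtool -s
[root@node2 corosync]# corosync-cfgtool -s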
Verify that the corosync engine started correctly:
node1 (note that node1 is the backup node at this point):
[root@node1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Mar 14 21:16:54 node1 corosync[6081]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:16:54 node1 corosync[6081]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
node2
[root@node2 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 14 21:17:03 node2 corosync[5876]:   [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 14 21:17:53 node2 corosync[5913]:   [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Mar 14 21:19:53 node2 corosync[5978]:   [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Mar 14 21:19:53 node2 corosync[5978]:   [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check whether the initial membership notifications were sent:
node1
[root@node1 corosync]# grep -i totem /var/log/messages
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] Initializing transport (UDP/IP).
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 3 14:13:16 node1 corosync[387]:   [TOTEM ] The network interface [192.168.1.10] is now up.
Apr 3 14:13:17 node1 corosync[387]:   [TOTEM ] Process pause detected for 565 ms, flushing membership messages.
Apr 3 14:13:17 node1 corosync[387]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 3 14:13:19 node1 corosync[387]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
node2
[root@node2 ~]# grep -i totem /var/log/messages
Apr 3 14:13:19 node2 corosync[32438]:   [TOTEM ] Initializing transport (UDP/IP).
Apr 3 14:13:19 node2 corosync[32438]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 3 14:13:19 node2 corosync[32438]:   [TOTEM ] The network interface [192.168.1.20] is now up.
Apr 3 14:13:21 node2 corosync[32438]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

Check whether any errors were produced during startup:
[root@node1 ~]# grep -i error: /var/log/messages | grep -v unpack_resources
[root@node2 ~]# grep -i error: /var/log/messages | grep -v unpack_resources

Check whether pacemaker has started:
node1
[root@node1 ~]# grep -i pcmk_startup /var/log/messages
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 21:16:55 node1 corosync[6081]:   [pcmk ] info: pcmk_startup: Local hostname: node1.abc.com
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 22:13:15 node1 corosync[3179]:   [pcmk ] info: pcmk_startup: Local hostname: node1.abc.com

node2
[root@node2 ~]# grep -i pcmk_startup /var/log/messages
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 21:19:55 node2 corosync[5978]:   [pcmk ] info: pcmk_startup: Local hostname: node2.abc.com
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] Logging: Initialized pcmk_startup
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Mar 14 22:13:20 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Service: 9
Mar 14 22:13:21 node2 corosync[3174]:   [pcmk ] info: pcmk_startup: Local hostname: node2.abc.com

Check the cluster status on node2 (the primary node):

[root@node2 corosync]# crm status
============
Last updated: Tue Apr 3 15:26:56 2012
Stack: openais
Current DC: node1.abc.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.abc.com node2.abc.com ]
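For a one-shot view of the same information, crm_mon can also be used (illustrative, not part of the original transcript):

[root@node2 corosync]# crm_mon -1   # print the cluster status once and exit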
15. Configure the cluster's global properties:
corosync enables stonith by default, but this cluster has no stonith device, so the default configuration is not usable yet; disable stonith first:
node1
[root@node1 ~]# crm configure property stonith-enabled=false
node2
[root@node2 ~]# crm configure property stonith-enabled=false

For a two-node cluster we also have to tell it to ignore quorum, so that the vote count no longer matters and a single surviving node can keep running:
node1
[root@node1 ~]# crm configure property no-quorum-policy=ignore
node2
[root@node2 ~]# crm configure property no-quorum-policy=ignore

Define a resource stickiness value so that resources do not bounce between nodes arbitrarily, since needless failback wastes system resources.
Resource stickiness values and their meaning:
0: the default. Resources are placed at the most suitable location in the system, which means they are moved whenever a "better" or less loaded node becomes available. This is essentially equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on.
Greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available; the higher the value, the stronger the preference to stay.
Less than 0: the resource prefers to move away from its current location; the higher the absolute value, the stronger the preference to leave.
INFINITY: the resource always stays where it is unless it is forced off because the node can no longer run it (node shutdown, node standby, migration-threshold reached, or a configuration change). This is essentially equivalent to disabling automatic failback.
-INFINITY: the resource always moves away from its current location.

Here we set a default stickiness value for resources as follows:
node1
[root@node1 ~]# crm configure rsc_defaults resource-stickiness=100
node2
[root@node2 ~]# crm configure rsc_defaults resource-stickiness=100

Because of the length limit, this article is split into two parts; please continue with Part 2.
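Before moving on to Part 2, the properties configured so far can be reviewed from either node (an illustrative check, not part of the original transcript):

[root@node1 ~]# crm configure show   # should list stonith-enabled=false, no-quorum-policy=ignore and resource-stickiness=100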

This article comes from the "朱超博" blog; please retain this attribution when reposting: http://zhuchaobo.blog.51cto.com/4393935/885664