1. DRBD Overview
Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the contents of block devices between servers.
Data mirroring: real-time, transparent, synchronous (the write returns only after all servers have written successfully) or asynchronous (the write returns as soon as the local server has written successfully).
DRBD's core functionality is implemented in the Linux kernel, at the bottom of the I/O stack, so it cannot magically add features that belong to higher layers, such as detecting corruption of an ext3 filesystem. DRBD sits below the filesystem, closer to the kernel and the I/O stack than the filesystem itself.
Single-primary mode: the typical high-availability cluster setup.
Dual-primary mode: requires a shared cluster filesystem such as GFS or OCFS2. Used when data must be accessed concurrently from two nodes; needs special configuration.
Replication modes: there are three protocols:
Protocol A: asynchronous replication. The write returns as soon as it completes locally; the data sits in the send buffer and may be lost.
Protocol B: memory-synchronous (semi-synchronous) replication. The write returns once it has completed locally and the data has reached the peer; if both nodes lose power, data may be lost.
Protocol C: synchronous replication. The write returns only after both the local and the remote write have been confirmed. Data can only be lost if both nodes lose power or both disks fail at the same time.
Protocol C is used in most cases. The choice of protocol affects the amount of traffic and therefore the network latency.
Efficient resynchronization: blocks are synchronized in linear on-disk order rather than in the order they were originally written, and only the data that became inconsistent during the outage is resynchronized.
Online device verification: one node sequentially computes a digest over the underlying storage and sends it to the peer, which computes its own; blocks that do not match are resynchronized later. Running this once a week or once a month is recommended.
Consistency checking during replication: a digest is computed for each replicated block; if it does not match on the receiving side, the block is retransmitted. This guards against bit errors introduced by NICs, buffers, and so on.
Split brain: a temporary network failure can cause both nodes to promote themselves to Primary. When the two nodes reconnect, DRBD can notify you by e-mail; handling the situation manually is recommended.
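For reference, a minimal sketch of what online verification and manual split-brain recovery look like with drbdadm, assuming the resource is named web as it is later in this document (the verify-alg line belongs in the syncer section of the DRBD configuration):
syncer {
    verify-alg sha1;    # digest algorithm used by online verification
}
# run an online verify of the resource
drbdadm verify web
# manual split-brain recovery: on the node whose changes should be discarded
drbdadm secondary web
drbdadm -- --discard-my-data connect web
# on the surviving node (if it has gone StandAlone)
drbdadm connect web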
2. Topology of the setup
3. Configure corosync + openais
3.1 Configure the IP addresses and hosts files on n1 and n2
1) Configure n1's IP address
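The original shows this step only as a screenshot; a sketch of what /etc/sysconfig/network-scripts/ifcfg-eth0 on n1 might contain, assuming the static address 192.168.20.133/28 used throughout this lab:
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.20.133
NETMASK=255.255.255.240
ONBOOT=yes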
2) Restart the network service
[root@localhost ~]# service network restart
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
3) Change the hostname to n1.xh.com
[root@localhost ~]# vim /etc/sysconfig/network
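A sketch of the relevant lines in /etc/sysconfig/network after the change (only the HOSTNAME line needs editing):
NETWORKING=yes
HOSTNAME=n1.xh.com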
4) Edit the hosts file on n1
[root@localhost ~]# vim /etc/hosts
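The hosts file should map both node names to their addresses, for example (addresses taken from this lab's setup):
192.168.20.133 n1.xh.com n1
192.168.20.134 n2.xh.com n2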
5) Configure n2's IP address
6) Change the hostname to n2.xh.com
[root@localhost ~]# vim /etc/sysconfig/network
7) Edit the hosts file on n2
[root@localhost ~]# vim /etc/hosts
3.2 Configure passwordless SSH communication between n1 and n2
1) Generate a key pair
[root@n1 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
d0:0d:c4:5e:fe:b8:a9:45:75:68:f9:ad:43:03:84:9a root@n1.xh.com
2) Copy the public key to n2.xh.com
[root@n1 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@n2.xh.com
Now try logging into the machine, with "ssh 'root@n2.xh.com'", and check in: .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
3) Test by copying n1's hosts file to n2
[root@n1 ~]# scp /etc/hosts n2.xh.com:/etc/hosts
hosts
4) Do the same configuration on n2
[root@n2 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
4f:f2:26:4c:b9:14:85:ba:4c:c0:21:2d:a1:d1:a0:b2 root@n2.xh.com
[root@n2 ~]# ssh-copy-id -i .ssh/id_rsa.pub root@n1.xh.com
The authenticity of host 'n1.xh.com (192.168.20.133)' can't be established.
RSA key fingerprint is d5:1f:ad:8b:ce:a3:10:0b:97:47:41:ac:7c:92:6e:29.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'n1.xh.com,192.168.20.133' (RSA) to the list of known hosts.
root@n1.xh.com's password:
Now try logging into the machine, with "ssh 'root@n1.xh.com'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
Check n2's IP configuration from n1:
[root@n1 ~]# ssh n2.xh.com 'ifconfig'
eth0 Link encap:Ethernet HWaddr 00:0C:29:AD:82:2B
inet addr:192.168.20.134 Bcast:192.168.20.143 Mask:255.255.255.240
inet6 addr: fe80::20c:29ff:fead:822b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2295 errors:0 dropped:0 overruns:0 frame:0
TX packets:1625 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:216705 (211.6 KiB) TX bytes:209670 (204.7 KiB)
Interrupt:67 Base address:0x2000
3.3 Configure yum on n1 and n2
1) Configure the yum repository on n2
[root@n2 ~]# vim /etc/yum.repos.d/rhel-debuginfo.repo
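The repository file itself is only shown as a screenshot in the original; a sketch of a local-media repository for RHEL 5, assuming the installation DVD is mounted at /mnt/cdrom (adjust the baseurl paths to your environment):
[rhel-server]
name=RHEL 5 Server
baseurl=file:///mnt/cdrom/Server
enabled=1
gpgcheck=0
[rhel-cluster]
name=RHEL 5 Cluster
baseurl=file:///mnt/cdrom/Cluster
enabled=1
gpgcheck=0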
2) Configure yum on n1 by copying the repo file from n2
[root@n2 ~]# scp -p /etc/yum.repos.d/rhel-debuginfo.repo n1.xh.com:/etc/yum.repos.d/
rhel-debuginfo.repo
3.4 Configure the corosync service
1) Copy the required packages to n2
[root@n1 ~]# scp -r corosync n2.xh.com:/root
2) Install corosync and openais on n1
[root@n1 corosync]# yum localinstall cluster-glue-1.0.6-1.6.el5.i386.rpm cluster-glue-libs-1.0.6-1.6.el5.i386.rpm corosync-1.2.7-1.1.el5.i386.rpm corosynclib-1.2.7-1.1.el5.i386.rpm heartbeat-3.0.3-2.3.el5.i386.rpm heartbeat-libs-3.0.3-2.3.el5.i386.rpm libesmtp-1.0.4-5.el5.i386.rpm pacemaker-1.1.5-1.1.el5.i386.rpm pacemaker-libs-1.1.5-1.1.el5.i386.rpm perl-TimeDate-1.16-5.el5.noarch.rpm resource-agents-1.0.4-1.1.el5.i386.rpm --nogpgcheck
[root@n1 corosync]# yum localinstall -y openais*.rpm
3) Look at the corosync sample configuration files
[root@n1 corosync]# cd /etc/corosync
[root@n1 corosync]# ll
total 20
-rw-r--r-- 1 root root 5384 2010-07-28 amf.conf.example
-rw-r--r-- 1 root root 436 2010-07-28 corosync.conf.example
drwxr-xr-x 2 root root 4096 2010-07-28 service.d
drwxr-xr-x 2 root root 4096 2010-07-28 uidgid.d
4) Create the corosync configuration file from the example
[root@n1 corosync]# cd /etc/corosync
[root@n1 corosync]# cp corosync.conf.example corosync.conf
5) Edit the corosync configuration file
[root@n1 corosync]# vim corosync.conf
6) Notes on the corosync configuration file
compatibility: whitetank // compatible with the old corosync 0.8x (whitetank) releases; keeps backward compatibility, but some newer features may not be available
totem { // settings for the totem protocol used to pass heartbeats between the nodes
version: 2 // protocol version
secauth: off // whether to enable secure authentication
threads: 0 // number of authentication threads; 0 means no limit
interface {
ringnumber: 0
bindnetaddr: 192.168.1.1 // the network address used for cluster communication (it must match the nodes' network; in this lab that is 192.168.20.128)
mcastaddr: 226.94.1.1
mcastport: 5405
}
}
logging {
fileline: off
to_stderr: no // whether to send output to standard error
to_logfile: yes // log to a file
to_syslog: yes // log to syslog (disabling one of the two is recommended, since logging to both reduces performance)
logfile: /var/log/cluster/corosync.log // this directory must be created by hand
debug: off // enable only when troubleshooting
timestamp: on // whether to record timestamps in the log
The following subsystem belongs to openais and does not need to be enabled:
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
The settings above only cover the messaging layer; to use pacemaker the following service section must be added:
service {
ver: 0
name: pacemaker
}
openais itself is not used directly, but its aisexec section is still required:
aisexec {
user: root
group: root
}
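Putting the pieces together, a sketch of the complete /etc/corosync/corosync.conf used in this lab; the bindnetaddr value 192.168.20.128 is an assumption derived from the 192.168.20.133/28 addresses shown earlier:
compatibility: whitetank
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.20.128
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service {
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}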
7) Create the log directory
[root@n1 corosync]# mkdir /var/log/cluster
8) Install the required packages on n2
[root@n2 corosync]# yum localinstall -y *.rpm --nogpgcheck
9) To keep unauthorized hosts from joining the cluster, authentication is needed; generate an authkey and copy it, together with the configuration file, to n2
[root@n1 corosync]# corosync-keygen
[root@n1 corosync]# scp -p authkey corosync.conf n2.xh.com:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 541 0.5KB/s 00:00
10) Create the log directory on the other node
[root@n1 corosync]# ssh n2.xh.com 'mkdir /var/log/cluster'
11) Start the service on n1
[root@n1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
12) Verify that the corosync engine started correctly
[root@n1 corosync]# grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
Oct 2 20:03:42 localhost smartd[3014]: Opened configuration file /etc/smartd.conf
Oct 2 20:03:43 localhost smartd[3014]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Oct 3 00:10:34 localhost corosync[4592]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Oct 3 00:10:34 localhost corosync[4592]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
13) Check that the initial membership notifications were sent
[root@n1 corosync]# grep -i totem /var/log/messages
Oct 3 00:10:34 localhost corosync[4592]: [TOTEM ] Initializing transport (UDP/IP).
Oct 3 00:10:34 localhost corosync[4592]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 3 00:10:35 localhost corosync[4592]: [TOTEM ] The network interface [192.168.20.133] is now up.
Oct 3 00:10:37 localhost corosync[4592]: [TOTEM ] Process pause detected for 1196 ms, flushing membership messages.
Oct 3 00:10:37 localhost corosync[4592]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
14) Check whether any errors were produced during startup
[root@n1 corosync]# grep -i error: /var/log/messages |grep -v unpack_resources
15) Check whether pacemaker has started
[root@n1 corosync]# grep -i pcmk_startup /var/log/messages
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: CRM: Initialized
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] Logging: Initialized pcmk_startup
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Oct 3 00:10:35 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Service: 9
Oct 3 00:10:36 localhost corosync[4592]: [pcmk ] info: pcmk_startup: Local hostname: n1.xh.com
16) Start corosync on n2 from node n1
[root@n1 corosync]# ssh n2.xh.com '/etc/init.d/corosync start'
Starting Corosync Cluster Engine (corosync): [ OK ]
17) Repeat the verification steps above on n2
18) View the cluster membership status from either node
[root@n2 corosync]# crm status
============
Last updated: Wed Oct 3 00:39:35 2012
Stack: openais
Current DC: n1.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
3.5 Providing highly available services
1) In corosync, resources can be defined through two interfaces:
a graphical interface (hb_gui)
crm (the shell provided by pacemaker)
For example, check whether the dates on the two nodes are in sync:
[root@n1 corosync]# ssh n2.xh.com 'date'
Wed Oct 3 00:42:24 CST 2012
[root@n1 corosync]# date
Wed Oct 3 00:42:35 CST 2012
[root@n1 corosync]# crm
crm(live)# configure
crm(live)configure# show
node n1.xh.com
node n2.xh.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2"
2) Display the configuration in XML format
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib admin_epoch="0" crm_feature_set="3.0.5" dc-uuid="n1.xh.com" epoch="5" have-quorum="1" num_updates="18" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="n2.xh.com" type="normal" uname="n2.xh.com"/>
<node id="n1.xh.com" type="normal" uname="n1.xh.com"/>
</nodes>
<resources/>
<constraints/>
</configuration>
</cib>
3) Check the configuration for syntax errors
[root@n1 corosync]# crm_verify -L
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4750]: 2012/10/03_01:02:18 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
4) STONITH errors are reported: with no STONITH device defined, the cluster refuses to start any resources. Since this setup has no fencing device, disable stonith:
[root@n1 corosync]# crm
crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit
5) Check that the option was added
crm(live)configure# show
node n1.xh.com
node n2.xh.com
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
6) Run the syntax check again
[root@n1 corosync]# crm_verify -L
A dedicated stonith command also exists on the system:
stonith -L lists the supported STONITH device types
crm can be used interactively; run help to see the available commands.
The cluster has four resource types:
primitive: a basic resource that runs on only one node at a time
group: puts several resources into a group so they can be managed together
clone: a resource that must run on several nodes at the same time (e.g. ocfs2 or stonith; no master/slave roles)
master: a clone with master and slave roles, such as drbd
Use list to view the available resource agents:
7) Use list lsb to view the LSB resource agent scripts
crm(live)# ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
crm(live)ra# list lsb
8) View the OCF resource agents provided by heartbeat
crm(live)ra# list ocf heartbeat
9) Configure a resource; this is done under configure:
primitive webIP ocf:heartbeat:IPaddr params ip=192.168.20.137
crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive webIP ocf:heartbeat:IPaddr params ip=192.168.20.137
View the configuration just created:
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webIP ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# commit
crm(live)configure# status
ERROR: syntax: status
crm(live)configure# end
crm(live)# status
============
Last updated: Wed Oct 3 01:23:05 2012
Stack: openais
Current DC: n1.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
10) The defined address resource is now active on n1
[root@n1 corosync]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:70:3F:F7
inet addr:192.168.20.133 Bcast:192.168.20.143 Mask:255.255.255.240
inet6 addr: fe80::20c:29ff:fe70:3ff7/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:50797 errors:0 dropped:0 overruns:0 frame:0
TX packets:65597 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:18088499 (17.2 MiB) TX bytes:21418409 (20.4 MiB)
Interrupt:67 Base address:0x2000
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:70:3F:F7
inet addr:192.168.20.137 Bcast:192.168.20.143 Mask:255.255.255.240
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000
11) Install httpd on both nodes
[root@n1 html]# yum install httpd -y
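Since the cluster will control httpd through the lsb:httpd agent, the service should not start at boot on either node; a step not shown in the original but commonly recommended:
[root@n1 html]# chkconfig httpd off
[root@n1 html]# ssh n2.xh.com 'chkconfig httpd off'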
12) After installation, view the metadata of the httpd LSB resource agent
crm(live)ra# meta lsb:httpd
lsb:httpd
Apache is a World Wide Web server. It is used to serve \
HTML files and CGI.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor interval=15 timeout=15 start-delay=15
13) Define the httpd resource
crm(live)ra# end
crm(live)# configure
crm(live)configure# primitive webserver lsb:httpd
14) View the defined httpd resource
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# end
There are changes pending. Do you want to commit them? yes
crm(live)# status
============
Last updated: Wed Oct 3 03:13:27 2012
Stack: openais
Current DC: n2.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
webserver (lsb:httpd): Started n2.xh.com
httpd has started, but on node n2 (as more resources are added to the cluster, they are spread across different nodes to balance the load).
To keep these resources on the same node, add them to a group.
15) Define a group and add webip and webserver to it
crm(live)configure# group web webip webserver
crm(live)configure# show
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
group web webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)# status
============
Last updated: Wed Oct 3 03:20:46 2012
Stack: openais
Current DC: n2.xh.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n1.xh.com n2.xh.com ]
Resource Group: web
webip (ocf::heartbeat:IPaddr): Started n1.xh.com
webserver (lsb:httpd): Started n1.xh.com
16) Test: create a different web page on node 1 and node 2. With the cluster IP currently on the first node, stop corosync on n1 and watch the failover:
[root@n1 html]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:....... [ OK ]
[root@n2 corosync]# crm status
============
Last updated: Mon May 7 20:16:58 2012
Stack: openais
Current DC: n2.xh.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n2.xh.com ]
OFFLINE: [ n1.xh.com ]
However, the httpd service did not start, because n2 does not have quorum.
Quorum handling is controlled with the no-quorum-policy property. The possible values are:
ignore (ignore the loss of quorum and keep running resources)
freeze (resources that are already running keep running, but no new resources are started)
stop (the default: stop all resources)
suicide (kill all resources)
Start the corosync service on node n1 again, then change the quorum policy:
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show (check the quorum property again)
node n1.xh.com
node n2.xh.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.20.137"
primitive webserver lsb:httpd
group web webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
17) Stop the corosync service on n1
[root@n1 html]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:.......... [ OK ]
The resources have now moved to n2:
[root@n2 html]# crm status
============
Last updated: Wed Oct 3 03:41:01 2012
Stack: openais
Current DC: n2.xh.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ n2.xh.com ]
OFFLINE: [ n1.xh.com ]
Resource Group: web
webip (ocf::heartbeat:IPaddr): Started n2.xh.com
webserver (lsb:httpd): Started n2.xh.com
4. Configure DRBD for data consistency
1) Create matching partitions of the same size on both nodes, as shown below.
On n1:
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 83 Linux
/dev/sda2 66 1370 10482412+ 83 Linux
/dev/sda3 1371 1631 2096482+ 82 Linux swap / Solaris
/dev/sda4 1632 2610 7863817+ 5 Extended
/dev/sda5 1632 1754 987966 83 Linux
On n2:
Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 65 522081 83 Linux
/dev/sda2 66 1370 10482412+ 83 Linux
/dev/sda3 1371 1631 2096482+ 82 Linux swap / Solaris
/dev/sda4 1632 2610 7863817+ 5 Extended
/dev/sda5 1632 1754 987966 83 Linux
2) Have the kernel re-read the partition tables on both nodes
[root@n1 ~]# partprobe /dev/sda
[root@n2 ~]# partprobe /dev/sda
3) Install the DRBD packages on both nodes
[root@n1 ~]# yum localinstall drbd83-8.3.8-1.el5.centos.i386.rpm -y --nogpgcheck
[root@n1 ~]# yum localinstall kmod-drbd83-8.3.8-1.el5.centos.i686.rpm -y --nogpgcheck
4) Create the DRBD configuration file from the packaged sample
[root@n1 drbd83-8.3.8]# cp drbd.conf /etc/
cp: overwrite `/etc/drbd.conf'? y
5) Back up and edit the global_common.conf file
[root@n1 drbd.d]# cp global_common.conf global_common.conf.bak
global {
usage-count yes; // change this to no
# minor-count dialog-refresh disable-ip-verification
}
Change
startup {
# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
}
to
startup {
wfc-timeout 120;
degr-wfc-timeout 120;
}
Change
disk {
# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
# no-disk-drain no-md-flushes max-bio-bvecs
}
to
disk {
on-io-error detach;
fencing resource-only;
}
Change
net {
# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
}
to
net {
cram-hmac-alg "sha1";
shared-secret "mydrbdlab";
}
Change
syncer {
# rate after al-extents use-rle cpu-mask verify-alg csums-alg
}
to
syncer {
rate 100M;
}
The finished file looks like this:
[root@n1 drbd.d]# vim global_common.conf
global {
usage-count no;
# minor-count dialog-refresh disable-ip-verification
}
common {
protocol C;
startup {
wfc-timeout 120;
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
fencing resource-only;
}
net {
cram-hmac-alg "sha1";
shared-secret "mydrbdlab";
}
syncer {
rate 100M;
}
}
6) Create the resource definition file with the following content
[root@n1 drbd.d]# vim web.res
resource web {
on n1.xh.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.20.133:7789;
meta-disk internal;
}
on n2.xh.com {
device /dev/drbd0;
disk /dev/sda5;
address 192.168.20.134:7789;
meta-disk internal;
}
}
7) To keep the configuration identical on both nodes, copy the files to n2
[root@n1 drbd.d]# scp global_common.conf n2.xh.com:/etc/drbd.d
[root@n1 drbd.d]# scp web.res n2.xh.com:/etc/drbd.d
[root@n1 drbd.d]# drbdadm create-md web
Initialize the metadata by running drbdadm create-md web on both n1 and n2, then start the drbd service on both nodes:
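The original only shows the commands on n1; a sketch of running the matching steps on n2 over the passwordless ssh set up earlier:
[root@n1 drbd.d]# ssh n2.xh.com 'drbdadm create-md web'
[root@n1 drbd.d]# ssh n2.xh.com 'service drbd start'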
[root@n1 drbd.d]# service drbd start
[root@n1 drbd.d]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896
Both nodes are in Secondary state and the data is marked Inconsistent (not yet synchronized).
The status can also be checked with drbd-overview.
Make n1 the primary by running the following on n1 (this form is only used the very first time):
[root@n1 drbd.d]# drbdadm -- --overwrite-data-of-peer primary web
[root@n1 drbd.d]# watch -n 1 'cat /proc/drbd'
8) Format drbd0
[root@n1 drbd.d]# mkfs -t ext3 -L drbdweb /dev/drbd0
9) Test-mount drbd0
[root@n1 drbd.d]# mkdir /web
[root@n1 drbd.d]# mount /dev/drbd0 /web/
10) Create an index.html file
[root@n1 web]# vim index.html
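The page content is not shown in the original; anything simple will do for the failover test, for example (hypothetical content):
[root@n1 web]# echo "page served from the DRBD-backed /web" > /web/index.html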
11) Demote n1 to secondary and promote n2 to primary
[root@n1 ~]# umount /web
[root@n1 ~]# drbdadm secondary web
[root@n1 ~]# drbdadm role web
Secondary/Secondary
On node n2:
[root@n2 ~]# mkdir /web
[root@n2 ~]# drbdadm primary web
[root@n2 ~]# drbd-overview
0:web Connected Primary/Secondary UpToDate/UpToDate C r----
[root@n2 ~]# mount /dev/drbd0 /web
[root@n2 ~]# drbdadm role web
Primary/Secondary
12) The index.html file is now visible on n2, confirming the data has been synchronized
[root@n2 ~]# cd /web
[root@n2 web]# ll
total 20
-rw-r--r-- 1 root root 17 10-03 06:53 index.html
drwx------ 2 root root 16384 10-03 05:51 lost+found
13) Unmount /web
[root@n2 web]# cd
[root@n2 ~]# umount /web
[root@n2 ~]# drbdadm secondary web
Stop the drbd service and make sure it does not start automatically at boot on either node:
service drbd stop
ssh n2.xh.com -- 'service drbd stop'
chkconfig drbd off
ssh n2.xh.com -- 'chkconfig drbd off'
5. Make the DRBD service highly available
5.1 Define the configured DRBD device /dev/drbd0 as a cluster resource
1) Configure drbd as a cluster resource:
drbd must run on both nodes at the same time, but in the primary/secondary model only one node may be Master while the other is Slave. It is therefore a special kind of cluster resource: a multi-state (Master/Slave) clone, in which the nodes have Master and Slave roles and both nodes are in the Slave role when the service first starts.
[root@n1 ~]# crm
crm(live)# configure
crm(live)configure# primitive drbd ocf:heartbeat:drbd params drbd_resource=web op monitor role=Master interval=30s timeout=40s op monitor role=Slave interval=50s timeout=40s
crm(live)configure# master MS_Webdrbd drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)configure# verify
crm(live)configure# commit
Now check the running state of the resources.
2) Create a cluster resource that automatically mounts the web resource on the Primary node
The Master node of MS_Webdrbd is the Primary node of the drbd resource web; only on that node can the device /dev/drbd0 be mounted, and the application built on top of it needs this mount to happen automatically. Here the web resource is the shared filesystem that holds the web pages for the web server cluster, mounted at /web.
Because this filesystem resource must run on the Master node of the drbd service, and can only start after drbd has promoted that node to Primary, a colocation constraint and an order constraint are needed between the two resources.
# crm
crm(live)# configure
crm(live)configure# primitive fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/web" fstype="ext3"
crm(live)configure# colocation drbd_fs inf: fs MS_Webdrbd:Master
crm(live)configure# order after_fs inf: MS_Webdrbd:promote fs:start
crm(live)configure# verify
crm(live)configure# commit
Check the resource status.
3) Create a web page under /web on n1, then after a failover check whether the file exists under /web on n2.
[root@n1 ~]# vim /web/index.html
4) Configure a name-based virtual host on both nodes
[root@n1 ~]# vim /etc/httpd/conf/httpd.conf
[root@n2 ~]# vim /etc/httpd/conf/httpd.conf
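The exact httpd.conf changes are only shown as screenshots in the original; a sketch of a name-based virtual host that serves pages from the DRBD-backed /web directory (the server name www.xh.com is an assumption):
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName www.xh.com
    DocumentRoot /web
</VirtualHost>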
5) Restart the httpd service on both nodes to load the change, then stop httpd again (the cluster will manage it) and access the web page.
6) Simulate a failure of node n1 and check whether the resources move correctly to n2.
On n1:
crm node standby
Because resource-level fencing and split-brain handling were enabled in /etc/drbd.d/global_common.conf (fencing resource-only), a location constraint is automatically added to the CIB when the primary node goes down; it prevents the secondary from being promoted, so that no split brain or resource contention occurs when the old primary comes back. Since the goal here is only to verify that the resources can fail over, delete that constraint:
crm(live)configure# edit
You will see a location constraint (a line beginning with location) that we did not define ourselves; delete that line, save and quit, then commit the change:
crm(live)configure# commit
Then check the status on n2.
Access the web page again.
7) Finally, constrain the web group to run together with the drbd Master (Primary) resource. Add the constraint line (shown in the original as a screenshot) by editing the configuration:
# crm configure edit
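The exact line is not reproduced here; a sketch of constraints that achieve this, using the resource names defined above (the constraint IDs web_on_master and web_after_fs are assumptions):
crm(live)configure# colocation web_on_master inf: web MS_Webdrbd:Master
crm(live)configure# order web_after_fs inf: fs:start web:start
crm(live)configure# commit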
8) The configuration is complete.