Mysql双主复制搭建和基于keepalived的故障转移

时间:2022-09-15 07:47:13

1.前言

    前面我们介绍了mysql的主从复制(http://blog.csdn.net/wjf870128/article/details/45149573)、mysql5.6新出的GTID复制技术(http://blog.csdn.net/wjf870128/article/details/45151669)及mysql的半同步复制技术(http://blog.csdn.net/wjf870128/article/details/45170089)。

    当前越来越多的互联网行业使用Mysql的主从复制功能实现读写分离,来降低对数据库端的业务负载。通过在master上进行写操作;slave节点上进行只读操作来提高数据库端的响应速度。如下图:

Mysql双主复制搭建和基于keepalived的故障转移

但是如果是基于M-S架构的话,如果master端出现故障,我们的只读业务同样会受到影响,因此呢,我们可以通过构建M-M-S架构,来实现对master节点单点故障的排除,同时保证了slave端能够接收到复制信息。架构图如下:

Mysql双主复制搭建和基于keepalived的故障转移

    Mysql5.6中实现了GTID的复制技术,通过change master to MASTER_AUTO_POSITION = 1,数据库服务器可以自动的从master中根据transaction id抓取需要的日志进行前滚恢复,我们可以利用这一新特性,可以实现master1故障恢复时的自动同步功能。


2.基于GTID的双主搭建

mysql的安装、GTID和半同步复制的原理和搭建过程前文已经介绍过,这里就省略了Mysql双主复制搭建和基于keepalived的故障转移

2.1 环境

OS:Redhat 6.5 32位

DB:5.6.23-enterprise-commercial-advanced-log 

Master1:

  ip:192.168.163.8

  hostname:mysql-rep01

Master2:

  ip:192.168.163.9

  hostname:mysql-rep02

2.2 Master1节点配置/etc/my.cnf

<span style="font-size:12px;">explicit_defaults_for_timestamp 

log_bin = mysql-bin
binlog_format = row
gtid_mode=ON
log-slave-updates
enforce-gtid-consistency

rpl_semi_sync_master_enabled=1
rpl_semi_sync_slave_enabled=1

auto_increment_increment=2
auto_increment_offset=1

server_id = 1
datadir = /var/lib/mysql
port = 3306
socket=/var/lib/mysql/mysql.sock

relay-log=mysql-rep01-relay-bin
relay-log-recovery=1
relay-log-purge=1
relay-log-info-repository=table
master-info-repository=table

2.3 Master2节点配置/etc/my.cnf

<span style="font-size:12px;">explicit_defaults_for_timestamp

log_bin = mysql-bin
server_id = 2
datadir = /var/lib/mysql
port = 3306
socket=/var/lib/mysql/mysql.sock
binlog_format = row
gtid_mode=ON
log-slave-updates
enforce-gtid-consistency

rpl_semi_sync_master_enabled=1
rpl_semi_sync_slave_enabled=1

auto_increment_increment=2
auto_increment_offset=2

relay-log=mysql-rep02-relay-bin
relay-log-recovery=1
relay-log-purge=1
relay-log-info-repository=table
master-info-repository=table

2.4 创建复制用户

master1
mysql> CREATE USER 'repl'@'mysql-rep02' IDENTIFIED BY 'slavepass';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'mysql-rep02';

master2
mysql> CREATE USER 'repl'@'mysql-rep01' IDENTIFIED BY 'slavepass';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'mysql-rep01';

2.5 节点复制配置

master1
mysql> CHANGE MASTER TO
-> MASTER_HOST='mysql-rep02',
-> MASTER_USER='repl',
-> MASTER_PASSWORD='slavepass',
-> MASTER_AUTO_POSITION=1,
-> MASTER_CONNECT_RETRY=45,
-> MASTER_RETRY_COUNT=5,
-> MASTER_HEARTBEAT_PERIOD=30;

master2
mysql> CHANGE MASTER TO
-> MASTER_HOST='mysql-rep01',
-> MASTER_USER='repl',
-> MASTER_PASSWORD='slavepass',
-> MASTER_AUTO_POSITION=1,
-> MASTER_CONNECT_RETRY=45,
-> MASTER_RETRY_COUNT=5,
-> MASTER_HEARTBEAT_PERIOD=30;

2.6 启动slave

在两个节点中执行:
mysql> start slave;

2.7查看节点状态

查看节点master1中的状态
mysql> show  slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql-rep02
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000020
Read_Master_Log_Pos: 231
Relay_Log_File: mysql-rep01-relay-bin.000019
Relay_Log_Pos: 401
Relay_Master_Log_File: mysql-bin.000020
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 231
Relay_Log_Space: 1492
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: f18073c6-de8d-11e4-aa0e-0050563260e0
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 3
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: f18073c6-de8d-11e4-aa0e-0050563260e0:44-49
Executed_Gtid_Set: 485d870d-ddf8-11e4-af9e-000c2957f0bc:1-68,
f18073c6-de8d-11e4-aa0e-0050563260e0:1-49
Auto_Position: 1

mysql> show processlist;
+----+-----------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+
| 3 | root | localhost | NULL | Query | 0 | init | show processlist |
| 20 | event_scheduler | localhost | NULL | Daemon | 4367 | Waiting on empty queue | NULL |
| 27 | repl | mysql-rep02:47007 | NULL | Binlog Dump GTID | 4271 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 30 | system user | | NULL | Connect | 4195 | Waiting for master to send event | NULL |
| 31 | system user | | NULL | Connect | 4194 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
+----+-----------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+

查看节点master2的状态
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysql-rep01
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000032
Read_Master_Log_Pos: 612
Relay_Log_File: mysql-rep02-relay-bin.000048
Relay_Log_Pos: 408
Relay_Master_Log_File: mysql-bin.000032
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 612
Relay_Log_Space: 587
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 485d870d-ddf8-11e4-af9e-000c2957f0bc
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 3
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 485d870d-ddf8-11e4-af9e-000c2957f0bc:1-68,
f18073c6-de8d-11e4-aa0e-0050563260e0:1-49
Auto_Position: 1
1 row in set (0.00 sec)

mysql> show processlist;
+----+-------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+
| 1 | system user | | NULL | Connect | 4340 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 2 | system user | | NULL | Connect | 4341 | Waiting for master to send event | NULL |
| 3 | root | localhost | NULL | Query | 0 | init | show processlist |
| 7 | repl | mysql-rep01:43897 | NULL | Binlog Dump GTID | 4264 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
+----+-------------+-------------------+------+------------------+------+-----------------------------------------------------------------------------+------------------+

2.8 验证双主复制

一般而言,架构为M-M-S中,master2是不参与相关的写操作的,防止相关冲突的发生。这里,我们通过设置
auto_increment_increment
auto_increment_offset

来防止auto_increment字段产生的主键冲突问题。但是仍然不能避免同一行的锁冲突问题,因此生产环境当中通常将Master2当作只读服务器。查看双主复制验证如下:

master1:

mysql> use test;
mysql> create table test_ic(id int unsigned not null auto_increment,var varchar(10) not null,primary key(id));
Query OK, 0 rows affected (0.10 sec)
mysql> insert into test_ic(var) values('1a');
Query OK, 1 row affected (0.05 sec)

mysql> select * from test_ic;
+----+-----+
| id | var |
+----+-----+
| 1 | 1a |
+----+-----+
1 row in set (0.00 sec)
master2:

mysql> use test;
Database changed
mysql> select * from test_ic;
+----+-----+
| id | var |
+----+-----+
| 1 | 1a |
+----+-----+
1 row in set (0.00 sec)

mysql> insert into test_ic(var) values('2b');
Query OK, 1 row affected (0.05 sec)

mysql> select * from test_ic;
+----+-----+
| id | var |
+----+-----+
| 1 | 1a |
| 2 | 2b |
+----+-----+
2 rows in set (0.00 sec)
master1:

mysql> select * from test_ic;
+----+-----+
| id | var |
+----+-----+
| 1 | 1a |
| 2 | 2b |
+----+-----+
2 rows in set (0.00 sec)
可见master1和master2中同时写入没有问题,双主复制配置成功。

由于GTID的特性,当一个节点宕机重新连接的时候会主动同步来自另一节点的事务达到一致。

3.KeepAlived实现故障转移


3.1 KeepAlived作用

        KeepAlived在此的作用是通过自定义的脚本来检测Mysql服务器的状态,如果一台Mysql服务器宕机或者出现故障,KeepAlived通过脚本检测到,将有故障的机器从集群中剔除;当Mysql服务器恢复正常的时候再加入到集群当中。


3.2 配置环境

OS:Redhat 6.5 32位

Soft:keepalived-1.2.16.tar.gz

VIP:192.168.163.10

Master1:

  ip:192.168.163.8

  hostname:mysql-rep01

Master2:

  ip:192.168.163.9

  hostname:mysql-rep02

3.3 安装步骤

master1和master2节点上同时安装:
[root@mysql-rep01 keepalived-1.2.16]# yum -y install gcc openssl-devel openssl popt popt-devel 
[root@mysql-rep01 keepalived]# tar zxf keepalived-1.2.16.tar.gz
[root@mysql-rep01 keepalived]# cd keepalived-1.2.16
[root@mysql-rep01 keepalived-1.2.16]# ./configure
[root@mysql-rep01 keepalived-1.2.16]# make&&make install
[root@mysql-rep01 init.d]# cp /usr/local/etc/rc.d/init.d/keepalived /etc/rc.d/init.d/[root@mysql-rep01 init.d]# cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/[root@mysql-rep01 init.d]# mkdir /etc/keepalived[root@mysql-rep01 init.d]# cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/[root@mysql-rep01 init.d]# cp /usr/local/sbin/keepalived /usr/sbin[root@mysql-rep01 init.d]# chkconfig --add keepalived[root@mysql-rep01 init.d]# chkconfig --level 345 keepalived on[root@mysql-rep01 init.d]# chkconfig --list|grep -i keepalivedkeepalived      0:off   1:off   2:off   3:on    4:on    5:on    6:off

3.4 配置步骤

KeepAlived启动的时候默认使用/etc/keepalived/keepalived.conf作为启动文件,这里我用是贺春旸老师在<<Mysql管理之道>>一书中用到的配置和脚本(书中变量因为是基于5.5的可能有错误,下面的是我改正后的5.6版本的脚本)。如下:

master1:/etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
router_id KeepAlive_MySQL
}
vrrp_script check_run {
script "/home/sh/mysql_check.sh"
interval 30
}

vrrp_sync_group VG1 {
group {
VI_1
}
}

vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
check_run
}

notify_master "/home/sh/master.sh"
notify_backup "/home/sh/backup.sh"
notify_stop "/home/sh/stop.sh"

virtual_ipaddress {
192.168.163.10
}
}

master2:
! Configuration File for keepalived

global_defs {
router_id KeepAlive_MySQL
}
vrrp_script check_run {
script "/home/sh/mysql_check.sh"
interval 30
}

vrrp_sync_group VG1 {
group {
VI_1
}
}

vrrp_instance VI_1 {
state BACKUP
interface eth1
virtual_router_id 51
priority 90
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
check_run
}

notify_master "/home/sh/master.sh"
notify_backup "/home/sh/backup.sh
notify_stop "/home/sh/stop.sh

virtual_ipaddress {
192.168.163.10
}
}

注:
        其中notify_master是变为master执行的脚本;notify_backup是状态变为backup执行的脚本;notify_stop是VRRP停止后执行的脚本;notify_fault是状态变为fault后执行的脚本,notify 任意状态改变后执行的脚本。
       为防止master1宕机恢复正常后于master2进行vip的争执接管,这里我们使用的conf文件中状态为backup,同时指定nopreempt参数来防止二次切换的发生。
/home/sh/mysql_check.sh 是监控脚本
#!/bin/bash
. /root/.bash_profile
count=1
while true
do
mysql -uroot -ppassword -e "show status;">/dev/null 2>&1
i=$?
ps aux|grep mysqld|grep -v grep>/dev/null 2>&1
j=$?
if [ $i = 0 ]&&[ $j = 0 ]
then
exit 0
else
if [ $i = 1 ]&&[ $j = 0 ]
then
exit 0
else
if [ $count -gt 5 ]
then
break
fi
let count++
continue
fi
fi
done

/home/sh/master.sh 
#!/bin/bash
. /root/.bash_profile
Master_Log_File=$(mysql -uroot -ppassword -e "show slave status\G"|grep -w Master_Log_File|awk -F": " '{print $2}')
Relay_Master_Log_File=$(mysql -uroot -ppassword -e "show slave status\G"|grep -w Relay_Master_Log_File|awk -F": " '{print $2}')
Read_Master_Log_Pos=$(mysql -uroot -ppassword -e "show slave status\G"|grep -w Read_Master_Log_Pos|awk -F": " '{print $2}')
Exec_Master_Log_Pos=$(mysql -uroot -ppassword -e "show slave status\G"|grep -w Exec_Master_Log_Pos|awk -F": " '{print $2}')

i=1
while true
do
if [ $Master_Log_File = $Relay_Master_Log_File ]&&[ $Read_Master_Log_Pos -eq $Exec_Master_Log_Pos ]
then
echo "ok"
break
else
sleep 1
if [ $i -gt 60 ]
then
break
fi
continue
let i++
fi
done

mysql -uroot -ppassword -e "stop slave;"
mysql -uroot -ppassword -e "set global innodb_support_xa=1;"
mysql -uroot -ppassword -e "set global sync_binlog=0;"
mysql -uroot -ppassword -e "set global innodb_flush_log_at_trx_commit=2;"
mysql -uroot -ppassword -e "set global event_scheduler=1;"
mysql -uroot -ppassword -e "flush logs;"
mysql -uroot -ppassword -e "show master status;" > /tmp/master_status_$(date "+%y%m%d-%H%M").txt

/home/sh/backup.sh 
#!/bin/bash
. /root/.bash_profile
mysql -uroot -ppassword -e "set global innodb_support_xa=1;"
mysql -uroot -ppassword -e "set global sync_binlog=0;"
mysql -uroot -ppassword -e "set global innodb_flush_log_at_trx_commit=2;"
mysql -uroot -ppassword -e "set global event_scheduler=0;"

/home/sh/stop.sh
#!/bin/bash
. /root/.bash_profile
mysql -uroot -ppassword -e "set global innodb_support_xa=1;"
mysql -uroot -ppassword -e "set global sync_binlog=1;"
mysql -uroot -ppassword -e "set global innodb_flush_log_at_trx_commit=1;"

M_File1=$(mysql -uroot -ppassword -e "show master status\G"|awk -F': ' '/File/{print $2}')
M_Positon1=$(mysql -uroot -ppassword -e "show master status\G"|awk -F': ' '/Position/{print $2}')
sleep 1
M_File2=$(mysql -uroot -ppassword -e "show master status\G"|awk -F': ' '/File/{print $2}')
M_Positon2=$(mysql -uroot -ppassword -e "show master status\G"|awk -F': ' '/Position/{print $2}')

i=1
while true
do
if [ $M_File1 = $M_File2 ]&&[ $M_Positon1 -eq $M_Positon2 ]
then
echo "ok"
break
else
sleep 1
if [ $i -gt 60 ]
then
break
fi
continue
let i++
fi
done

3.5 启动KeepAlived

[root@mysql-rep01 sh]# service keepalived start
Starting keepalived: [ OK ]
[root@mysql-rep01 sh]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 00:0c:29:57:f0:bc brd ff:ff:ff:ff:ff:ff
inet 192.168.163.8/24 brd 192.168.163.255 scope global eth0
<span style="color:#ff6666;">inet 192.168.163.10/32 scope global eth0</span>
inet6 fe80::20c:29ff:fe57:f0bc/64 scope link
valid_lft forever preferred_lft forever
此时KeepAlived就配置完成了。当master1出现故障的时候,master2上的keepalived会自动接管VIP。