(MHA + MySQL 5.7 Enhanced Semi-Synchronous Replication) High-Availability Architecture: Design and Implementation

Date: 2024-11-07 11:34:32
       The architecture uses MySQL 5.7 with GTID-based enhanced semi-synchronous, parallel replication in a one-master, two-slave topology. The MHA suite manages the whole replication setup and provides automatic failover for high availability.
       Advantages:
          1. Enhanced semi-sync with AFTER_SYNC improves data safety and master/slave consistency.
          2. MHA improves master/slave data consistency after a failure, switches over and reconfigures replication automatically, and business writes can continue normally after the switch.
 
 
I. Introduction to MHA
    1. Overview
MHA (Master High Availability) is an open-source high-availability program suite for MySQL, written in Perl, that provides automated master failover for MySQL master/slave replication. When MHA detects that the master has failed, it promotes the slave holding the most recent data to become the new master; during this process it gathers additional information from the other slaves to avoid consistency problems. MHA also provides online master switching, i.e. changing the master/slave roles on demand.
Compared with other HA software, MHA focuses on keeping the master of a MySQL replication setup highly available. Its most distinctive feature is that it can repair the log differences between multiple slaves so that all slaves end up consistent, then pick one of them as the new master and point the other slaves at it.
    2. Roles
MHA has two roles, MHA Manager (the management node) and MHA Node (the data node):
MHA Manager: usually deployed on a dedicated machine, or directly on one of the slaves (the latter is not recommended); one Manager can manage multiple master/slave clusters. It:
(1) runs the automatic master switchover and failover commands;
(2) runs the other helper scripts: manual master switchover, master/slave status checks.
MHA Node: runs on every MySQL server (master/slave/manager); it provides scripts that parse and purge logs to speed up failover. It:
(1) saves the binlog data from the (crashed) master;
(2) identifies the differences between the slaves' relay log files;
(3) purges relay logs on a schedule without stopping the slave SQL thread.
  Note: in an MHA cluster the relay logs have to be purged explicitly, because MySQL's automatic purging is disabled (relay-log-purge = 0). This can be done by (a cron sketch follows this list):
    (1) temporarily enabling the global variable:
         SET GLOBAL relay_log_purge=1; FLUSH LOGS; SET GLOBAL relay_log_purge=0;
    (2) deleting the relay log files manually;
    (3) using the MHA script: purge_relay_logs --help
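      A minimal sketch of option (3) run from cron on each node (the credentials, schedule and workdir are assumptions to adapt; purge_relay_logs hard-links the relay logs before purging, so the SQL thread is not blocked):
      # crontab -e
      0 4 * * * /usr/local/bin/purge_relay_logs --user=root --password=<password> --host=127.0.0.1 --port=7066 --disable_relay_log_purge --workdir=/tmp >> /tmp/purge_relay_logs.log 2>&1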
    3. Architecture and failover workflow
            [Figure: MHA + MySQL 5.7 enhanced semi-sync high-availability architecture diagram]
When the master fails, MHA fails over as follows:
(1) save the binary log events (binlog events) from the crashed master;
(2) identify the slave holding the most recent data;
(3) apply the differential relay logs to the other slaves;
(4) apply the binlog events saved from the master;
(5) promote one slave to be the new master;
(6) point the other slaves at the new master and resume replication.
Environment
Host            Role              Service                         Port   MHA role
172.16.40.201   slave             mysql-5.7.25 (Percona Server)   7066   node
172.16.40.202   slave (master-b)  mysql-5.7.25 (Percona Server)   7066   manager/node
172.16.40.203   master            mysql-5.7.25 (Percona Server)   7066   node
II. Install the MySQL service on the three servers
        1. Version: Percona-Server-5.7.25-28-Linux.x86_64.ssl101.tar.gz
             MySQL installation: omitted
        2. Configure replication
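  Since the CHANGE MASTER statements below use GTID auto-positioning, GTID must already be enabled on all three instances. A minimal my.cnf sketch of the replication-related settings (server-id and log names are illustrative assumptions; adjust per host):
[mysqld]
server-id                = 203          # must be unique on every instance
port                     = 7066
log-bin                  = mysql-bin
binlog_format            = ROW
gtid_mode                = ON
enforce_gtid_consistency = ON
log_slave_updates        = ON
relay_log_purge          = 0            # keep relay logs for MHA; purge them with purge_relay_logs (see the note above)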
  【172.16.40.203(master)】:
            mysql> grant replication slave on *.* to 'repl'@'172.16.40.%' identified by 'replpasswod';
            mysql>flush privileges;   
  【172.16.40.202(slave(master-b))】:
            mysql> CHANGE MASTER TO MASTER_HOST='172.16.40.203',MASTER_PORT=7066,MASTER_USER='repl',MASTER_PASSWORD='replpasswod',MASTER_AUTO_POSITION=1;
            mysql>start slave;
            mysql>show slave status\G;
  【172.16.40.201(slave)】:
            mysql> CHANGE MASTER TO MASTER_HOST='172.16.40.203',MASTER_PORT=7066,MASTER_USER='repl',MASTER_PASSWORD='replpasswod',MASTER_AUTO_POSITION=1;
            mysql>start slave;
            mysql>show slave status\G;
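            A quick sanity check on each slave (a sketch; the egrep just filters out the key fields of show slave status):
            # mysql -S /tmp/7066.sock -p -e "show slave status\G" | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Executed_Gtid_Set'
            Both threads should report Yes and Seconds_Behind_Master should be close to 0.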
        3. Enable MySQL semi-synchronous replication
# mysql -S /tmp/7066.sock -p
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_master_enabled=1;
mysql> SET GLOBAL rpl_semi_sync_slave_enabled=1;
# restart MySQL
# /etc/init.d/mysqld-7066 restart
# confirm that semi-sync is enabled
mysql> show global variables like '%semi%';  or:  show global status like '%semi%';
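Because SET GLOBAL does not survive a restart, the semi-sync settings should also be persisted in my.cnf (the plugins registered with INSTALL PLUGIN do survive restarts). A minimal sketch; rpl_semi_sync_master_wait_point = AFTER_SYNC is the 5.7 default and is the "enhanced" (lossless) mode this design relies on, and the timeout value is an assumption:
[mysqld]
rpl_semi_sync_master_enabled    = 1
rpl_semi_sync_slave_enabled     = 1
rpl_semi_sync_master_wait_point = AFTER_SYNC   # lossless semi-sync
rpl_semi_sync_master_timeout    = 1000         # ms before falling back to async (assumed value)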
III. Setting up MHA
1. Download the MHA packages
2. Configure passwordless SSH login between the servers
    (Note: if the Manager is installed on one of the MySQL hosts, that host must also be able to SSH to itself without a password; on a dedicated server this is not needed.)
【172.16.40.202(manager)】:
    # ssh-keygen -t rsa
    # cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
    #  ssh-copy-id root@172.16.40.203
    #  ssh-copy-id root@172.16.40.201
【172.16.40.203(node)】:
    #  ssh-keygen -t rsa
    #  ssh-copy-id root@172.16.40.202
    #  ssh-copy-id root@172.16.40.201
【172.16.40.201(node)】:
    #  ssh-keygen -t rsa
    #  ssh-copy-id root@172.16.40.202
    #  ssh-copy-id root@172.16.40.203
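Passwordless login can then be spot-checked from each host (a simple sketch):
    # for h in 172.16.40.201 172.16.40.202 172.16.40.203; do ssh -o BatchMode=yes root@$h hostname; done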
3. Install MHA
    (Note: the mha4mysql-node package must be installed on every node; if the manager is not on a dedicated server, install the node package on the manager host as well.)
     (1) Upload the mha4mysql-node-0.58.tar.gz package to all servers and install it
          # installation requires the perl, perl-DBD-MySQL and perl-devel dependencies; install them with yum, e.g. yum install -y perl-DBD-MySQL perl-devel
        # tar zxvf mha4mysql-node-0.58.tar.gz
        # cd mha4mysql-node-0.58
        # perl Makefile.PL
        # make&&make install
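        # a quick check (sketch) that the node tools landed on the PATH
        # which save_binary_logs apply_diff_relay_logs filter_mysqlbinlog purge_relay_logs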
    (2) Upload the mha4mysql-manager-0.58.tar.gz package to the manager server and install it
       # install dependencies
       # yum install -y perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
       # tar zxvf mha4mysql-manager-0.58.tar.gz
       # cd mha4mysql-manager-0.58
       # perl Makefile.PL
       # make&&make install
       Tool overview:
  Manager tools:
- masterha_check_ssh : check MHA's SSH configuration.
- masterha_check_repl : check MySQL replication.
- masterha_manager : start MHA (the monitoring process).
- masterha_check_status : check the current MHA running status.
- masterha_master_monitor : monitor whether the master is down.
- masterha_master_switch : control failover (automatic or manual).
- masterha_conf_host : add or remove a configured server entry.
  Node tools:
- save_binary_logs : save and copy the master's binary logs.
- apply_diff_relay_logs : identify differential relay log events and apply them to the other slaves.
- filter_mysqlbinlog : strip unnecessary ROLLBACK events (no longer used by MHA).
- purge_relay_logs : purge relay logs (without blocking the SQL thread).
Troubleshooting an error:
[root@localhost authors]# masterha_check_ssh
"NI_NUMERICHOST" is not exported by the Socket module
"getaddrinfo" is not exported by the Socket module
"getnameinfo" is not exported by the Socket module
Can't continue after import errors at /usr/local/share/perl5/MHA/NodeUtil.pm line 29
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/NodeUtil.pm line 29.
Compilation failed in require at /usr/local/share/perl5/MHA/SlaveUtil.pm line 28.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SlaveUtil.pm line 28.
Compilation failed in require at /usr/local/share/perl5/MHA/DBHelper.pm line 26.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/DBHelper.pm line 26.
Compilation failed in require at /usr/local/share/perl5/MHA/HealthCheck.pm line 30.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/HealthCheck.pm line 30.
Compilation failed in require at /usr/local/share/perl5/MHA/Server.pm line 28.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/Server.pm line 28.
Compilation failed in require at /usr/local/share/perl5/MHA/Config.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/Config.pm line 29.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 32.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 32.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
Install the missing dependencies with cpan:
cpan[1]> install ExtUtils::Constant
cpan[1]> install Socket
Tip: if the server has no internet access, you can download the dependency packages manually from the URLs shown in cpan's messages, put them in the corresponding directory, and then run the install command again.
Problem resolved:
[root@localhost authors]# masterha_check_ssh --help
Usage:
    masterha_check_ssh --global_conf=/etc/masterha_default.cnf
    --conf=/etc/conf/masterha/app1.cnf
    See online reference
    (http://code.google.com/p/mysql-master-ha/wiki/Requirements#SSH_public_k
    ey_authentication) for details.
4. Configure MHA
        (1) Create the working directory on 【172.16.40.202 (manager)】
            # mkdir -p /home/mysql/app/mha/masterha
        (2) Copy the configuration file and edit it (/home/mysql/app/mha/masterha/conf/app1.cnf):
[server default]
manager_workdir=/home/mysql/app/mha/masterha
manager_log=/home/mysql/app/mha/masterha/logs/manager.log
master_binlog_dir=/home/mysql/app/mha/7066/logs/binlog
password=romysqladmint  # password of the MHA monitoring user
user=root
ping_interval=1
remote_workdir=/opt/TMHA2/mha4mysql-node-master
repl_password=replpasswod
repl_user=repl
ssh_user=root
shutdown_script=""
log_level=debug
#master node
[server1]
hostname=172.16.40.203
port=7066
ssh_port=22
#slave node
[server2]
hostname=172.16.40.202
port=7066
ssh_port=22
#candidate_master=1  # mark this slave as a candidate master; if set, it will be promoted to master during a failover even if it is not the slave with the latest events in the cluster
#slave node
[server3]
hostname=172.16.40.201
port=7066
ssh_port=22
# grant the monitoring user in the database (a sketch follows below)
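  A sketch of creating the monitoring account referenced by user/password above (the host pattern and the ALL PRIVILEGES grant are assumptions; tighten as needed). Run it on the master so it replicates to the slaves:
mysql> grant all privileges on *.* to 'root'@'172.16.40.%' identified by 'romysqladmint';
mysql> flush privileges;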
    (3) Manager status checks:
[root@fuzhou202 conf]# masterha_check_ssh  --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
Sun Mar 24 19:30:26 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Mar 24 19:30:26 2019 - [info] Reading application default configuration from /home/mysql/app/mha/masterha/conf/app1.cnf..
Sun Mar 24 19:30:26 2019 - [info] Reading server configuration from /home/mysql/app/mha/masterha/conf/app1.cnf..
Sun Mar 24 19:30:26 2019 - [info] Starting SSH connection tests..
Sun Mar 24 19:30:26 2019 - [debug]
Sun Mar 24 19:30:26 2019 - [debug]  Connecting via SSH from root@172.16.40.202(172.16.40.202:22) to root@172.16.40.203(172.16.40.203:22)..
Sun Mar 24 19:30:26 2019 - [debug]   ok.
Sun Mar 24 19:30:26 2019 - [debug]  Connecting via SSH from root@172.16.40.202(172.16.40.202:22) to root@172.16.40.201(172.16.40.201:22)..
Sun Mar 24 19:30:26 2019 - [debug]   ok.
Sun Mar 24 19:30:27 2019 - [debug]
Sun Mar 24 19:30:26 2019 - [debug]  Connecting via SSH from root@172.16.40.203(172.16.40.203:22) to root@172.16.40.202(172.16.40.202:22)..
Sun Mar 24 19:30:26 2019 - [debug]   ok.
Sun Mar 24 19:30:26 2019 - [debug]  Connecting via SSH from root@172.16.40.203(172.16.40.203:22) to root@172.16.40.201(172.16.40.201:22)..
Sun Mar 24 19:30:26 2019 - [debug]   ok.
Sun Mar 24 19:30:27 2019 - [debug]
Sun Mar 24 19:30:27 2019 - [debug]  Connecting via SSH from root@172.16.40.201(172.16.40.201:22) to root@172.16.40.202(172.16.40.202:22)..
Sun Mar 24 19:30:27 2019 - [debug]   ok.
Sun Mar 24 19:30:27 2019 - [debug]  Connecting via SSH from root@172.16.40.201(172.16.40.201:22) to root@172.16.40.203(172.16.40.203:22)..
Sun Mar 24 19:30:27 2019 - [debug]   ok.
Sun Mar 24 19:30:27 2019 - [info] All SSH connection tests passed successfully.
---------
# masterha_check_repl  --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
# masterha_check_status  --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
Once all of the checks above pass, the manager monitoring process can be started.
(4) Start the manager monitoring service
# start the manager
# nohup masterha_manager --conf=/home/mysql/app/mha/masterha/conf/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /home/mysql/app/mha/masterha/logs/manager.log 2>&1 &
# check the status
[root@fuzhou202 logs]# masterha_check_status --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
app1 (pid:9163) is running(0:PING_OK), master:172.16.40.203
# stop the manager
# masterha_stop --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
(5) Manage the VIP with a failover script
# manually bind the VIP on the current master 【172.16.40.203 (master)】
# ifconfig eth0:1 172.16.40.99/24
# create the Perl master_ip_failover script and reference it in the MHA configuration file
# vim /home/mysql/app/mha/masterha/conf/app1.cnf   # add:
master_ip_failover_script= /usr/local/bin/master_ip_failover
# edit the script with the following contents
# vim /usr/local/bin/master_ip_failover
---------------------------------------------------
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
my $vip = '172.16.40.99/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
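# NOTE: adjust the interface label (eth0:$key here) and $vip to match the environment;
# MHA invokes this script with --command=stop/stopssh on the old master and --command=start on the new master.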
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
--------------------------------------
# chmod +x /usr/local/bin/master_ip_failover
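# The script can also be exercised by hand with the same arguments MHA passes during its checks (a sketch):
# /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.40.203 --orig_master_ip=172.16.40.203 --orig_master_port=7066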
Verify the automatic master-failover script:
# masterha_check_repl --conf=/home/mysql/app/mha/masterha/conf/app1.cnf
...
Sun Mar 24 20:17:11 2019 - [info] Checking master_ip_failover_script status:
Sun Mar 24 20:17:11 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.40.203 --orig_master_ip=172.16.40.203 --orig_master_port=7066
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.40.99/24===
IV. Testing
    1. Automatic master failover
        Generate test data with sysbench
#  generate data on the master
# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=4 --max-requests=0 --oltp-dist-type=uniform --max-time=1800 --mysql-user=root --mysql-socket=/tmp/7066.sock --mysql-password=mysqladmin --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex prepare
# stop the slave io_thread on one slave to simulate replication lag
mysql > stop slave io_thread;
# sysbench --test=oltp --oltp-table-size=1000000 --oltp-read-only=off --init-rng=on --num-threads=4 --max-requests=0 --oltp-dist-type=uniform --max-time=180 --mysql-user=root --mysql-socket=/tmp/7066.sock --mysql-password=mysqladmin --db-driver=mysql --mysql-table-engine=innodb --oltp-test-mode=complex run
# kill mysqld on the master
# pkill -9 mysqld
# watch the manager log
...
Mon Mar 25 11:07:42 2019 - [info]  172.16.40.201: Resetting slave info succeeded.
Mon Mar 25 11:07:42 2019 - [info] Master failover to 172.16.40.201(172.16.40.201:7066) completed successfully.
Mon Mar 25 11:07:42 2019 - [info] Deleted server1 entry from /home/mysql/app/mha/masterha/conf/app1.cnf .
Mon Mar 25 11:07:42 2019 - [debug]  Disconnected from 172.16.40.202(172.16.40.202:7066)
Mon Mar 25 11:07:42 2019 - [debug]  Disconnected from 172.16.40.201(172.16.40.201:7066)
Mon Mar 25 11:07:42 2019 - [info]
----- Failover Report -----
app1: MySQL Master failover 172.16.40.203(172.16.40.203:7066) to 172.16.40.201(172.16.40.201:7066) succeeded
Master 172.16.40.203(172.16.40.203:7066) is down!
Check MHA Manager logs at fuzhou202:/home/mysql/app/mha/masterha/logs/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.16.40.203(172.16.40.203:7066)
Selected 172.16.40.201(172.16.40.201:7066) as a new master.
172.16.40.201(172.16.40.201:7066): OK: Applying all logs succeeded.
172.16.40.201(172.16.40.201:7066): OK: Activated master IP address.
172.16.40.202(172.16.40.202:7066): OK: Slave started, replicating from 172.16.40.201(172.16.40.201:7066)
172.16.40.201(172.16.40.201:7066): Resetting slave info succeeded.
Master failover to 172.16.40.201(172.16.40.201:7066) completed successfully.
# MHA identified the slave with the most recent data, promoted it to be the new master, and updated the configuration successfully
# the VIP has also moved to the new master:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:b7:29:df brd ff:ff:ff:ff:ff:ff
    inet 172.16.40.201/24 brd 172.16.40.255 scope global eth0
    inet 172.16.40.99/24 brd 172.16.40.255 scope global secondary eth0:1
    inet6 fe80::250:56ff:feb7:29df/64 scope link
       valid_lft forever preferred_lft forever
Note:
    After an automatic master failover completes, the manager monitoring process exits on its own, and because it was started with --remove_dead_master_conf the dead master's entry is removed from the configuration file.
    To bring the recovered MySQL node back under MHA, the following steps are needed (see the sketch after this list):
    1. Add the node's entry back to the configuration file /home/mysql/app/mha/masterha/conf/app1.cnf
    2. Reconfigure replication on the recovered instance so that it replicates from the new master: CHANGE MASTER TO MASTER_HOST='172.16.40.201', MASTER_PORT=7066, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='replpasswod'; (MASTER_HOST is the IP of the new master after the switch, 172.16.40.201 in this test); then start slave; and check with show slave status\G
    3. Restart the manager monitoring process
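    A sketch of these steps for this test (172.16.40.201 became the new master; the masterha_conf_host --block/--params values are assumptions matching this example configuration):
# on the recovered node, after restoring its data:
mysql> CHANGE MASTER TO MASTER_HOST='172.16.40.201', MASTER_PORT=7066, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='replpasswod';
mysql> start slave;
# on the manager: re-add the server entry (or edit app1.cnf by hand) and restart monitoring
# masterha_conf_host --command=add --conf=/home/mysql/app/mha/masterha/conf/app1.cnf --hostname=172.16.40.203 --block=server1 --params="port=7066;ssh_port=22"
# nohup masterha_manager --conf=/home/mysql/app/mha/masterha/conf/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /home/mysql/app/mha/masterha/logs/manager.log 2>&1 &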
2. Manual master failover
        Manual master failover does not require the manager monitoring process to be running; when the master fails, MHA is invoked by hand to perform the switch:
# masterha_master_switch --master_state=dead --conf=/home/mysql/app/mha/masterha/conf/app1.cnf --dead_master_host=172.16.40.202 --dead_master_port=7066 --new_master_host=172.16.40.203 --new_master_port=7066 --ignore_last_failover
# this prints interactive prompts; confirm them to continue
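MHA can also perform a planned, online master switch while the current master is still alive (the online switching mentioned in the introduction). A sketch (the target host and the running-updates limit are illustrative assumptions; the manager monitoring process must be stopped before an online switch):
# masterha_master_switch --conf=/home/mysql/app/mha/masterha/conf/app1.cnf --master_state=alive --new_master_host=172.16.40.202 --new_master_port=7066 --orig_master_is_new_slave --running_updates_limit=10000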