可以看这篇文章的理论部分,写的不错:http://www.educity.cn/wenda/399310.html
操作可参考:http://www.178linux.com/15698
1、关于MHA
MHA(Master HA)是一款开源的MySQL的高可用程序,它为MySQL主从复制架构提供了automating master failover功能。MHA在监控到master节点故障时,会提升其中拥有的最新数据的slave节点成为新的master节点,在此期间,MHA会通过其它从节点获取额外信息来避免一致性方面的问题。MHA还提供了master节点的在线切换功能,即按需切换master/slave节点。
MHA服务有两种角色,MHA Manager(管理节点)和MHA Node(数据节点):
MHA Manager:通常单独部署在一*立机器上管理多个master/slave集群,每个master/slave集群称为一个application;
MHA node:运行在每台MySQL服务器上,它通过监控具备解析和清理log功能的脚本来加快故障转移
2、MHA组件说明
Manager节点:
-masterha_check_ssh:MHA依赖的SSH环境检测工具;-masterha_check_repl:MySQL复制环境检测工具;-masterha_manager:MHA服务主程序;-masterha_check_status:MHA运行状态探测工具;-masterha_master_monitor:MySQL master节点可用性检测工具;-masterha_master_switch:master节点切换工具;-masterha_conf_host:添加或删除配置的节点;-masterha_stop:关闭MHA服务的工具;
Node节点:
-save_binary_logs:保存和复制master的二进制日志;-apply_diff_relay_logs:识别差异的中继日志事件并用于其他slave;-fiter_mysqlbinlog:去除不必要的ROLLBACK事件(MHA已不再使用这个工具);-purge_relay_logs:清除中继日志(不会阻塞SQL线程);
自定义扩展:
-secondary_check_script:通过多条网络路由检测master的可用性;-master_ip_failover_script:更新appliction使用的masterip;-shutdown_script:强制关闭master节点;-report_script:发送报告;-init_conf_load_script:加载初始配置参数;-master_ip_online_change_script:更新master节点ip地址
3、部署及测试
实验拓扑:
node1:192.168.150.137 MHA managernode2:192.168.150.138 MHA node mariadb masternode3:192.168.150.139 MHA node mariadb slave candidatenode4:192.168.150.140 MHA node mariadb slave
配置过程:
1、修改每台服务器的hosts文件/etc/hosts127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4::1 localhost localhost.localdomain localhost6 localhost6.localdomain6192.168.150.137 node1.com node1192.168.150.138 node2.com node2192.168.150.139 node3.com node3192.168.150.140 node4.com node42、node2-node4进行mariadb的yum安装yum -y install mariadb-server3、配置ssh互信通信环境[root@node1 ~]# ssh-keygen -t rsa -P ''Generating public/private rsa key pair.Enter file in which to save the key (/root/.ssh/id_rsa): Created directory '/root/.ssh'.Your identification has been saved in /root/.ssh/id_rsa.Your public key has been saved in /root/.ssh/id_rsa.pub.The key fingerprint is:a2:f2:10:28:cd:ea:7b:d8:f4:95:15:6e:73:a6:9d:4e root@node1.comThe key's randomart image is:+--[ RSA 2048]----+| || . || . . || + = o ||o + . S * . ||.. o . + . E ||. * o . o ||.. * . . || oo . |+-----------------+[root@node1 ~]# ls .ssh/id_rsaid_rsa id_rsa.pub [root@node1 ~]# cat .ssh/id_rsa.pub > .ssh/authorized_keys[root@node1 ~]# ssh node1The authenticity of host 'node1 (192.168.150.137)' can't be established.ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.Are you sure you want to continue connecting (yes/no)? yesWarning: Permanently added 'node1,192.168.150.137' (ECDSA) to the list of known hosts.Last login: Wed Mar 29 15:06:52 2017 from 192.168.150.1[root@node1 ~]# ll .ssh/authorized_keys -rw-r--r-- 1 root root 396 3月 29 15:35 .ssh/authorized_keys[root@node1 ~]# chmod go= .ssh/authorized_keys [root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node2:/root/.sshroot@node2's password: id_rsa 100% 1675 1.6KB/s 00:00 authorized_keys 100% 396 0.4KB/s 00:00 [root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node3:/root/.sshThe authenticity of host 'node3 (192.168.150.139)' can't be established.ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.Are you sure you want to continue connecting (yes/no)? yesWarning: Permanently added 'node3,192.168.150.139' (ECDSA) to the list of known hosts.root@node3's password: id_rsa 100% 1675 1.6KB/s 00:00 authorized_keys 100% 396 0.4KB/s 00:00 [root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node4:/root/.sshroot@node4's password: Permission denied, please try again.root@node4's password: id_rsa 100% 1675 1.6KB/s 00:00 authorized_keys 100% 396 0.4KB/s 00:00 [root@node1 ~]# ssh node2Last login: Wed Mar 29 15:07:05 2017 from 192.168.150.1[root@node2 ~]# exit登出Connection to node2 closed.[root@node1 ~]# ssh node3Last login: Wed Mar 29 15:07:18 2017 from 192.168.150.1[root@node3 ~]# exit登出Connection to node3 closed.[root@node1 ~]# ssh node3Last login: Wed Mar 29 15:40:05 2017 from node1.com[root@node3 ~]# exit登出Connection to node3 closed.[root@node1 ~]# ssh node4Last failed login: Wed Mar 29 15:39:53 CST 2017 from node1.com on ssh:nottyThere was 1 failed login attempt since the last successful login.Last login: Wed Mar 29 15:39:30 2017 from node1.com[root@node4 ~]# exit登出Connection to node4 closed.[root@node1 ~]# ssh node4Last login: Wed Mar 29 15:40:13 2017 from node1.com[root@node4 ~]# exit登出Connection to node4 closed.4、修改mysql参数master:[mysqld]innodb_file_per_table = 1skip_name_resolve = 1log-bin = master-binrelay-log = relay-binserver_id = 1slave:[mysqld]innode_file_per_table = 1skip_name_resolve = 1log-bin = master-binrelay-log = relay-binserver_id = 2read_only = 1relay_log_purge = 05、主库开启并创建授权账号master:MariaDB [(none)]> SHOW MASTER STATUS;+-------------------+----------+--------------+------------------+| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |+-------------------+----------+--------------+------------------+| master-bin.000003 | 245 | | |+-------------------+----------+--------------+------------------+1 row in set (0.00 sec)MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED BY 'replpass';MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass'; #此为mha的管理账号MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; #此时主库是可写可读+---------------+-------+| Variable_name | Value |+---------------+-------+| read_only | OFF |+---------------+-------+1 row in set (0.00 sec)6、从库进行主从功能开启 node3、node4操作相同[root@node3 ~]# mysqlWelcome to the MariaDB monitor. Commands end with ; or \g.Your MariaDB connection id is 2Server version: 5.5.52-MariaDB MariaDB ServerCopyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.138',MASTER_USER='repluser',MSTER_PASSWORD='replpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;Query OK, 0 rows affected (0.01 sec)MariaDB [(none)]> SHOW SLAVE STATUS\G*************************** 1. row *************************** Slave_IO_State: Master_Host: 192.168.150.138 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 245 Relay_Log_File: relay-bin.000001 Relay_Log_Pos: 4 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 245 Relay_Log_Space: 245 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 01 row in set (0.00 sec)MariaDB [(none)]> START SLAVE;Query OK, 0 rows affected (0.00 sec)MariaDB [(none)]> SHOW SLAVE STATUS\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.150.138 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 497 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 782 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 497 Relay_Log_Space: 1070 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 11 row in set (0.00 sec)MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; #此时从库是只读模式+---------------+-------+| Variable_name | Value |+---------------+-------+| read_only | ON |+---------------+-------+1 row in set (0.00 sec)7、此时一主两从架构已经配置完成,安装MHA包manager节点(node1):安装mha4mysql-manager-0.56-0.el6.noarch.rpm和mha4mysql-node-0.56-0.el6.noarch.rpmyum install mha4mysql* -ynode节点(node2-node4):安装mha4mysql-node-0.56-0.el6.noarch.rpmyum -y iinstall mha4mysql-node-0.56-0.el6.noarch.rpm8、初始化MHA创建配置目录及配置文件(在node1上执行)[root@node1 ~]# mkdir /etc/masterha[root@node1 ~]# vim /etc/masterha/app1.cnf [server default]user=mhauserpassword=mhapassmanager_workdir=/data/masterha/app1master_log=/data/masterha/app1/manager.logremote_workdir=/data/masterha/app1ssh_user=rootrepl_user=repluserrepl_password=replpassping_interval=1[server1]hostname=192.168.150.138candidate_master=1[server2]hostname=192.168.150.139candidate_master=1[server3]hostname=192.168.150.1409、启动前检测ssh互信配置是否OK[root@node1 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf......Warning: Permanently added '192.168.150.139' (ECDSA) to the list of known hosts.Wed Mar 29 17:03:03 2017 - [debug] ok.Wed Mar 29 17:03:03 2017 - [info] All SSH connection tests passed successfully.mysql复制集群的连接配置是否OK[root@node1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf......Wed Mar 29 17:04:40 2017 - [info] 192.168.150.138(192.168.150.138:3306) (current master) +--192.168.150.139(192.168.150.139:3306) +--192.168.150.140(192.168.150.140:3306)Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.139..Wed Mar 29 17:04:40 2017 - [info] ok.Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.140..Wed Mar 29 17:04:40 2017 - [info] ok.Wed Mar 29 17:04:40 2017 - [warning] master_ip_failover_script is not defined.Wed Mar 29 17:04:40 2017 - [warning] shutdown_script is not defined.Wed Mar 29 17:04:40 2017 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.10、启动MHA[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &[1] 16989[root@node1 ~]# tail -f /data/masterha/app1/manager.log 192.168.150.138(192.168.150.138:3306) (current master) +--192.168.150.139(192.168.150.139:3306) +--192.168.150.140(192.168.150.140:3306)Wed Mar 29 21:51:58 2017 - [warning] master_ip_failover_script is not defined.Wed Mar 29 21:51:58 2017 - [warning] shutdown_script is not defined.Wed Mar 29 21:51:58 2017 - [info] Set master ping interval 1 seconds.Wed Mar 29 21:51:58 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.Wed Mar 29 21:51:58 2017 - [info] Starting ping health check on 192.168.150.138(192.168.150.138:3306)..Wed Mar 29 21:51:58 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..11、启动后查看master节点状态[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 (pid:16623) is running(0:PING_OK), master:192.168.150.13812、进行故障转移测试(1)maser节点关闭mariadb[root@node2 ~]# killall mysqld mysqld_safe(2)此时在manager上可以看到转移的日志----- Failover Report -----app1: MySQL Master failover 192.168.150.138(192.168.150.138:3306) to 192.168.150.139(192.168.150.139:3306) succeededMaster 192.168.150.138(192.168.150.138:3306) is down!Check MHA Manager logs at node1.com:/data/masterha/app1/manager.log for details.Started automated(non-interactive) failover.The latest slave 192.168.150.139(192.168.150.139:3306) has all relay logs for recovery.Selected 192.168.150.139(192.168.150.139:3306) as a new master.192.168.150.139(192.168.150.139:3306): OK: Applying all logs succeeded.192.168.150.140(192.168.150.140:3306): This host has the latest relay log events.Generating relay diff files from the latest slave succeeded.192.168.150.140(192.168.150.140:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.150.139(192.168.150.139:3306)192.168.150.139(192.168.150.139:3306): Resetting slave info succeeded.Master failover to 192.168.150.139(192.168.150.139:3306) completed successfully.故障转移后,manager会自动停止,此时查看master状态[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 is stopped(2:NOT_RUNNING).(3)查看其它两个库状态node3 已成功接管master,并可读写MariaDB [(none)]> SHOW MASTER STATUS;+-------------------+----------+--------------+------------------+| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |+-------------------+----------+--------------+------------------+| master-bin.000003 | 245 | | |+-------------------+----------+--------------+------------------+1 row in set (0.00 sec)MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';+---------------+-------+| Variable_name | Value |+---------------+-------+| read_only | OFF |+---------------+-------+1 row in set (0.00 sec)node4MariaDB [(none)]> SHOW SLAVE STATUS\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.150.139 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 245 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 530 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 245 Relay_Log_Space: 818 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 21 row in set (0.00 sec)MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';+---------------+-------+| Variable_name | Value |+---------------+-------+| read_only | ON |+---------------+-------+1 row in set (0.00 sec)(4)提供新的从节点已修复复制集群master界定啊故障后,需要重新准备好一个新的MySQL节点。基于来自于master节点的备份恢复后,将其重新配置为mster的从节点即可。新加入节点IP为原master节点IP,否则还得修改appl.cnf中相应的设置,最后再次启动manager,并再次检查状态。[root@node2 ~]# rm -rf /var/lib/mysql/*[root@node2 ~]# vim /etc/my.cnf添加从库两选项read_only = 1relay_log_purge = 0[root@node2 ~]# systemctl start mariadb.service[root@node2 ~]# mysqlWelcome to the MariaDB monitor. Commands end with ; or \g.Your MariaDB connection id is 2Server version: 5.5.52-MariaDB MariaDB ServerCopyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED BY 'replpass';Query OK, 0 rows affected (0.00 sec)MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass';Query OK, 0 rows affected (0.00 sec)MariaDB [(none)]> FLUSH PRIVILEGES;Query OK, 0 rows affected (0.00 sec)MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.139',MASTER_USER='repluser',MASTER_PASSWORD='replpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;Query OK, 0 rows affected (0.01 sec)MariaDB [(none)]> SHOW SLAVE\GERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1MariaDB [(none)]> START SLAVE;Query OK, 0 rows affected (0.01 sec)MariaDB [(none)]> SHOW SLAVE STATUS\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.150.139 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 245 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 530 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 245 Relay_Log_Space: 818 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 21 row in set (0.00 sec)再次开启manager查看状态,状态全部OK,主库变更为node3[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 (pid:17229) is running(0:PING_OK), master:192.168.150.139
本文出自 “汪立明” 博客,请务必保留此出处http://afterdawn.blog.51cto.com/7503144/1928622