MHA+Atlas+mysql一主一从开启gtid安装配置与实验

时间:2021-04-01 15:28:04

各节点架构

(说明:生产环境有两个节点可以组成一套完整集群,我是测试环境,因此对于manager以及atlas和binlog server都是单点,如果生产环境,相应的将manager以及atlas和binlog server每个节点都部署即可)

10.80.8.89 mysql-master manager,node atlas
10.80.8.90 mysql-slave node binlog server

安装步骤

10.80.8.89操作命令

1.#增加mha用户

useradd mha

passwd mha

2.#增加mysql用户

useradd mysql

passwd mysql

3.#生成私钥

ssh-keygen -t rsa

4.#各节点建立互信

su -mha

ssh-copy-id -i ./.ssh/id_rsa.pub 10.80.8.89

ssh-copy-id -i ./.ssh/id_rsa.pub 10.80.8.90

5.#安装mysql,二进制安装

mkdir -p /soft

cd /soft && wget http://cdn.mysql.com//Downloads/MySQL-5.6/mysql-5.6.30-linux-glibc2.5-x86_64.tar.gz

tar -zxf mysql-5.6.30-linux-glibc2.5-x86_64.tar.gz

ln -s /soft/mysql-5.6.30-linux-glibc2.5-x86_64 /usr/local/mysql

echo "export $PATH=/usr/local/mysql/bin:$PATH">>/etc/profile

source /etc/profile

mkdir -p /data/mysql/3306/{data,logs,etc,tmp}

chown -R mysql:mysql /data/mysql/3306

6.#上传配置文件my.cnf到/data/mysql/3306/etc下,配置文件见附件my.cnf,两个节点只需要修改其中的server-id为不一样即可

7.#初始化mysql

/usr/local/mysql/scripts/mysql_install_db --defaults-file=/data/mysql//etc/my.cnf

8.#启动mysql

mysqld_safe --defaults-file=/data/mysql//etc/my.cnf&

9.#安装mha

yum -y install perl-DBD-MySQL  perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl

yum localinstall mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm

10.#配置文件,注意由于我的主从复制启用了gtid,所以binlog server必须配置,如果未启用gtid那么可以不必配置

/etc/masterha_default.cnf

[server default]

user=mha

password=hellomha

ssh_user=mha

ping_interval=

ping_type=INSERT

log_level=debug

/etc/app1.cnf

[server default]

master_binlog_dir= /data/mysql//logs

remote_workdir=/var/log/masterha/app1

manager_workdir=/var/log/masterha/app1

manager_log=/var/log/masterha/app1/app1.log

master_ip_failover_script=/bin/master_ip_failover

[server1]

hostname=10.80.8.89

ip=10.80.8.89

port=

master_binlog_dir=/data/mysql//logs

candidate_master=

#check_repl_delay=

[server2]

hostname=10.80.8.90

ip=10.80.8.90

port=

master_binlog_dir=/data/mysql//logs

candidate_master=

[binlog1]

hostname=10.80.8.90

ip=10.80.8.90

master_binlog_dir=/data/binlog

11.#创建manager所需目录

mkdir -p /var/log/masterha/app1

chown -R mha:mha /var/log/masterha/app1

12.#安装并配置atlas

wget https://github.com/Qihoo360/Atlas/releases/download/2.2.1/Atlas-2.2.1.el6.x86_64.rpm

rpm -ivh Atlas-2.2..el6.x86_64.rpm

13.#启动atlas

/usr/local/mysql-proxy/bin/mysql-proxyd test start

14.#配置atlas实例,由于本实验主要是配合mha在atlas上线下线mysql master,所以其他参数默认

mysql -h 10.80.8.89 -P2345 -uuser -ppwd -e "add master 10.80.8.89:3306;"

mysql -h 10.80.8.89 -P2345 -uuser -ppwd -e "add slave 10.80.8.90:3306;"

15.#创建mha用户和复制用户

grant all privileges on *.* to 'mha'@'%' identified by 'hellomha';

grant replication slave on *.* to 'repl'@'%' identified by 'hellorepl';

16. #更改自动切换atlas里的master    /bin/master_ip_failover

use strict;
use warnings FATAL => 'all'; use Getopt::Long;
use MHA::DBHelper; my $proxy_ip_1 = "10.80.8.89"; my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
); GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
); sub add_vip {
my $output1 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P -uuser -ppwd -e "remove backend 1;"` ;
my $output2 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P -uuser -ppwd -e "remove backend 1;"` ;
my $output3 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P -uuser -ppwd -e "save config;"` ;
my $output4 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P -uuser -ppwd -e "add master $new_master_host:3306;"` ;
my $output5 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P -uuser -ppwd -e "save config;"` ;
# my $output1 = `ssh -o ConnectTimeout= -o ConnectionAttempts= $orig_master_host /sbin/ip addr del $vip/ dev $if`;
# my $output2 = `ssh -o ConnectTimeout= -o ConnectionAttempts= $new_master_host /sbin/ip addr add $vip/ dev $if`; }
exit &main(); sub main {
if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = ;
eval { # updating global catalog, etc
$exit_code = ;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) { # all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=, etc) here.
my $exit_code = ;
eval {
my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, ); ## Set read_only= on the new master
$new_master_handler->disable_log_bin_local();
print "Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only(); ## Creating an app user on the new master
#print "Creating app user on the new master..\n";
#FIXME_xxx_create_user( $new_master_handler->{dbh} );
$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect(); ## Update master ip on the catalog database, etc
&add_vip();
$exit_code = ;
};
if ($@) {
warn $@; # If you want to continue failover, exit .
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) { # do nothing
exit ;
}
else {
&usage();
exit ;
}
} sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

10.80.8.90操作命令

1-9步参见89操作命令

10.#创建manager所需目录

mkdir -p /var/log/masterha/app1

chown -R mha:mha /var/log/masterha/app1

11.#与89建立主从复制关系

mysql>CHANGE MASTER TO

MASTER_HOST="10.80.8.89",
MASTER_USER="repl",
MASTER_PASSWORD="hellomrepl",
MASTER_AUTO_POSITION = ; mysql>start slave; mysql>set global read_only=on;

12.建立binlog server

mkdir -p /data/binlog

chown -R mha:mha /data/binlog

su - mha

/usr/local/mysql/bin/mysqlbinlog -R --raw --host=10.80.8.89 --user=repl --port= --password=hellorepl --stop-never -t -r /data/binlog mysql_bin. &

10.80.8.89操作命令

1.#测试mha ssh

su - mha

[mha@iZ250pd1qtuZ ~]$ masterha_check_ssh --conf=/etc/app1.cnf
Wed May :: - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May :: - [info] Reading application default configuration from /etc/app1.cnf..
Wed May :: - [info] Reading server configuration from /etc/app1.cnf..
Wed May :: - [info] Starting SSH connection tests..
Wed May :: - [debug]
Wed May :: - [debug] Connecting via SSH from mha@10.80.8.89(10.80.8.89:) to mha@10.80.8.90(10.80.8.90:)..
Wed May :: - [debug] ok.
Wed May :: - [debug]
Wed May :: - [debug] Connecting via SSH from mha@10.80.8.90(10.80.8.90:) to mha@10.80.8.89(10.80.8.89:)..
Wed May :: - [debug] ok.
Wed May :: - [info] All SSH connection tests passed successfully.

2.#测试mha repl

[mha@iZ250pd1qtuZ ~]$ masterha_check_repl --conf=/etc/app1.cnf
Wed May :: - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May :: - [info] Reading application default configuration from /etc/app1.cnf..
Wed May :: - [info] Reading server configuration from /etc/app1.cnf..
Wed May :: - [info] MHA::MasterMonitor version 0.56.
Wed May :: - [debug] Connecting to servers..
Wed May :: - [debug] Connected to: 10.110.18.89(10.110.18.89:), user=mha
Wed May :: - [debug] Number of slave worker threads on host 10.110.18.89(10.110.18.89:):
Wed May :: - [debug] Connected to: 10.110.18.90(10.110.18.90:), user=mha
Wed May :: - [debug] Number of slave worker threads on host 10.110.18.90(10.110.18.90:):
Wed May :: - [debug] Comparing MySQL versions..
Wed May :: - [debug] Comparing MySQL versions done.
Wed May :: - [debug] Connecting to servers done.
Wed May :: - [info] GTID failover mode =
Wed May :: - [info] Dead Servers:
Wed May :: - [info] Alive Servers:
Wed May :: - [info] 10.110.18.89(10.110.18.89:)
Wed May :: - [info] 10.110.18.90(10.110.18.90:)
Wed May :: - [info] Alive Slaves:
Wed May :: - [info] 10.110.18.90(10.110.18.90:) Version=5.6.-76.2-log (oldest major version between slaves) log-bin:enabled
Wed May :: - [info] GTID ON
Wed May :: - [debug] Relay log info repository: TABLE
Wed May :: - [info] Replicating from 10.110.18.89(10.110.18.89:)
Wed May :: - [info] Primary candidate for the new Master (candidate_master is set)
Wed May :: - [info] Current Alive Master: 10.110.18.89(10.110.18.89:)
Wed May :: - [info] Checking slave configurations..
Wed May :: - [info] read_only= is not set on slave 10.110.18.90(10.110.18.90:).
Wed May :: - [info] Checking replication filtering settings..
Wed May :: - [info] binlog_do_db= , binlog_ignore_db= information_schema,monitor,performance_schema
Wed May :: - [info] Replication filtering check ok.
Wed May :: - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed May :: - [debug] SSH connection test to 10.110.18.90, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Wed May :: - [info] HealthCheck: SSH to 10.110.18.90 is reachable.
Wed May :: - [info] Binlog server 10.110.18.90 is reachable.
Wed May :: - [info] Checking recovery script configurations on 10.110.18.90(10.110.18.90:)..
Wed May :: - [info] Executing command: save_binary_logs --command=test --start_pos= --binlog_dir=/data/binlog --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql_bin. --debug
Wed May :: - [info] Connecting to mha@10.110.18.90(10.110.18.90:)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/binlog, up to mysql_bin.
Wed May :: - [info] Binlog setting check done.
Wed May :: - [info] Checking SSH publickey authentication settings on the current master..
Wed May :: - [debug] SSH connection test to 10.110.18.89, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Wed May :: - [info] HealthCheck: SSH to 10.110.18.89 is reachable.
Wed May :: - [info]
10.110.18.89(10.110.18.89:) (current master)
+--10.110.18.90(10.110.18.90:) Wed May :: - [info] Checking replication health on 10.110.18.90..
Wed May :: - [info] ok.
Wed May :: - [info] Checking master_ip_failover_script status:
Wed May :: - [info] /bin/master_ip_failover --command=status --ssh_user=mha --orig_master_host=10.110.18.89 --orig_master_ip=10.110.18.89 --orig_master_port=
Wed May :: - [info] OK.
Wed May :: - [warning] shutdown_script is not defined.
Wed May :: - [debug] Disconnected from 10.110.18.89(10.110.18.89:)
Wed May :: - [debug] Disconnected from 10.110.18.90(10.110.18.90:)
Wed May :: - [info] Got exit code (Not master dead). MySQL Replication Health is OK.

3.#启动masterha_manager

nohup masterha_manager --conf=/etc/app1.cnf 

实验阶段

实验一.关闭master,观察从库有没有切成主库,观察atlas里的backend是不是已经下线了旧主并且上线了新主

1.#查看atlas中的backends信息

[mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -P2345 -uuser -p -e "select * from backends;"
Enter password:
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+
| | 10.80.8.89: | up | rw | | | 10.80.8.90: | up | ro |
+-------------+-------------------+-------+------+

2.#关闭master

mysqladmin -uroot -p shutdown

3.#查看atlas中的从库信息

[mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -P2345 -uuser -p -e "select * from backends;"
Enter password:
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+ | | 10.80.8.90: | up | ro |
+-------------+-------------------+-------+------+

4.#查看mha切换日志

----- Failover Report -----

app1: MySQL Master failover 10.80.8.89(10.80.8.89:) to 10.80.8.90(10.80.8.90:) succeeded

Master 10.80.8.89(10.80.8.89:) is down!

Check MHA Manager logs at iZ250pd1qtuZ:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.80.8.89(10.80.8.89:)
Selected 10.80.8.90(10.80.8.90:) as a new master.
10.80.8.90(10.80.8.90:): OK: Applying all logs succeeded.
10.80.8.90(10.80.8.90:): OK: Activated master IP address.
10.80.8.90(10.80.8.90:): Resetting slave info succeeded.
Master failover to 10.80.8.90(10.80.8.90:) completed successfully.

5.#将下线的主作为从挂到新主

mysqld_safe --defaults-file=/data/mysql//etc/my.cnf &
mysql -uroot -p
mysql>CHANGE MASTER TO
MASTER_HOST="10.80.8.90",
MASTER_USER="repl",
MASTER_PASSWORD="hellorepl",
MASTER_AUTO_POSITION = ;
mysql>start slave;
mysql>set global read_only=on;

6.#启动binlog server,指向新master90

su - mha

/usr/local/mysql/bin/mysqlbinlog -R --raw --host=10.80.8.90 --user=repl --port= --password=hellorepl --stop-never -t -r /data/binlog mysql_bin. &

7.#清理mha目录下所有内容,发生切换后,状态文件还存在,会影响下次的切换,如果不删除日志,那么mha在8小时内如果再次发生宕机,那么不会发生自动切换

rm -rf /var/log/masterha/app1/*

8.#启动mha

nohup masterha_manager --conf=/etc/app1.cnf &

实验二.从库关闭slave 的io thread线程,主库写入数据,此时kill -9 mysql实例,观察主库写入的数据有没有正确被mha补全到新主

1.#从停止slave io_thread

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gaoquan |
| infra |
| mysql |
| performance_schema |
| test |
+--------------------+
6 rows in set (0.00 sec)

2.#主创建数据库

mysql> create database sbtest;
Query OK, row affected (0.00 sec) mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gaoquan |
| infra |
| mysql |
| performance_schema |
| sbtest |
| test |
+--------------------+
rows in set (0.01 sec)

3.#从查看数据库

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gaoquan |
| infra |
| mysql |
| performance_schema |
| test |
+--------------------+
rows in set (0.00 sec)

4.#杀掉主的mysql进程,mysqld_safe和mysqld进程一起干掉

kill -  

5.#从查看数据库中的sbtest是否已经正常拉取过来

mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gaoquan |
| infra |
| mysql |
| performance_schema |
| sbtest |
| test |
+--------------------+
rows in set (0.00 sec)

实验三.交互式切换主从数据库,适用于升级Master,主从都存活,但是需要升级内存或者更换控制器等必须重启服务器时。(注意,master_ip_online_change脚本是在切换到新主之前执行的,因此从库必须制定read_only)

1.#确保masterha_manager不存活,如果存活则停止

masterha_stop --conf=/etc/app1.cnf

2.#将主由90切换到89

[mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=10.80.8.89 --new_master_port=3306
Wed May 18 20:07:17 2016 - [info] MHA::MasterRotate version 0.56.
Wed May 18 20:07:17 2016 - [info] Starting online master switch..
Wed May 18 20:07:17 2016 - [info]
Wed May 18 20:07:17 2016 - [info] * Phase 1: Configuration Check Phase..
Wed May 18 20:07:17 2016 - [info]
Wed May 18 20:07:17 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May 18 20:07:17 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed May 18 20:07:17 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed May 18 20:07:17 2016 - [debug] Connecting to servers..
Wed May 18 20:07:17 2016 - [debug] Connected to: 10.80.8.89(10.80.8.89:3306), user=mha
Wed May 18 20:07:17 2016 - [debug] Number of slave worker threads on host 10.80.8.89(10.80.8.89:3306): 0
Wed May 18 20:07:17 2016 - [debug] Connected to: 10.80.8.90(10.80.8.90:3306), user=mha
Wed May 18 20:07:17 2016 - [debug] Number of slave worker threads on host 10.80.8.90(10.80.8.90:3306): 0
Wed May 18 20:07:17 2016 - [debug] Comparing MySQL versions..
Wed May 18 20:07:17 2016 - [debug] Comparing MySQL versions done.
Wed May 18 20:07:17 2016 - [debug] Connecting to servers done.
Wed May 18 20:07:17 2016 - [info] GTID failover mode = 1
Wed May 18 20:07:17 2016 - [info] Current Alive Master: 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:17 2016 - [info] Alive Slaves:
Wed May 18 20:07:17 2016 - [info] 10.80.8.89(10.80.8.89:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed May 18 20:07:17 2016 - [info] GTID ON
Wed May 18 20:07:17 2016 - [debug] Relay log info repository: TABLE
Wed May 18 20:07:17 2016 - [info] Replicating from 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.80.8.90(10.80.8.90:3306)? (YES/no): yes
Wed May 18 20:07:20 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed May 18 20:07:20 2016 - [info] ok.
Wed May 18 20:07:20 2016 - [info] Checking MHA is not monitoring or doing failover..
Wed May 18 20:07:20 2016 - [info] Checking replication health on 10.80.8.89..
Wed May 18 20:07:20 2016 - [info] ok.
Wed May 18 20:07:20 2016 - [info] 10.80.8.89 can be new master.
Wed May 18 20:07:20 2016 - [info]
From:
10.80.8.90(10.80.8.90:3306) (current master)
+--10.80.8.89(10.80.8.89:3306)

To:
10.80.8.89(10.80.8.89:3306) (new master)

Starting master switch from 10.80.8.90(10.80.8.90:3306) to 10.80.8.89(10.80.8.89:3306)? (yes/NO): yes
Wed May 18 20:07:21 2016 - [info] Checking whether 10.80.8.89(10.80.8.89:3306) is ok for the new master..
Wed May 18 20:07:21 2016 - [info] ok.
Wed May 18 20:07:21 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Wed May 18 20:07:21 2016 - [info]
Wed May 18 20:07:21 2016 - [debug] Disconnected from 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:21 2016 - [info] * Phase 2: Rejecting updates Phase..
Wed May 18 20:07:21 2016 - [info]
Wed May 18 20:07:21 2016 - [info] Executing master ip online change script to disable write on the current master:
Wed May 18 20:07:21 2016 - [info] /bin/master_ip_online_change --command=stop --orig_master_host=10.80.8.90 --orig_master_ip=10.80.8.90 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='yz6xFOkF1mh3Wmkr1Rsz' --new_master_host=10.80.8.89 --new_master_ip=10.80.8.89 --new_master_port=3306 --new_master_user='mha' --new_master_password='yz6xFOkF1mh3Wmkr1Rsz' --orig_master_ssh_user=mha --new_master_ssh_user=mha
Wed May 18 20:07:21 2016 556919 Set read_only on the new master.. ok.
Wed May 18 20:07:21 2016 559821 drop vip 192.168.1.100..
Warning: Using a password on the command line interface can be insecure.
ERROR 1105 (07000) at line 1: invalid backend_id
Warning: Using a password on the command line interface can be insecure.
ERROR 1105 (07000) at line 1: invalid backend_id
Warning: Using a password on the command line interface can be insecure.
Wed May 18 20:07:21 2016 582940 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '2598','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:22 2016 083378 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '2599','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:22 2016 584152 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '2599','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:23 2016 084569 Set read_only=1 on the orig master.. ok.
Wed May 18 20:07:23 2016 085676 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '2600','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:23 2016 585008 Killing all application threads..
Wed May 18 20:07:23 2016 585529 done.
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed May 18 20:07:23 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] Orig master binlog:pos is mysql_bin.000007:193297.
Wed May 18 20:07:23 2016 - [debug] Fetching current slave status..
Wed May 18 20:07:23 2016 - [debug] Fetching current slave status done.
Wed May 18 20:07:23 2016 - [info] Waiting to execute all relay logs on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [info] master_pos_wait(mysql_bin.000007:193297) completed on 10.80.8.89(10.80.8.89:3306). Executed 0 events.
Wed May 18 20:07:23 2016 - [info] done.
Wed May 18 20:07:23 2016 - [debug] Stopping SQL thread on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [debug] done.
Wed May 18 20:07:23 2016 - [info] Getting new master's binlog name and position..
Wed May 18 20:07:23 2016 - [info] mysql_bin.000007:202995
Wed May 18 20:07:23 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.80.8.89', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='mha', MASTER_PASSWORD='xxx';
Wed May 18 20:07:23 2016 - [info] Executing master ip online change script to allow write on the new master:
Wed May 18 20:07:23 2016 - [info] /bin/master_ip_online_change --command=start --orig_master_host=10.80.8.90 --orig_master_ip=10.80.8.90 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='yz6xFOkF1mh3Wmkr1Rsz' --new_master_host=10.80.8.89 --new_master_ip=10.80.8.89 --new_master_port=3306 --new_master_user='mha' --new_master_password='yz6xFOkF1mh3Wmkr1Rsz' --orig_master_ssh_user=mha --new_master_ssh_user=mha
Wed May 18 20:07:23 2016 695963 Set read_only=0 on the new master.
Wed May 18 20:07:23 2016 696375Add vip 192.168.1.100 on eth0..
Warning: Using a password on the command line interface can be insecure.
Warning: Using a password on the command line interface can be insecure.
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] * Switching slaves in parallel..
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] Unlocking all tables on the orig master:
Wed May 18 20:07:23 2016 - [info] Executing UNLOCK TABLES..
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] All new slave servers switched successfully.
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] * Phase 5: New master cleanup phase..
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [debug] Clearing slave info..
Wed May 18 20:07:23 2016 - [debug] Stopping slave IO/SQL thread on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [debug] done.
Wed May 18 20:07:23 2016 - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Wed May 18 20:07:23 2016 - [info] 10.80.8.89: Resetting slave info succeeded.
Wed May 18 20:07:23 2016 - [info] Switching master to 10.80.8.89(10.80.8.89:3306) completed successfully.
Wed May 18 20:07:23 2016 - [debug] Disconnected from 10.80.8.89(10.80.8.89:3306)
Wed May 18 20:07:23 2016 - [debug] Disconnected from 10.80.8.90(10.80.8.90:3306)
[mha@iZ250pd1qtuZ ~]$ mysql -h10.80.8.89 -uuser -ppwd -P2345 -e "select * from backends;";
Warning: Using a password on the command line interface can be insecure.
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+
| 1 | 10.80.8.89:3306 | up | rw |
+-------------+-------------------+-------+------+

实验四:master挂掉,手动进行切换

[mha@iZ250pd1qtuZ ~]$ mysqladmin -uroot -p shutdown
Enter password:
:: mysqld_safe mysqld from pid file /data/mysql//logs/mysqld..pid ended
[mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=dead --conf=/etc/conf/masterha/app1.cnf --dead_master_host=10.80.8.89 --new_master_host=10.80.8.90 --interactive=
--dead_master_ip=<dead_master_ip> is not set. Using 10.80.8.89.
--dead_master_port=<dead_master_port> is not set. Using .
Wed May :: - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May :: - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: /etc/conf/masterha/app1.cnf:No such file or directory
at /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line .
[mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=10.80.8.89 --new_master_host=10.80.8.90 --interactive=
--dead_master_ip=<dead_master_ip> is not set. Using 10.80.8.89.
--dead_master_port=<dead_master_port> is not set. Using .
Wed May :: - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May :: - [info] Reading application default configuration from /etc/app1.cnf..
Wed May :: - [info] Reading server configuration from /etc/app1.cnf..
[mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -uuser -ppwd -P2345 -e "select * from backends";
Warning: Using a password on the command line interface can be insecure.
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+
| | 10.80.8.90: | up | rw |
+-------------+-------------------+-------+------+

总结:

1.由于用的是阿里云,因此没法模拟主机直接掉电关闭的情况

2.mha很重要的一个功能是补全slave的差异并与新主建立主从关系,由于我们的环境并发及重要程度不是特别大,一主挂多从的成本过高,因此一主一从已经足够,如果是业务相对比较重要,那么还是建议一主多从。

3.secondary_check_script没有开启,有需要的可以开启试下,即通过多个节点去访问master,防止由于网络抖动而误判master异常而导致master切换