工作不怎么忙,搞点儿开发吧差点儿事,就想着弄点儿架构的事儿。正好前段时间看过关于keepalived+双主实现高可用的文章,也恰好身边的朋友所在的公司也部分用这个架构。没什么事儿就搞搞吧,正好对比下MMM、MHA、keepalived+双主三种架构的优劣和DB维护的体验感。简单讲讲自己的用户体验感,就搭建难易程度讲MMM的安装包封装好的,修改的配置文件较MHA少一些,比keepalived+双主要稍微麻烦点儿。本着省事,维护起来省事还是觉得MMM真的便利,黑盒操作适合我这种懒人加小白类型。
某位大佬讲过,如果你的公司还在用MMM和MHA,那么你可以考虑跳槽了。我觉得这句话很有道理,其实并不是让我们真的去跳槽,毕竟每个人工作的目的,环境不一样,有些架构上的事儿我们决定不了。没法随性而行,但不能停止探索的脚步,本过程从搭建调研/搭建过程/搭建测试/搭建总结四个方面讲述我对双主+keepalived的理解和用户体验感。
一、搭建调研
传统的高可用架构如MHA、MMM存在一些不成熟的问题,如脑裂。引入keepalived和双主复制模式,实现高可用架构,但keepalived本身是在机器宕机时才会实现漂移功能,我们的目标是要MySQL实例宕机后要实现故障切换,还需要辅助的脚本来帮助keepalived来实现更灵活的漂移。
keepalived简介
keepalived是集群管理中保证集群高可用的一个软件解决方案,其功能类似于heartbeat,用来防止单点故障,这里的作用我理解其实就是保证VIP的顺利漂移。虚拟路由冗余协议,可以认为是实现路由器高可用的协议,即将N台提供相同功能的路由器组成一个路由器组,这个组里面有一个master和多个backup,master上面有一个对外提供服务的vip,master会发组播(组播地址为224.0.0.18),当backup收不到vrrp包时就认为master宕掉了,这时就需要根据VRRP的优先级来选举一个backup当master,这样的话就可以保证路由器的高可用了。
keepalived配置说明
keepalived只有一个配置文件keepalived.conf,里面主要包括以下几个配置区域,分别是global_defs、vrrp_instance和virtual_server。
- global_defs:主要是配置故障发生时的通知对象以及机器标识;
- vrrp_instance:用来定义对外提供服务的VIP区域及其相关属性;
- virtual_server:虚拟服务器定义。
二、搭建过程
搭建环境(服务器配置忽略)
master1:172.16.3.190/22 3309 VIP:172.16.3.123/22
1 #master1配置keepalived 2 yum install keepalived.x86_64 3 [root@172-16-3-190 we_ops_admin]# cat /etc/keepalived/keepalived.conf 4 ! Configuration File for keepalived 5 6 global_defs { 7 router_id lvs_master1 8 } 9 10 vrrp_instance VI_1 { 11 state BACKUP 12 interface eth0 13 virtual_router_id 172 14 priority 100 15 advert_int 1 16 nopreempt 17 authentication { 18 auth_type PASS 19 auth_pass 1111 20 } 21 virtual_ipaddress { 22 172.16.3.123/22 23 } 24 } 25 26 virtual_server 172.16.3.123 3309 { 27 delay_loop 6 28 lb_algo rr 29 lb_kind NAT 30 nat_mask 255.255.255.0 31 persistence_timeout 50 32 protocol TCP 33 34 real_server 172.16.3.190 3309 { 35 weight 3 36 notify_down /opt/shells/keepalived_mysql.sh 37 TCP_CHECK { 38 connect_timeout 3 39 nb_get_retry 3 40 delay_before_retry 3 41 connect_port 3309 42 } 43 } 44 } 45 46 #master2上安装keepalived 47 yum install keepalived.x86_64 48 [root@172-16-3-189 we_ops_admin]# cat /etc/keepalived/keepalived.conf 49 ! Configuration File for keepalived 50 51 global_defs { 52 router_id lvs_master2 53 } 54 55 vrrp_instance VI_1 { 56 state BACKUP 57 interface eth0 58 virtual_router_id 172 59 priority 50 60 advert_int 1 61 # nopreempt 62 authentication { 63 auth_type PASS 64 auth_pass 1111 65 } 66 virtual_ipaddress { 67 172.16.3.123/22 68 } 69 } 70 71 virtual_server 172.16.3.123 3309 { 72 delay_loop 6 73 lb_algo rr 74 lb_kind NAT 75 nat_mask 255.255.255.0 76 persistence_timeout 50 77 protocol TCP 78 79 real_server 172.16.3.189 3309 { 80 weight 3 81 notify_down /opt/shells/keepalived_mysql.sh 82 TCP_CHECK { 83 connect_timeout 3 84 nb_get_retry 3 85 delay_before_retry 3 86 connect_port 3309 87 } 88 } 89 }
上述配置中我们可以保证keepalived服务对VIP:172.16.3.123/22的控制权,默认是keepalived服务关闭,那么会触发VIP的漂移。正常运行的服务不会发生异常停止的现象,如果系统发生宕机会触发所有的服务停止,这里系统宕机是触发VIP漂移的导火索。只是这里我们想让keepalived服务于MySQL复制集,那么这里的导火索自然而然是MySQL服务的状态。如果服务状态不可用,那么我们希望这个应用VIP可以漂移到复制集的另一台机器上;如果服务状态可用,我们希望VIP不要漂移。要想实现这个目的,我们还需要一个服务脚本来帮助我们去帮助keepalived发现MySQL服务宕机后的动作,脚本如下配置。
1 [root@172-16-3-190 we_ops_admin]# cat /opt/shells/keepalived_mysql.sh 2 #!/bin/bash 3 pkill keepalived 4 /sbin/ifdown eth0 && /sbin/ifup eth0 5 #授予可执行权限 6 [root@172-16-3-190 we_ops_admin]# ls -lh /opt/shells/keepalived_mysql.sh 7 -rwxr-xr-x 1 root root 66 Sep 27 19:29 /opt/shells/keepalived_mysql.sh
通过步骤1·2的配置,启动MySQL服务,启动keepalived服务,这里的master1和master2基本就可以实现高可用,保证了master1服务不可用时,master2还能继续提供数据库的支持。
三、搭建测试(Bash脚本模拟高并发)
1、master1的MySQL服务宕机,VIP会从master1上摘除漂移落盘到master2上,且master1上的keepalived服务也会停止。应用连接VIP,master2继续为整个集群提供数据库支持。
1 #停止master1上的MySQL服务 2 [root@172-16-3-190 we_ops_admin]# /etc/init.d/mysql_3309 stop 3 Shutting down MySQL (Percona Server).. SUCCESS! 4 5 #keepalived服务也停止了,且VIP已经被从master1上摘除 6 [root@172-16-3-190 we_ops_admin]# /etc/init.d/keepalived status 7 keepalived dead but subsys locked 8 [root@172-16-3-190 we_ops_admin]# ip add 9 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 10 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 11 inet 127.0.0.1/8 scope host lo 12 inet6 ::1/128 scope host 13 valid_lft forever preferred_lft forever 14 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 15 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 16 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 17 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 18 valid_lft forever preferred_lft forever 19 #VIP漂移到master2上 20 [root@172-16-3-189 we_ops_admin]# ip add 21 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 22 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 23 inet 127.0.0.1/8 scope host lo 24 inet6 ::1/128 scope host 25 valid_lft forever preferred_lft forever 26 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 27 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 28 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 29 inet 172.16.3.123/22 scope global secondary eth0 30 inet6 fe80::5054:ff:fe2d:965c/64 scope link 31 valid_lft forever preferred_lft forever
2、master1重新加入集群,VIP不会重新漂移回来,造成二次波动或者脑裂现象
1 #重启master1上的MySQL服务 2 [root@172-16-3-190 we_ops_admin]# /etc/init.d/mysql_3309 start 3 Starting MySQL (Percona Server)............... SUCCESS! 4 #重启master1上的keepalived服务 5 [root@172-16-3-190 we_ops_admin]# /etc/init.d/keepalived start 6 Starting keepalived: [ OK ] 7 #VIP还是在master2上,且master1上并没有VIP,因为master1上设置非抢占模式,即使优先级更高 8 [root@172-16-3-190 we_ops_admin]# ip add #master1 9 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 10 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 11 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 12 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 13 valid_lft forever preferred_lft forever 14 15 [root@172-16-3-189 we_ops_admin]# ip add master2 16 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 17 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 18 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 19 inet 172.16.3.123/22 scope global secondary eth0 20 inet6 fe80::5054:ff:fe2d:965c/64 scope link 21 valid_lft forever preferred_lft forever
1 #关闭master2实例 2 [root@172-16-3-189 we_ops_admin]# /etc/init.d/mysql_3309 stop 3 Shutting down MySQL (Percona Server).. SUCCESS! 4 [root@172-16-3-189 we_ops_admin]# /etc/init.d/keepalived status 5 keepalived dead but subsys locked 6 #VIP已经从master2上飘走了 7 [root@172-16-3-189 we_ops_admin]# ip add 8 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 9 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 10 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 11 inet6 fe80::5054:ff:fe2d:965c/64 scope link 12 valid_lft forever preferred_lft forever 13 14 #VIP已经落盘到master1上 15 [root@172-16-3-190 we_ops_admin]# ip add 16 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000 17 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 18 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 19 inet 172.16.3.123/22 scope global secondary eth0 20 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 21 valid_lft forever preferred_lft forever 22 #server-id可以证明连接到master1实例 23 [root@172-16-3-190 we_ops_admin]# /opt/app/mysql_3309/bin/mysql -urepl -prepl --socket=/opt/app/mysql_3309/tmp/mysql.sock --port=3309 --host=172.16.3.123 24 Warning: Using a password on the command line interface can be insecure. 25 Welcome to the MySQL monitor. Commands end with ; or \g. 26 Your MySQL connection id is 33 27 Server version: 5.6.20-68.0-log Percona Server (GPL), Release 68.0, Revision 656 28 29 Copyright (c) 2009-2014 Percona LLC and/or its affiliates 30 Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. 31 32 Oracle is a registered trademark of Oracle Corporation and/or its 33 affiliates. Other names may be trademarks of their respective 34 owners. 35 36 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 37 38 mysql> show global variables like '%server_id%'; 39 +----------------+---------+ 40 | Variable_name | Value | 41 +----------------+---------+ 42 | server_id | 1903309 | 43 | server_id_bits | 32 | 44 +----------------+---------+ 45 2 rows in set (0.01 sec)
1 Warning: Using a password on the command line interface can be insecure. #这个错误这里测试大概会报10条 2 Warning: Using a password on the command line interface can be insecure. 3 ERROR 2003 (HY000): Can't connect to MySQL server on '172.16.3.123' (111) 4 Warning: Using a password on the command line interface can be insecure. 5 ERROR 2003 (HY000): Can't connect to MySQL server on '172.16.3.123' (111)
[root@172-16-3-189 we_ops_admin]#/etc/init.d/mysql_3309 stop #master1上停止实例
Shutting down MySQL (Percona Server).. SUCCESS!
[root@172-16-3-189 we_ops_admin]#/etc/init.d/keepalived status
keepalived dead but subsys locked
[root@172-16-3-189 we_ops_admin]#ip add #vip居然还在master2上
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000
link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff
inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.123/22 scope global secondary eth0
inet6 fe80::5054:ff:fe2d:965c/64 scope link
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP qlen 1000
link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff
inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.123/22 scope global secondary eth0
inet6 fe80::5054:ff:fef4:ecb2/64 scope link
valid_lft forever preferred_lft forever
#master2上有VIP,但应用没有连接到master2上且表的行数不增长
mysql> select max(id) from test_keepalived;
+---------+
| max(id) |
+---------+
| 168 |
+---------+
1 row in set (0.00 sec)
+---------+
| max(id) |
+---------+
| 168 |
+---------+
1 row in set (0.00 sec)
mysql> select max(id) from test_keepalived;
+---------+
| max(id) |
+---------+
| 387 |
+---------+
1 row in set (0.00 sec)
+---------+
| max(id) |
+---------+
| 388 |
+---------+
1 row in set (0.00 sec)
2、master2同步被中断的问题。(VIP在maste2上时,因为master2上已经写入了数据但没来得及同步到master1上;master2实例停止后,VIP也漂移到master1,应用连接master1进行写入,但因为表设计为主键自增长,会出现ID为25已写入master2而没有同步到master1,应用连接master1写入到数据库同步到master2时报主键重复)
1 mysql> show slave status \G; 2 *************************** 1. row *************************** 3 Slave_IO_State: Waiting for master to send event 4 Master_Host: 172.16.3.190 5 Master_User: repl 6 Master_Port: 3309 7 Connect_Retry: 30 8 Master_Log_File: binlog.000036 9 Read_Master_Log_Pos: 103620 10 Relay_Log_File: relay_bin.000038 11 Relay_Log_Pos: 280 12 Relay_Master_Log_File: binlog.000036 13 Slave_IO_Running: Yes 14 Slave_SQL_Running: No 15 Replicate_Do_DB: 16 Replicate_Ignore_DB: 17 Replicate_Do_Table: 18 Replicate_Ignore_Table: 19 Replicate_Wild_Do_Table: 20 Replicate_Wild_Ignore_Table: 21 Last_Errno: 1062 22 Last_Error: Error 'Duplicate entry '25' for key 'PRIMARY'' on query. Default database: 'practice'. Query: 'insert into test_keepalived values(null,1,4)' 23 Skip_Counter: 0 24 Exec_Master_Log_Pos: 120 25 Relay_Log_Space: 104434 26 Until_Condition: None 27 Until_Log_File: 28 Until_Log_Pos: 0 29 Master_SSL_Allowed: No 30 Master_SSL_CA_File: 31 Master_SSL_CA_Path: 32 Master_SSL_Cert: 33 Master_SSL_Cipher: 34 Master_SSL_Key: 35 Seconds_Behind_Master: NULL 36 Master_SSL_Verify_Server_Cert: No 37 Last_IO_Errno: 0 38 Last_IO_Error: 39 Last_SQL_Errno: 1062 40 Last_SQL_Error: Error 'Duplicate entry '25' for key 'PRIMARY'' on query. Default database: 'practice'. Query: 'insert into test_keepalived values(null,1,4)' 41 Replicate_Ignore_Server_Ids: 42 Master_Server_Id: 1903309 43 Master_UUID: 1b589d80-f450-11e7-9150-525400f4ecb2 44 Master_Info_File: /opt/app/mysql_3309/logs/master.info 45 SQL_Delay: 0 46 SQL_Remaining_Delay: NULL 47 Slave_SQL_Running_State: 48 Master_Retry_Count: 86400 49 Master_Bind: 50 Last_IO_Error_Timestamp: 51 Last_SQL_Error_Timestamp: 180929 17:43:30 52 Master_SSL_Crl: 53 Master_SSL_Crlpath: 54 Retrieved_Gtid_Set: 55 Executed_Gtid_Set: 56 Auto_Position: 0 57 1 row in set (0.00 sec)
Keepalived+双主架构总结