前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >MySQL高可用架构之Keepalived+主从架构部署

MySQL高可用架构之Keepalived+主从架构部署

作者头像
星哥玩云
发布2022-08-16 15:57:41
7060
发布2022-08-16 15:57:41
举报
文章被收录于专栏:开源部署

针对目前公司的数据库架构要做统计规划和调整,所以针对不同的业务环境要选择合适的架构就比较重要,同时作为运维人员又要考虑到维护的便捷性(不能做过多架构维护),最终停留在mha+vip/主从+vip/mycat集群/域名+主从这几种架构或组合,这里就分布对不同架构部署做下记录并对比起优劣,针对不同场景做出合适的选择。

本文主要针对主备自动切换,即VIP+主从方式,其中VIP通过keepalived控制。

一、环境描述

服务器IP

操作系统

数据库

硬件配置

备注

192.168.1.23

RHEL6.6_x86_64

MySQL 5.7.21

内存:16GB CPU:4cores 硬盘:100GB

VIP:192.168.1.27 Keepalived:1.4.4

192.168.1.24

RHEL6.6_x86_64

MySQL 5.7.21

内存:16GB CPU:4cores 硬盘:100GB

二、业务场景

1、优点

1)部署简单,管理方便。

2)可实现主备库VIP切换,对业务无感知。

2、缺点

1)存在脑裂风险

2)存在切换主备,状态无检测风险

3)跨VLAN不支持

4)看似部署简单,实则需要考虑的点非常多,如果考虑不完善建议慎用

3、业务场景

总而言之,此架构不推荐使用,使用时需谨慎。

1)双节点数据库服务器,仅仅为了实现主备VIP切换业务无感知

2)多个从节点,通过LVS,做负载均衡(现在不推荐这种架构实现负载均衡)

3)两节点的双向复制可以降低部分风险(来回切换VIP、脑裂等),单无法避免

三、数据库安装和主从部署

四、keepalived安装部署

1.keepalived下载

下载地址:http://www.keepalived.org/software/keepalived-1.4.4.tar.gz

2.keepalived安装

主库: [root@node1 tools]# tar -xzvf keepalived-1.4.4.tar.gz [root@node1 tools]# cd keepalived-1.4.4 [root@node1 keepalived-1.4.4]# ./configure  --prefix=/usr/local/keepalived [root@node1 keepalived-1.4.4]# make -j 4 [root@node1 keepalived-1.4.4]# make install [root@node1 keepalived-1.4.4]# cp -rp ./keepalived/etc/init.d/keepalived /etc/init.d/ [root@node1 keepalived-1.4.4]# chkconfig  --add keepalived [root@node1 keepalived-1.4.4]# mkdir /etc/keepalived [root@node2 keepalived-1.4.4]# cp -rp ./bin/keepalived  /usr/bin/ [root@node1 keepalived-1.4.4]# cp ./keepalived/etc/keepalived/keepalived.conf /etc/keepalived [root@node1 keepalived-1.4.4]# cp -rp /usr/local/keepalived/etc/sysconfig/keepalived  /etc/sysconfig/ [root@node1 keepalived-1.4.4]# service keepalived status keepalived is stopped 备库: 安装keepalived同上 ###备注 *** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS. ###

3.keepalived配置

主库配置:

[root@node1 keepalived-1.4.4]# cp -rp  /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak [root@node1 keepalived-1.4.4]# >/etc/keepalived/keepalived.conf [root@node1 keepalived-1.4.4]# vi /etc/keepalived/keepalived.conf

! Configuration File for keepalived

global_defs {   notification_email {     rangle@163.com   }   notification_email_from rangle@163.com   smtp_server 127.0.0.1   smtp_connect_timeout 30   router_id Node_Master }

vrrp_instance VI_1 {     state BACKUP          ##可配置master和backup模式,为了防止脑裂现象,主备均需要设置为backup模式,master模式会抢占VIP     interface eth1     virtual_router_id 43  ##VRRP组名,两个节点的设置必须一样,以指明各个节点属于同一VRRP组     priority 100     nopreempt            ##配合backup,防止切换后,主库服务恢复正常后,IP漂移过来     advert_int 1          ###组播信息发送间隔,两个节点设置必须一样     authentication {      ###设置验证信息,两个节点必须一致         auth_type PASS         auth_pass 1111     }     virtual_ipaddress {  ###指定虚拟IP, 两个节点设置必须一样         192.168.1.27 label eth1:1     } } virtual_server 192.168.1.27 3306 {     delay_loop 6     lb_algo wrr          #lvs调度算法rr|wrr|lc|wlc|lblc|sh|dh     lb_kind DR          #负载均衡转发规则NAT|DR|RUN     persistence_timeout 50     protocol TCP

    real_server 192.168.1.23 3306 {       weight 1       notify_down /etc/keepalived/mysql.sh       TCP_CHECK {       connect_timeout 10       nb_get_retry 3       delay_before_retry 3       connect_port 3306         }     } }

备库配置:

[root@node2 keepalived-1.4.4]# cp -rp  /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf_bak [root@node2 keepalived-1.4.4]# >/etc/keepalived/keepalived.conf [root@node2 keepalived-1.4.4]# vi /etc/keepalived/keepalived.conf

! Configuration File for keepalived   global_defs {       notification_email {       rangle@163.com       }       notification_email_from rangle@163.com       smtp_server 127.0.0.1       smtp_connect_timeout 30       router_id Node_backup     }  vrrp_instance VI_1 {       state BACKUP       interface eth1       virtual_router_id 43       priority 90                  ##主优先级高于备,主设置100,备可设置90,(1-254之间)       advert_int 1       authentication {       auth_type PASS       auth_pass 1111       }       virtual_ipaddress {       192.168.1.27 label eth1:1       }       }  virtual_server 192.168.1.27 3306 {       delay_loop 2                ##每隔2秒检查真实服务器状态       lb_algo wrr                  ##LVS算法,如果仅做高可用可不设       lb_kind DR                  ##LVS模式,如果仅做高可用可不设       persistence_timeout 60       protocol TCP

      real_server 192.168.1.24 3306{       weight 3       notify_down /etc/keepalived/mysql.sh    ##如果mysql服务宕掉,执行的脚本       echo '3' >  /etc/keepalived/t.log       TCP_CHECK {       connect_timeout 10       nb_get_retry 3          delay_before_retry 3       connect_port 3306       }       } }

mysql.sh配置内容:

#!/bin/bash

run_status=`service keepalived status|grep running|wc -l` pro_status=`ps -ef |grep keepalived|grep -v grep |wc -l`

service keepalived stop

if [ {run_status} != 0 ] || [ {pro_status} != 0 ]then  pkill keepalivedfi

五、keepalived优化

1、keepalived+mysql风险点

1)脑裂风险

现象:

如果由于主备内部网络通信(vrrp组播)故障(网络波动、防火墙等)导致VIP切换,但实际主备服务均正常,此时VIP会存在两个节点上,这个就会导致主备数据同时写入导致数据不一致。

优化方案:

****主库节点增加脑裂检查脚本,通过本机增加网关链路的检查,增加仲裁节点,判断是否本机对外的网络出现问题,此时在配合VRRP组播,如果网络存在问题则直接关闭keepalived和mysql服务,然后发送告警邮件、短信。

2)主从同步漂移风险

现象:

VIP漂移不会检查从库状态,即无论从库数据是否同步,VIP都将完成漂移,数据不一致风险较大。

优化方案:

****在从节点增加主从状态检查脚本,判断主从同步进程是否正常,同时判断是否有数据延迟,如果有延迟或同步错误则不能完成切换,直接关闭keepalived服务

2、优化后配置如下

具体优化配置和脚本如下:

主库配置:

keepalived.conf

! Configuration File for keepalived

global_defs {   notification_email {     abc@163.com   }   notification_email_from root@localhost   smtp_server root@localhost   smtp_connect_timeout 30   router_id node1 }

vrrp_script check_local {     script "/etc/keepalived/check_gateway.sh"     interval 5     fall    4     rise    4     weight 3 }

vrrp_instance VI_1 {     state BACKUP     interface eth1     virtual_router_id 43     priority  100     nopreempt     advert_int 1     authentication {         auth_type PASS         auth_pass 1111     }

    virtual_ipaddress {         192.168.1.27 label eth1:1     }     track_script {         check_local     }

} virtual_server 192.168.1.27 3306 {     delay_loop 6     lb_algo wrr          #lvs调度算法rr|wrr|lc|wlc|lblc|sh|dh     lb_kind DR          #负载均衡转发规则NAT|DR|RUN     persistence_timeout 50     protocol TCP

    real_server 192.168.1.23 3306 {       weight 3       notify_down /etc/keepalived/mysql.sh       TCP_CHECK {       connect_timeout 10       nb_get_retry 3       delay_before_retry 3       connect_port 3306         }     } }

备库配置:

keepalived.conf

! Configuration File for keepalived global_defs {       notification_email {       abc@163.com       }       notification_email_from root@localhost       smtp_server root@localhost       smtp_connect_timeout 30       router_id node2     }

vrrp_script check_local {     script "/etc/keepalived/check_gateway.sh "     interval 5     fall    4     rise    4     weight 3 }

vrrp_instance VI_1 {       state BACKUP       interface eth1       virtual_router_id 43       priority 90       advert_int 1       authentication {       auth_type PASS       auth_pass 1111       }

      virtual_ipaddress {       192.168.1.27 label eth1:1       }     track_script {         check_local     }

      }

 virtual_server 192.168.1.27 3306 {       delay_loop 2       lb_algo wrr       lb_kind DR       persistence_timeout 60       protocol TCP

      real_server 192.168.1.24 3306{       weight 3       notify_down /etc/keepalived/mysql.sh       echo '3' >  /etc/keepalived/t.log       TCP_CHECK {       connect_timeout 10       nb_get_retry 3          delay_before_retry 3       connect_port 3306       }       } }

脑裂检查脚本:

/etc/keepalived/check_gateway.sh

#!/bin/sh VIP=192.168.1.27 GATEWAY=192.168.1.254

/sbin/arping -I eth1 -c 5 -s $VIP $GATEWAY  &>/dev/null

gateway_status=$? keepalived_status=`ps -ef |grep keepalived|grep -v grep |wc -l`

if [ ${gateway_status} != 0 ] && [ ${keepalived_status} = 0 ] then   service keepalived start elif [ ${gateway_status} != 0 ] && [ ${keepalived_status} != 0 ]   service keepalived stop fi

 数据库服务宕机转移脚本

/etc/keepalived/mysql.sh

#!/bin/bash

run_status=`service keepalived status|grep running|wc -l` pro_status=`ps -ef |grep keepalived|grep -v grep |wc -l`

service keepalived stop

if [ ${run_status} != 0 ] || [ ${pro_status} != 0 ] then   pkill keepalived fi

 从库状态检查脚本(在备库做crontab定时任务*/1 * * * * sh /etc/keepalived/check_slave.sh >/dev/null 2>&1)

#!/bin/bash

VIP="192.168.1.27"

vip_status=`ip add |egrep "${VIP}"|wc -l` keepalived_status=`ps -ef |grep keepalived|grep -v grep|wc -l`

slave_status=`mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"|awk -F ':' '{printf("%s",$NF)}'` io_status=`echo "${slave_status}"|awk '{print $1}'|grep "Yes"|wc -l` sql_status=`echo "${slave_status}"|awk '{print $2}'|grep "Yes"|wc -l` sync_status=`echo "${slave_status}"|awk '{print $3}'` sync_time=`echo "${slave_status}"|awk '{print $4}'`

if [ ${io_status} -ne 1 ] || [ ${sql_status} -ne 1 ] || [ ${sync_status} -ne 0 ] then     if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -ne 0 ];then         service keepalived stop     fi elif [ ${io_status} -eq 1 ] && [ ${sql_status} -eq 1 ] && [ ${sync_status} -eq 0 ] then     if [ ${sync_time} -ge 120 ];then         if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -ne 0 ];then           service keepalived stop         fi     elif [ ${sync_time} -eq 0 ];then         if [ ${vip_status} -eq 0 ] && [ ${keepalived_status} -eq 0 ];then           service keepalived start         fi     fi fi

3、防火墙端口开放

VRRP是用IP多播的方式(默认多播地址224.0.0.18)实现高可用对之间通信的,如果开启防火墙记得开放这个vrrp协议对应的策略

iptables -A INPUT -p vrrp -d 224.0.0.18  -j ACCEPT

iptables -A INPUT -p vrrp  -j ACCEPT

六、keepalived故障转移测试

1、主数据库实例宕机

当前主数据库主机状态:

[root@node1 keepalived]# service mysqld status MySQL (Percona Server) running (22983)                    [  OK  ] [root@node1 keepalived]# service keepalived status keepalived (pid  14092) is running... [root@node1 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:496/64 scope link

当前从数据库主机状态:

[root@node2 keepalived]# service mysqld status MySQL (Percona Server) running (14146)                    [  OK  ] [root@node2 keepalived]# service keepalived status keepalived (pid  12207) is running... [root@node2 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link

主库执行关闭mysql实例命令,数据库关闭后发现VIP已漂移

[root@node1 keepalived]# service mysqld stop Shutting down MySQL (Percona Server)...............        [  OK  ] [root@node1 keepalived]# service mysqld status    MySQL (Percona Server) is not running                      [FAILED] [root@node1 keepalived]# service keepalived status keepalived is stopped [root@node1 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:496/64 scope link

此时检查从库相关状态

[root@node2 keepalived]# service mysqld status MySQL (Percona Server) running (14146)                    [  OK  ] [root@node2 keepalived]# service keepalived status keepalived (pid  12207) is running... [root@node2 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link [root@node2 keepalived]# mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"             Slave_IO_Running: Connecting             Slave_SQL_Running: Yes                   Last_Errno: 0         Seconds_Behind_Master: NULL

2、主数据库服务器宕机

同上,实现检查主备状态后,在主库执行shutdown -h now的命令后,检查从库状态

3、主keepalived服务停止

同上

4、主服务器网卡故障

此时keepalived服务正常,mysqld服务正常,VRRP通信无法完成的情况下,主库会执行关闭keepalived服务并完成切换

5、模拟VRRP通信故障(脑裂)

在备库开启防火墙,并不开通vrrp允许,此时VIP会存在于两边的服务器上

[root@node2 keepalived]# service iptables start iptables: Applying firewall rules:                        [  OK  ]

主库服务器状态检查

[root@node1 keepalived]# ip add|grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:496/64 scope link

备库服务器状态检查:

[root@node2 keepalived]# ip add|grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link

如果关掉从库的防火墙,那么VIP会选择高优先级的服务器上(priority)

此现象如何避免?我们可以在check_gateway.sh脚本里添加上组播状态检查的命令,我这里就不做了,仅做了网关检查的脑裂避免(网络问题导致网关暂时不可达而产生的脑裂)

6、模拟主切换到备后,主服务启动后是否会回切(优先级抢占测试)

(本次测试接着测试1进行,首先恢复主库的实例进程和keepalived进程)

主库执行命令:

[root@node1 keepalived]# service mysqld start Starting MySQL (Percona Server)......                      [  OK  ] [root@node1 keepalived]# service keepalived start Starting keepalived:                                      [  OK  ] [root@node1 keepalived]# service mysqld status    MySQL (Percona Server) running (1141)                      [  OK  ] [root@node1 keepalived]# service keepalived status keepalived (pid  1230) is running... [root@node1 keepalived]# ip add |grep inet            inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:496/64 scope link

备库执行命令:

[root@node2 keepalived]# service keepalived status keepalived (pid  12207) is running... [root@node2 keepalived]# service mysqld status    MySQL (Percona Server) running (14146)                    [  OK  ] [root@node2 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link

此时发现即使主库的服务恢复正常也不会主动抢占从库的VIP,因为两边设置的都是backup模式,但此时如果备库的服务、实例等原因导致需要切换VIP而此时主库由具备切换的条件,那么依旧可以完成切换(此时切记一旦完成主备切换后,需要手动恢复新的主备状态保障数据一致性后才能进行二次切换)

7、模拟备库数据延迟120S以上切换

从库先停止io_thread

(root:localhost:Sun May 20 10:24:45 2018)[(none)]>stop slave io_thread; Query OK, 0 rows affected (0.00 sec)

主库做大量的数据变更操作

然后监控从库keepalived状态即可

8、模拟备库数据同步错误切换

从库状态:

[root@node2 keepalived]# mysql -uroot -e "show slave status \G"|egrep  "Slave.*Running|Seconds_Behind_Master|Last_Errno"|grep -v "Slave_SQL_Running_State"             Slave_IO_Running: Yes             Slave_SQL_Running: No                   Last_Errno: 1050         Seconds_Behind_Master: NULL [root@node2 keepalived]# service keepalived status keepalived is stopped [root@node2 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link

主库状态:

[root@node1 keepalived]# service mysqld status MySQL (Percona Server) running (1141)                      [  OK  ] [root@node1 keepalived]# service keepalived  status      keepalived (pid  1230) is running... [root@node1 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet 192.168.1.27/32 scope global eth1:1     inet6 fe80::250:56ff:fe8d:496/64 scope link

此时如果主库相关服务宕机,是无法完成VIP切换,通过关闭mysqld服务测试

主库已剔除VIP [root@node1 keepalived]# service mysqld stop Shutting down MySQL (Percona Server)..............        [  OK  ] [root@node1 keepalived]# ip addr |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.23/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:496/64 scope link 从库没有附加上VIP [root@node2 keepalived]# ip add |grep inet     inet 127.0.0.1/8 scope host lo     inet6 ::1/128 scope host     inet 192.168.1.24/24 brd 192.168.1.255 scope global eth1     inet6 fe80::250:56ff:fe8d:3d1e/64 scope link

附录: 导致裂脑发生的原因

高可用服务器之间心跳线链路故障,导致无法正常通信。 心跳线坏了(包括断了,老化)。 网卡即相关驱动坏了,IP配置及冲突问题(网卡直连) 心跳线间连接的设备故障(网卡及交换机) 仲裁的机器出问题(采用仲裁的方案)。 高可用服务器对上开启了iptables防火墙阻挡了心跳信息传输。 高可用服务器对上心跳网卡地址等信息配置不正确,导致发送心跳失败。 其他服务器配置不当等原因,如心跳方式不同,心跳广播冲突,软件BUG

在实际生产环境中,我们可以从以下几个方面来防止裂脑问题的发生。 (1)同时使用串行电缆和以太网电缆连接,同时用两条心跳线路,这样一条线路坏了,另一个还是好的,依然能传送心跳消息。 (2)当检测到裂脑时强行关闭一个心跳节点(这个功能需特殊设备支持,如fence,stonith)。相当于备节点接收不到心跳信息,发送关机命令通过单独的线路关闭主节点电源。 (3)做好对裂脑的监控报警(如邮件及手机短信等),在问题发生时人为的第一时间介入仲裁,降低损失。例如:百度的监控报警短信就有上行和下行的区别。报警信息到管理员手机上,就可以通过回复对应的字符串等操作就可以返回给服务器,让服务器根据指令自动执行处理相关。

本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
云数据库 MySQL
腾讯云数据库 MySQL(TencentDB for MySQL)为用户提供安全可靠,性能卓越、易于维护的企业级云数据库服务。其具备6大企业级特性,包括企业级定制内核、企业级高可用、企业级高可靠、企业级安全、企业级扩展以及企业级智能运维。通过使用腾讯云数据库 MySQL,可实现分钟级别的数据库部署、弹性扩展以及全自动化的运维管理,不仅经济实惠,而且稳定可靠,易于运维。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档