LVS + Heartbeat: Operational Notes for a Highly Available Web Cluster

LVS basics and Heartbeat basics were introduced in earlier posts; this one is a brief walkthrough of building a highly available web cluster with LVS + Heartbeat.

The Heartbeat project is part of the Linux-HA effort and implements a high-availability cluster system. Heartbeat messaging and cluster communication are the two key components of an HA cluster, and in the Heartbeat project both are provided by the heartbeat module.

Heartbeat clusters communicate over UDP or serial links, and the heartbeat plugin architecture supports serial, unicast, broadcast, and multicast communication between nodes. It implements the core HA function, the heartbeat itself: install Heartbeat on both servers and it monitors system state, coordinates the master and standby, and keeps the system available. It can detect application-level software and hardware failures on a server and promptly isolate and recover from them; through system monitoring, service monitoring, and automatic IP migration it removes single points of failure from the whole application, keeping critical services continuously available simply and cheaply. Heartbeat uses virtual IP address mapping so that a master/standby switchover is transparent to clients. A heartbeat pair alone cannot provide a robust service, however, so here it is combined with LVS for load balancing.

LVS is short for Linux Virtual Server, a virtual server cluster system. Any mention of LVS leads to IPVS (managed with the ipvsadm command): IPVS is the core of an LVS cluster. It runs on the Load Balancer and forwards requests addressed to the Virtual IP on to the Real Servers.

ldirectord adds a health-check mechanism on top of LVS; without it, the load balancer would keep forwarding traffic to a real server even after that node went down.

The architecture for this case study is outlined below:

1) Base environment (CentOS 6.9)

172.16.60.206 (eth0)    HA master node (ha-master)    heartbeat, ipvsadm, ldirectord
172.16.60.207 (eth0)    HA standby node (ha-slave)    heartbeat, ipvsadm, ldirectord
172.16.60.111           VIP address
172.16.60.204 (eth0)    backend node 1 (rs-204)       nginx, realserver
172.16.60.205 (eth0)    backend node 2 (rs-205)       nginx, realserver

1) Disable the firewall and SELinux (on all four nodes)
[root@ha-master ~]# /etc/init.d/iptables stop
[root@ha-master ~]# setenforce 0
[root@ha-master ~]# vim /etc/sysconfig/selinux 
SELINUX=disabled

2) Set the hostname and bind the hosts entries (both HA nodes)
On the master:
[root@ha-master ~]# hostname ha-master
[root@ha-master ~]# vim /etc/sysconfig/network
HOSTNAME=ha-master
[root@ha-master ~]# vim /etc/hosts
172.16.60.206 ha-master
172.16.60.207 ha-slave

On the standby:
[root@ha-slave ~]# hostname ha-slave
[root@ha-slave ~]# vim /etc/sysconfig/network
HOSTNAME=ha-slave
[root@ha-slave ~]# vim /etc/hosts
172.16.60.206 ha-master
172.16.60.207 ha-slave

3) Enable IP forwarding (both HA nodes)
[root@ha-master ~]# echo 1 > /proc/sys/net/ipv4/ip_forward
[root@ha-master ~]# vim /etc/sysctl.conf 
net.ipv4.ip_forward = 1
[root@ha-master ~]# sysctl -p

2) Install and configure Heartbeat and LVS (both HA nodes)

1) Install heartbeat first (same steps on both HA nodes)
Download epel-release-latest-6.noarch.rpm and ldirectord-3.9.5-3.1.x86_64.rpm
Download link: https://pan.baidu.com/s/1IvCDEFLCBYddalV89YvonQ
Extraction code: gz53
 
[root@ha-master ~]# ll epel-release-latest-6.noarch.rpm
-rw-rw-r-- 1 root root 14540 Nov  5  2012 epel-release-latest-6.noarch.rpm
[root@ha-master ~]# ll ldirectord-3.9.5-3.1.x86_64.rpm
-rw-rw-r-- 1 root root 90140 Dec 24 15:54 ldirectord-3.9.5-3.1.x86_64.rpm
 
[root@ha-master ~]# yum install -y epel-release
[root@ha-master ~]# rpm -ivh epel-release-latest-6.noarch.rpm --force
[root@ha-master ~]# yum install -y heartbeat* libnet
[root@ha-master ~]# yum install -y ldirectord-3.9.5-3.1.x86_64.rpm      # ldirectord pulls in many dependencies, so install it via yum
 
2) Configure heartbeat (both HA nodes)
Installing heartbeat creates the /etc/ha.d/ directory, which holds heartbeat's configuration files.
The stock configuration files are mostly comments, so the relevant files are written by hand here. Heartbeat commonly uses four files:
ha.cf: heartbeat's main configuration file
ldirectord.cf: resource monitoring (ldirectord) configuration file
haresources: local resource file
authkeys: authentication file
 
[root@ha-master ~]# cd /usr/share/doc/heartbeat-3.0.4/
[root@ha-master heartbeat-3.0.4]# cp authkeys ha.cf haresources /etc/ha.d/
 
[root@ha-master heartbeat-3.0.4]# cd /usr/share/doc/ldirectord-3.9.5
[root@ha-master ldirectord-3.9.5]# cp ldirectord.cf /etc/ha.d/
[root@ha-master ldirectord-3.9.5]# cd /etc/ha.d/
[root@ha-master ha.d]# ll
total 56
-rw-r--r-- 1 root root   645 Dec 24 21:37 authkeys
-rw-r--r-- 1 root root 10502 Dec 24 21:37 ha.cf
-rwxr-xr-x 1 root root   745 Dec  3  2013 harc
-rw-r--r-- 1 root root  5905 Dec 24 21:37 haresources
-rw-r--r-- 1 root root  8301 Dec 24 21:38 ldirectord.cf
drwxr-xr-x 2 root root  4096 Dec 24 21:28 rc.d
-rw-r--r-- 1 root root   692 Dec  3  2013 README.config
drwxr-xr-x 2 root root  4096 Dec 24 21:28 resource.d
-rw-r--r-- 1 root root  2082 Mar 24  2017 shellfuncs
 
3) Configure heartbeat's main file ha.cf (identical on both HA nodes except for the ucast line; see the note below)
[root@ha-master ha.d]# pwd
/etc/ha.d
[root@ha-master ha.d]# cp ha.cf ha.cf.bak
[root@ha-master ha.d]# > ha.cf
[root@ha-master ha.d]# vim ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log         #log file location
#crm yes                            #whether to enable the cluster resource manager (CRM)
logfacility        local0         #syslog facility used for logging
keepalive 2                         #heartbeat interval; the default unit is seconds
deadtime 5                         #declare the peer node dead if no heartbeat arrives within this interval
warntime 3                         #log a warning if no heartbeat arrives within this interval, but do not fail over yet
initdead 10          #on some systems the network needs a while after boot or restart before it works; this grace period covers that window. It must be at least twice deadtime.
udpport  694        #UDP port used for heartbeat traffic; 694 is the default.
bcast        eth0               # broadcast heartbeats over Ethernet on eth0. In the real file, delete the "#" comment from this line entirely or heartbeat will error out.
ucast eth0 172.16.60.207       #unicast heartbeats over eth0 via UDP; the address that follows MUST be the PEER node's IP!
auto_failback on            #on: as soon as the master node recovers, it automatically reclaims the resources from the standby. off: the recovered master becomes the standby, and the standby stays active.
#stonith_host *     baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3  rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#watchdog /dev/watchdog          
node   ha-master           #master node name, as reported by "uname -n"; listed first, so it is the primary by default!
node   ha-slave              #standby node name; listed second. Mind the order!
#ping 172.16.60.207         # a ping node, typically a fixed router, used only to test network connectivity. Either this ping line or the ping_group line below is sufficient; here the ping_group line is used instead.
ping_group group1 172.16.60.204 172.16.60.205     #these are NOT the two HA nodes; they are used only to test connectivity. When neither IP can be pinged, the peer starts taking over the resources.
respawn root /usr/lib64/heartbeat/ipfail                    #optional. root is the user that the ipfail process runs as. Make sure the /usr/lib64/heartbeat/ipfail path is correct (it can be located with find), or heartbeat will fail to start
apiauth ipfail gid=root uid=root
 
============
Note:
On the HA standby, the only change needed in ha.cf is the ucast line, which becomes "ucast eth0 172.16.60.206"; everything else is identical to the master's ha.cf above!
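Since the two files differ in that single line, the standby's copy can be derived mechanically instead of edited by hand. A minimal sketch (the /tmp paths are stand-ins for /etc/ha.d/ha.cf on the two nodes):

```shell
# Stand-in for the master's ha.cf; only the lines relevant here are included.
printf 'keepalive 2\nucast eth0 172.16.60.207\nauto_failback on\n' > /tmp/ha.cf.master

# Derive the standby's copy by pointing ucast at the master's address instead.
sed 's/^ucast eth0 172\.16\.60\.207$/ucast eth0 172.16.60.206/' /tmp/ha.cf.master > /tmp/ha.cf.slave

grep '^ucast' /tmp/ha.cf.slave
```

On the real hosts, the same sed run against /etc/ha.d/ha.cf guarantees the two files stay identical apart from the peer address.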
 
4) Configure heartbeat's authentication file authkeys (must be identical on both HA nodes)
[root@ha-master ~]# cd /etc/ha.d/
[root@ha-master ha.d]# cp authkeys authkeys.bak
[root@ha-master ha.d]# >authkeys
[root@ha-master ha.d]# vim authkeys
auth 3                                                      #the number after auth must reappear as the keyword on a line below! Three methods are available: "1" (crc), "2" (sha1), "3" (md5). "3" is chosen here, but "1" or "2" work just as well; the HA nodes must match!
#1 crc
#2 sha1 HI!
3 md5 Hello!
 
The file must be chmod 600:
[root@ha-master ha.d]# chmod 600 authkeys
[root@ha-master ha.d]# ll authkeys
-rw------- 1 root root 20 Dec 25 00:16 authkeys
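Rather than shipping the literal secret "Hello!", the shared key can be generated randomly and then copied verbatim to both nodes. A sketch (writing to /tmp/authkeys as a stand-in for /etc/ha.d/authkeys):

```shell
# Generate a random md5 shared secret; both HA nodes must receive the SAME file.
SECRET=$(head -c 16 /dev/urandom | md5sum | awk '{print $1}')
printf 'auth 3\n3 md5 %s\n' "$SECRET" > /tmp/authkeys
chmod 600 /tmp/authkeys   # heartbeat refuses to start unless authkeys is mode 600
ls -l /tmp/authkeys
```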
 
5) Edit heartbeat's resource file haresources (must be completely identical on both HA nodes)
[root@ha-slave ha.d]# cp haresources haresources.bak
[root@ha-slave ha.d]# >haresources
[root@ha-slave ha.d]# vim haresources          # append the line below at the end. Since the stock file is all comments, it can simply be emptied first and then given just this one line
ha-master IPaddr::172.16.60.111 ipvsadm ldirectord       

Explanation:
This makes ha-master the primary node, 172.16.60.111 the cluster VIP, and ipvsadm and ldirectord the managed services.
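To see why this one line is enough, it helps to know roughly how heartbeat reads it: each token after the node name is treated as SCRIPT::ARG, and heartbeat runs the matching script from /etc/ha.d/resource.d/ (falling back to /etc/init.d/) with "start" or "stop". A rough illustration of that parsing:

```shell
# Mimic heartbeat's parsing of the haresources line above (illustration only).
LINE='ha-master IPaddr::172.16.60.111 ipvsadm ldirectord'
set -- $LINE
shift                       # drop the preferred-node name
for TOKEN in "$@"; do
    SCRIPT=${TOKEN%%::*}    # part before "::" is the resource script name
    ARG=${TOKEN#*::}        # part after "::" is its argument, if any
    [ "$ARG" = "$TOKEN" ] && ARG=''
    echo "would run: $SCRIPT $ARG start"
done
```

This mirrors the "Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 start" and "Running /etc/init.d/ipvsadm  start" lines that show up in ha-log once heartbeat is started.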
 
6) Configure heartbeat's monitoring file ldirectord.cf (must be completely identical on both HA nodes)
ldirectord monitors the real services behind the LVS cluster. It works hand in hand with heartbeat and can run as a heartbeat-managed service,
which is why heartbeat is installed and configured first, and the ldirectord configuration file is then copied into /etc/ha.d (it is not placed there by default).

[root@ha-master ha.d]# cp ldirectord.cf ldirectord.cf.bak
[root@ha-master ha.d]# >ldirectord.cf
[root@ha-master ha.d]# vim ldirectord.cf
# Global Directives
checktimeout=3                #seconds after which a realserver check counts as failed
checkinterval=1                #interval between two consecutive checks
autoreload=yes                 #whether to reload the configuration file automatically when it changes
logfile="/var/log/ldirectord.log"     #path of the ldirectord log file
#logfile="local0"
#emailalert="root@30920.cn"  
#emailalertfreq=3600
#emailalertstatus=all
quiescent=no                 #no: a realserver that does not respond within checktimeout is removed from the table, breaking its existing client connections. yes: the failed realserver stays in the table and simply receives no new connections.
 
# Sample for an http virtual service
virtual=172.16.60.111:80             #the virtual IP. Every line below virtual= must be indented with 4 spaces or, preferably, a single tab!
     real=172.16.60.204:80 gate    #gate = DR mode, ipip = TUN mode, masq = NAT mode
     real=172.16.60.205:80 gate
 
     fallback=127.0.0.1:80 gate   #where requests are redirected when every RS is unreachable; i.e. once all realservers have failed, the VIP points at port 80 on this host
     service=http                         #type of service to balance; HTTP here
     request="lvs_testpage.html"         #page fetched for the health check; place this page in the web root of every backend realserver!
     receive="Test HA Page"          #expected response string, i.e. the content of lvs_testpage.html
     #virtualhost=some.domain.com.au  #optional; the virtual host name can be anything
 
     scheduler=wlc                 #scheduling algorithm; it must match the one used in the LVS rules file (/etc/sysconfig/ipvsadm)
 
     persistent=600             #persistence: within 600s the same client IP keeps hitting the same realserver
     #netmask=255.255.255.255
     protocol=tcp
     checktype=connect
     checkport=80
 
Note:
As configured above, virtual defines the VIP and the following lines define the real-server nodes. fallback is where requests land when every real server is down, typically a local "server under maintenance" page.
service is the service being balanced, scheduler the scheduling algorithm, protocol the protocol; checktype (connect) and checkport (80) define the health check.

The /etc/ha.d/ldirectord.cf above defines forwarding for port 80 only. For another port, say 3306,
just add a second block below it along the lines of "virtual=172.16.60.111:3306 ...". Sample blocks can be found in the backed-up ldirectord.cf.bak.
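A hypothetical 3306 block could look like the following. It is illustrative only: the MySQL realserver ports and the plain TCP connect check are assumptions, to be adapted to the actual service. It is written here as a here-document into a scratch copy under /tmp rather than the live /etc/ha.d/ldirectord.cf:

```shell
# Append a second virtual service to a scratch copy of ldirectord.cf.
# A plain "connect" check is used so no MySQL credentials are needed;
# the continuation lines are indented, as ldirectord requires.
cat >> /tmp/ldirectord.cf <<'EOF'
virtual=172.16.60.111:3306
    real=172.16.60.204:3306 gate
    real=172.16.60.205:3306 gate
    scheduler=wlc
    protocol=tcp
    checktype=connect
    checkport=3306
EOF
grep -n '3306' /tmp/ldirectord.cf
```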
 
7) Install LVS (both HA nodes)
Install the LVS dependencies
[root@ha-master ~]# yum install -y libnl* popt*
 
Check that the ipvs kernel modules are available
[root@ha-master ~]# modprobe -l |grep ipvs
kernel/net/netfilter/ipvs/ip_vs.ko
kernel/net/netfilter/ipvs/ip_vs_rr.ko
kernel/net/netfilter/ipvs/ip_vs_wrr.ko
kernel/net/netfilter/ipvs/ip_vs_lc.ko
kernel/net/netfilter/ipvs/ip_vs_wlc.ko
kernel/net/netfilter/ipvs/ip_vs_lblc.ko
kernel/net/netfilter/ipvs/ip_vs_lblcr.ko
kernel/net/netfilter/ipvs/ip_vs_dh.ko
kernel/net/netfilter/ipvs/ip_vs_sh.ko
kernel/net/netfilter/ipvs/ip_vs_sed.ko
kernel/net/netfilter/ipvs/ip_vs_nq.ko
kernel/net/netfilter/ipvs/ip_vs_ftp.ko
kernel/net/netfilter/ipvs/ip_vs_pe_sip.ko
 
Download and install ipvsadm (the LVS userspace tool)
[root@ha-master ~]# cd /usr/local/src/
[root@ha-master src]# unlink /usr/src/linux
[root@ha-master src]# ln -s /usr/src/kernels/2.6.32-431.5.1.el6.x86_64/ /usr/src/linux
[root@ha-master src]# wget http://www.linuxvirtualserver.org/software/kernel-2.6/ipvsadm-1.26.tar.gz
[root@ha-master src]# tar -zvxf ipvsadm-1.26.tar.gz
[root@ha-master src]# cd ipvsadm-1.26
[root@ha-master ipvsadm-1.26]# make && make install
 
LVS is installed; view the current (still empty) LVS table
[root@ha-master ipvsadm-1.26]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
 
8) Add the LVS rules file on both HA nodes
[root@ha-master ha.d]# vim /etc/sysconfig/ipvsadm    
-A -t 172.16.60.111:80 -s wlc -p 600
-a -t 172.16.60.111:80 -r 172.16.60.204:80 -g
-a -t 172.16.60.111:80 -r 172.16.60.205:80 -g
 
Note: "-p 600" sets the session persistence timeout to 600 seconds; it should match the persistent setting in ldirectord.cf
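Since a mismatch between the two timeouts produces confusing stickiness behaviour, the pair is worth cross-checking. A small sketch (the /tmp files stand in for /etc/sysconfig/ipvsadm and /etc/ha.d/ldirectord.cf):

```shell
# Stand-ins for the two real configuration files.
printf -- '-A -t 172.16.60.111:80 -s wlc -p 600\n' > /tmp/ipvsadm.rules
printf 'persistent=600\n' > /tmp/ldirectord.cf.frag

# Pull the value after "-p" from the rules file and persistent= from ldirectord.cf.
RULES_P=$(awk '{for (i = 1; i < NF; i++) if ($i == "-p") print $(i+1)}' /tmp/ipvsadm.rules)
CONF_P=$(sed -n 's/^persistent=//p' /tmp/ldirectord.cf.frag)

if [ "$RULES_P" = "$CONF_P" ]; then
    echo "persistence timeouts match: ${RULES_P}s"
else
    echo "MISMATCH: ipvsadm -p $RULES_P vs ldirectord persistent=$CONF_P" >&2
fi
```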
 
9) Write the LVS (DR-mode) startup script on the backend realserver nodes (identical on both realservers)
[root@rs-204 ~]# vim /etc/init.d/realserver
#!/bin/sh
# LVS-DR real server control script: binds the VIP on lo:0 and suppresses
# ARP replies for it, so only the director answers ARP for the VIP.
VIP=172.16.60.111
. /etc/rc.d/init.d/functions
     
case "$1" in
start)
    # disable local ARP responses for the VIP, then bind it to the loopback alias
    /sbin/ifconfig lo down
    /sbin/ifconfig lo up
    echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
    /sbin/sysctl -p >/dev/null 2>&1
    /sbin/ifconfig lo:0 $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    echo "LVS-DR real server started."
    ;;
stop)
    /sbin/ifconfig lo:0 down
    /sbin/route del -host $VIP >/dev/null 2>&1
    # restore the default ARP behaviour
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
    echo "LVS-DR real server stopped."
    ;;
status)
    isLoOn=`/sbin/ifconfig lo:0 | grep "$VIP"`
    isRoOn=`/bin/netstat -rn | grep "$VIP"`
    if [ "$isLoOn" = "" -a "$isRoOn" = "" ]; then
        echo "LVS-DR real server is not running."
        exit 3
    else
        echo "LVS-DR real server is running."
    fi
    ;;
*)
    echo "Usage: $0 {start|stop|status}"
    exit 1
esac
exit 0
 
 
Start the realserver script on both realserver nodes
[root@rs-204 ~]# chmod 755 /etc/init.d/realserver
[root@rs-204 ~]# ll /etc/init.d/realserver
-rwxr-xr-x 1 root root 1278 Dec 24 13:40 /etc/init.d/realserver
 
[root@rs-204 ~]# /etc/init.d/realserver start
LVS-DR real server started.
 
Set it to run at boot
[root@rs-204 ~]# echo "/etc/init.d/realserver" >> /etc/rc.local
 
Check: the VIP is now bound to lo:0 on both realserver nodes
[root@rs-204 ~]# ifconfig
...........
lo:0      Link encap:Local Loopback 
          inet addr:172.16.60.111  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
 
 
Next, deploy the web test environment on the two realservers (identical steps on both)
Install nginx via yum (first install the nginx yum repository)
[root@rs-204 ~]# rpm -ivh http://nginx.org/packages/centos/6/noarch/RPMS/nginx-release-centos-6-0.el6.ngx.noarch.rpm
[root@rs-204 ~]# yum install nginx
 
nginx setup on realserver01
[root@rs-204 ~]# cd /etc/nginx/conf.d/
[root@rs-204 conf.d]# cat default.conf
[root@rs-204 conf.d]# >/usr/share/nginx/html/index.html
[root@rs-204 conf.d]# vim /usr/share/nginx/html/index.html
this is test page of realserver01:172.16.60.204
 
[root@rs-204 conf.d]# vim /usr/share/nginx/html/lvs_testpage.html
Test HA Page
 
[root@rs-204 conf.d]# /etc/init.d/nginx start
Starting nginx:                                            [  OK  ]
[root@rs-204 conf.d]# lsof -i:80
COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nginx   31944  root    6u  IPv4  91208      0t0  TCP *:http (LISTEN)
nginx   31945 nginx    6u  IPv4  91208      0t0  TCP *:http (LISTEN)
 
nginx setup on realserver02
[root@rs-205 src]# cd /etc/nginx/conf.d/
[root@rs-205 conf.d]# cat default.conf
[root@rs-205 conf.d]# >/usr/share/nginx/html/index.html
[root@rs-205 conf.d]# vim /usr/share/nginx/html/index.html
this is test page of realserver02:172.16.60.205
 
[root@rs-205 conf.d]# vim /usr/share/nginx/html/lvs_testpage.html
Test HA Page
 
[root@rs-205 conf.d]# /etc/init.d/nginx start
Starting nginx:                                            [  OK  ]
[root@rs-205 conf.d]# lsof -i:80
COMMAND   PID  USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
nginx   20839  root    6u  IPv4 289527645      0t0  TCP *:http (LISTEN)
nginx   20840 nginx    6u  IPv4 289527645      0t0  TCP *:http (LISTEN)
 
Finally, browse to nginx on realserver01 and realserver02:
http://172.16.60.204/ returns "this is test page of realserver01:172.16.60.204"
http://172.16.60.205/ returns "this is test page of realserver02:172.16.60.205"
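On each realserver it is also worth confirming, the way ldirectord will, that fetching the request= page returns exactly the receive= string. A sketch of that comparison (on a real node the commented curl line supplies the body; here a local file stands in for the HTTP response):

```shell
# On a real backend:  BODY=$(curl -s http://127.0.0.1/lvs_testpage.html)
# Stand-in for that response so the comparison logic can be shown:
printf 'Test HA Page' > /tmp/lvs_testpage.html
BODY=$(cat /tmp/lvs_testpage.html)

# ldirectord marks the node healthy only when the body matches receive=.
case "$BODY" in
    *"Test HA Page"*) echo "health check would pass" ;;
    *)                echo "health check would FAIL" ;;
esac
```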
 
10) Now start lvs and heartbeat on the two HA nodes
 
Start heartbeat on the HA master first
[root@ha-master ~]# service heartbeat start
Starting High-Availability services: ERROR: Unable to find cidr_netmask.
INFO: [findif] failed
INFO:  Resource is stopped
Done.
 
[root@ha-master ~]# ps -ef|grep heartbeat
root      6947     1  0 01:03 ?        00:00:00 heartbeat: master control process
root      6952  6947  0 01:03 ?        00:00:00 heartbeat: FIFO reader       
root      6953  6947  0 01:03 ?        00:00:00 heartbeat: write: bcast eth0 
root      6954  6947  0 01:03 ?        00:00:00 heartbeat: read: bcast eth0  
root      6955  6947  0 01:03 ?        00:00:00 heartbeat: write: ucast eth0 
root      6956  6947  0 01:03 ?        00:00:00 heartbeat: read: ucast eth0  
root      6957  6947  0 01:03 ?        00:00:00 heartbeat: write: ping_group group1
root      6958  6947  0 01:03 ?        00:00:00 heartbeat: read: ping_group group1
root      6961  2247  0 01:03 pts/0    00:00:00 grep heartbeat

heartbeat listens on UDP port 694 by default, and starting heartbeat also brings up ldirectord
[root@ha-master ~]# lsof -i:694
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
heartbeat 8298 root    7u  IPv4  18888      0t0  UDP *:ha-cluster 
heartbeat 8299 root    7u  IPv4  18888      0t0  UDP *:ha-cluster 
heartbeat 8300 root    7u  IPv4  18894      0t0  UDP *:ha-cluster 
heartbeat 8301 root    7u  IPv4  18894      0t0  UDP *:ha-cluster
 
Check the heartbeat log on the HA master
[root@ha-master ~]# tail -f /var/log/ha-log
.........
.........
ResourceManager(default)[4001]: 2018/12/25_01:13:29 info: Acquiring resource group: ha-master IPaddr::172.16.60.111 ipvsadm ldirectord
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[4029]:  2018/12/25_01:13:29 INFO:  Resource is stopped
ResourceManager(default)[4001]: 2018/12/25_01:13:29 info: Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 start
.........
IPaddr(IPaddr_172.16.60.111)[4125]:     2018/12/25_01:13:29 INFO: Adding inet address 172.16.60.111/24 with broadcast address 172.16.60.255 to device eth0
..........
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[4111]:  2018/12/25_01:13:29 INFO:  Success
ResourceManager(default)[4001]: 2018/12/25_01:13:29 info: Running /etc/init.d/ipvsadm  start
ResourceManager(default)[4001]: 2018/12/25_01:13:29 info: Running /etc/init.d/ldirectord  start
 
Check the HA master: the VIP resource now lives on the master
[root@ha-master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:50:9b brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.206/24 brd 172.16.60.255 scope global eth0
    inet 172.16.60.111/24 brd 172.16.60.255 scope global secondary eth0
    inet6 fe80::250:56ff:feac:509b/64 scope link
       valid_lft forever preferred_lft forever
 
The LVS forwarding table can now be seen on the active HA node
[root@ha-master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.60.111:80 wlc persistent 600
  -> 172.16.60.204:80             Route   1      0          0        
  -> 172.16.60.205:80             Route   1      0          0 
 
Then start heartbeat on the HA standby
[root@ha-slave ha.d]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.
 
[root@ha-slave ha.d]# ps -ef|grep heartbeat 
root      4257     1  0 01:04 ?        00:00:00 heartbeat: master control process
root      4260  4257  0 01:04 ?        00:00:00 heartbeat: FIFO reader       
root      4261  4257  0 01:04 ?        00:00:00 heartbeat: write: bcast eth0 
root      4262  4257  0 01:04 ?        00:00:00 heartbeat: read: bcast eth0  
root      4263  4257  0 01:04 ?        00:00:00 heartbeat: write: ucast eth0 
root      4264  4257  0 01:04 ?        00:00:00 heartbeat: read: ucast eth0  
root      4265  4257  0 01:04 ?        00:00:00 heartbeat: write: ping_group group1
root      4266  4257  0 01:04 ?        00:00:00 heartbeat: read: ping_group group1
root      4269  4257  0 01:04 ?        00:00:00 /usr/lib64/heartbeat/ipfail
root      4272  4257  0 01:04 ?        00:00:00 heartbeat: master control process
 
Check the HA standby: the VIP resource is not on the standby
[root@ha-slave ~]# ip addr   
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:05:b5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.207/24 brd 172.16.60.255 scope global eth0
    inet6 fe80::250:56ff:feac:5b5/64 scope link
       valid_lft forever preferred_lft forever
 
And the LVS table on the standby is empty
[root@ha-slave ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

3) Failover testing

1) First, stop the heartbeat service on the HA master
[root@ha-master ~]# service heartbeat stop
Stopping High-Availability services: Done.

[root@ha-master ~]# ps -ef|grep heartbeat
root      5377  3353  0 01:22 pts/0    00:00:00 grep heartbeat

Check the heartbeat log on the HA master
[root@ha-master ~]# tail -f /var/log/ha-log 
.........
ResourceManager(default)[5223]: 2018/12/25_01:22:46 info: Releasing resource group: ha-master IPaddr::172.16.60.111 ipvsadm ldirectord
ResourceManager(default)[5223]: 2018/12/25_01:22:46 info: Running /etc/init.d/ldirectord  stop
ResourceManager(default)[5223]: 2018/12/25_01:22:46 info: Running /etc/init.d/ipvsadm  stop
ResourceManager(default)[5223]: 2018/12/25_01:22:46 info: Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 stop
...........
Dec 25 01:22:49 ha-master heartbeat: [4846]: info: killing HBWRITE process 4856 with signal 15
Dec 25 01:22:49 ha-master heartbeat: [4846]: info: killing HBREAD process 4857 with signal 15
Dec 25 01:22:49 ha-master heartbeat: [4846]: info: Core process 4854 exited. 7 remaining
.........
Dec 25 01:22:49 ha-master heartbeat: [4846]: info: ha-master Heartbeat shutdown complete.

The VIP has now left the HA master (it moved to the standby), and the master's LVS table is empty
[root@ha-master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:50:9b brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.206/24 brd 172.16.60.255 scope global eth0
    inet6 fe80::250:56ff:feac:509b/64 scope link 
       valid_lft forever preferred_lft forever

[root@ha-master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

========================================================
Over on the HA standby, the VIP resource has arrived, and the LVS table can now be seen on the standby

[root@ha-slave ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:05:b5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.207/24 brd 172.16.60.255 scope global eth0
    inet 172.16.60.111/24 brd 172.16.60.255 scope global secondary eth0
    inet6 fe80::250:56ff:feac:5b5/64 scope link 
       valid_lft forever preferred_lft forever

[root@ha-slave ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.60.111:80 wlc persistent 600
  -> 172.16.60.204:80             Route   1      0          0         
  -> 172.16.60.205:80             Route   1      0          0        

Check the heartbeat log on the HA standby
[root@ha-slave ~]# tail -2000 /var/log/ha-log 
...........
ResourceManager(default)[7413]: 2018/12/25_01:18:15 info: Acquiring resource group: ha-master IPaddr::172.16.60.111 ipvsadm ldirectord
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[7441]:  2018/12/25_01:18:15 INFO:  Resource is stopped
ResourceManager(default)[7413]: 2018/12/25_01:18:15 info: Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 start
IPaddr(IPaddr_172.16.60.111)[7537]:     2018/12/25_01:18:15 INFO: Adding inet address 172.16.60.111/24 with broadcast address 172.16.60.255 to device eth0
IPaddr(IPaddr_172.16.60.111)[7537]:     2018/12/25_01:18:15 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.16.60.111)[7537]:     2018/12/25_01:18:15 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.60.111 eth0 172.16.60.111 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[7523]:  2018/12/25_01:18:15 INFO:  Success
ResourceManager(default)[7413]: 2018/12/25_01:18:15 info: Running /etc/init.d/ipvsadm  start
ResourceManager(default)[7413]: 2018/12/25_01:18:15 info: Running /etc/init.d/ldirectord  start
mach_down(default)[7386]:       2018/12/25_01:18:15 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[7386]:       2018/12/25_01:18:15 info: mach_down takeover complete for node ha-master.

2) Continuing from the first test, start heartbeat on the HA master again
[root@ha-master ~]# ps -ef|grep heartbeat
root      5391  3353  0 01:25 pts/0    00:00:00 grep heartbeat

[root@ha-master ~]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

[root@ha-master ~]# ps -ef|grep heartbeat  
root      5519     1  0 01:25 ?        00:00:00 heartbeat: master control process
root      5522  5519  0 01:25 ?        00:00:00 heartbeat: FIFO reader        
root      5523  5519  0 01:25 ?        00:00:00 heartbeat: write: bcast eth0  
root      5524  5519  0 01:25 ?        00:00:00 heartbeat: read: bcast eth0   
root      5525  5519  0 01:25 ?        00:00:00 heartbeat: write: ucast eth0  
root      5526  5519  0 01:25 ?        00:00:00 heartbeat: read: ucast eth0   
root      5527  5519  0 01:25 ?        00:00:00 heartbeat: write: ping_group group1
root      5528  5519  0 01:25 ?        00:00:00 heartbeat: read: ping_group group1
root      5531  3353  0 01:25 pts/0    00:00:00 grep heartbeat

Check the heartbeat log on the HA master
[root@ha-master ~]# tail -f /var/log/ha-log 
........
ResourceManager(default)[5566]: 2018/12/25_01:25:58 info: Acquiring resource group: ha-master IPaddr::172.16.60.111 ipvsadm ldirectord
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[5594]:  2018/12/25_01:25:58 INFO:  Resource is stopped
ResourceManager(default)[5566]: 2018/12/25_01:25:58 info: Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 start
IPaddr(IPaddr_172.16.60.111)[5690]:     2018/12/25_01:25:58 INFO: Adding inet address 172.16.60.111/24 with broadcast address 172.16.60.255 to device eth0
IPaddr(IPaddr_172.16.60.111)[5690]:     2018/12/25_01:25:58 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.16.60.111)[5690]:     2018/12/25_01:25:58 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.60.111 eth0 172.16.60.111 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.60.111)[5676]:  2018/12/25_01:25:58 INFO:  Success
ResourceManager(default)[5566]: 2018/12/25_01:25:58 info: Running /etc/init.d/ipvsadm  start
ResourceManager(default)[5566]: 2018/12/25_01:25:58 info: Running /etc/init.d/ldirectord  star
......

Once the heartbeat service on the HA master is back, the master grabs the VIP again and its LVS table is populated once more.
That is because ha.cf sets "auto_failback on": as soon as the master recovers, it automatically takes the resources back from the standby!

[root@ha-master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:50:9b brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.206/24 brd 172.16.60.255 scope global eth0
    inet 172.16.60.111/24 brd 172.16.60.255 scope global secondary eth0
    inet6 fe80::250:56ff:feac:509b/64 scope link 
       valid_lft forever preferred_lft forever

[root@ha-master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.60.111:80 wlc persistent 600
  -> 172.16.60.204:80             Route   1      0          0         
  -> 172.16.60.205:80             Route   1      0          0   

============================================================
Back on the HA standby, the VIP has been moved away again (back to the master), and the standby's LVS table is empty once more

[root@ha-slave ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:ac:05:b5 brd ff:ff:ff:ff:ff:ff
    inet 172.16.60.207/24 brd 172.16.60.255 scope global eth0
    inet6 fe80::250:56ff:feac:5b5/64 scope link 
       valid_lft forever preferred_lft forever

[root@ha-slave ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

Check the heartbeat log on the HA standby
[root@ha-slave ~]# tail -2000 /var/log/ha-log 
.........
ResourceManager(default)[8376]: 2018/12/25_01:25:58 info: Releasing resource group: ha-master IPaddr::172.16.60.111 ipvsadm ldirectord
ResourceManager(default)[8376]: 2018/12/25_01:25:58 info: Running /etc/init.d/ldirectord  stop
ResourceManager(default)[8376]: 2018/12/25_01:25:58 info: Running /etc/init.d/ipvsadm  stop
ResourceManager(default)[8376]: 2018/12/25_01:25:58 info: Running /etc/ha.d/resource.d/IPaddr 172.16.60.111 stop

3) Access http://172.16.60.111
.........
Stop nginx on either one of the realserver nodes, for example on rs-204:
access through LVS, i.e. http://172.16.60.111, is unaffected

Check the LVS table
[root@ha-master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.60.111:80 wlc persistent 600       
  -> 172.16.60.205:80             Route   1      0          0 

This shows rs-204 has been kicked out of the LVS cluster

Now bring nginx on rs-204 back up and check LVS again
[root@ha-master ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.60.111:80 wlc persistent 600
  -> 172.16.60.204:80             Route   1      0          0         
  -> 172.16.60.205:80             Route   1      0          0 

Once its nginx recovers, rs-204 is added back into the LVS cluster.
This behaviour comes from quiescent=no in ldirectord.cf.

Throughout the whole failure, access to the LVS frontend at http://172.16.60.111 was never affected!
