[TOC]
Mysql数据库高可用涉及技术:1.mysql 主从同步 / 2. heartbeat技术 / 3. drdb 技术 总结:改变单一软件应用思想打组合权;
描述:一主多从得MYSQL数据架构是最常见得DB架构方法,该架构方案部署简单/维护方便/并且通过配置简单得代理或者通过程序得方法就可以实现应用服务器对主从库得读写分离,且多个从库还可以通过lvs或者haproxy等代理实现多个从库得负载均衡,分担读的压力同时排除单点问题;
WeiyiGeek.架构
正常情况说明:
故障切换: 经过高可以用方案切换后的数据库架构,变成了一个常规的主从架构,此时新主库从热备变成了单点服务了。
WeiyiGeek.主库切换
注意事项:
网络规划:
#data1
Master eth1 192.168.1.100 管理IP
eth2 192.168.2.100 心跳IP
192.168.1.200 漂移IP (VIP)
#-----------------------------------------------
#data2
BACKUP eth1 192.168.1.100 管理IP
eth2 192.168.2.101 心跳IP
##############################################
#网卡得配置
[root@data1 network-scripts]$ cat ifcfg-eth1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eth1
UUID=ae67c9f9-383c-4270-a49a-4667977a0d6d
DEVICE=eth1
ONBOOT=yes
IPV6_PRIVACY=no
#环境配置关闭selinux / 防火墙 和升级系统
[root@data1 network-scripts]$ sed -i '/SELINUX/s/enforcing/disabled/' /etc/sysconfig/selinux
[root@data1 network-scripts]$ systemctl stop firewalld
[root@data1 network-scripts]$ systemctl disable firewalld
ntpdate -u asia.pool.ntp.org # 时间同步
#系统升级与环境
CentOS Linux release 7.6.1810 (Core) 3.10.0-957.el7.x86_64
yum update -y && yum upgrade -y
#rpm --import /etc/pki/rpm-gpg/xxxx
yum install -y epel-release.noarch gcc gcc-c++ openssl openssl-devel libaio libaio-devel ncurses ncurses-deve numactl lsof vim wget net-tools fuser
#mysql采用二进制得 (安装方法查看MYSQL介绍安装与运维配置文章中)
#https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.16-linux-glibc2.12-x86_64.tar.xz
#mysql创建主从复制环境 (配置方法查看MYSQL多实例配置详解文章中)
#heartbeat 安装(参看前面得文章 高可用服务解决方案)
#drbd 安装(参看前面得文章 磁盘高可用解决方案)
安装源:http://elrepo.org/tiki/tiki-index.php
#建立磁盘
data1 = 512 MB
data2 = 1024 MB
DRBD管理名称与挂载目录: data /data
DRBD逻辑设备: /dev/drbd0 #关键点
DRBD 存储设备: /dev/sdb1
DRBD Meta设备: /dev/sdb2[0]
[root@data1 ~]$ fdisk -l
Disk /dev/sdb: 536 MB, 536870912 bytes, 1048576 sectors
[root@data2 ~]$ fdisk -l
Disk /dev/sdb: 1073 MB, 1073741824 bytes, 2097152 sectors
单多实例高可用架构图:
WeiyiGeek.案例图
Step1. 添加配置编辑集群资源文件
#在mysql + drbd + Heratbeat 中可以进行体现 (后面还能接一些脚本 但是必须要设置 start / stop )
#/data为mysql目录
$ vi /etc/ha.d/haresources
data1 IPaddr::192.168.1.200/24/eth1 drbddisk::data Filesystem::/dev/drbd0::/data::ext4 mysqld
IPaddr::192.168.1.200/24/eth1:用IPaddr脚本配置对外服务的浮动虚拟IP
drbddisk::data:用drbddisk脚本实现DRBD主从节点资源组的挂载和卸载
Filesystem::/dev/drbd0::/mnt::ext4:用Filesystem脚本实现磁盘挂载和卸载
Step2. 添加heartbeat脚本漂移切换脚本
#在/usr/local/heartbeat/etc/ha.d/resource.d/添加drbddisk脚本
#并且将mysqld启动配置文件放入该目录中
cp /etc/init.d/mysqld /usr/local/heartbeat/etc/ha.d/resource.d/
【注意:】 我这里是源码编译安装,可能与yum安装的目录不同,但是配置差不多;
#将主节点的drbddisk直接复制到从机中
scp /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk root@192.168.1.101:/usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
#权限非常重要
$ chmod 755 /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
$ chown -R root:haclient /usr/local/heartbeat/etc/ha.d/resource.d/
Step3. 高可用服务的启动
1. 在主/备节点上都运行开启drbd服务并监听状态
systemctl start drbd
drbdsetup /dev/drbd0 primary --force #[主节点运行]
watch -n1 cat /proc/drbd
# Every 1.0s: cat /proc/drbd Mon Apr 29 22:11:17 2019
# 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
# ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
2.在两个节点上启动HeartBeat服务,先启动data1:(data1,data2) ,注意主必须比从先启动
[root@data1 ~]$ systemctl start heartbeat
[root@data2 ~]$ systemctl start heartbeat
3.观察VIP情况网卡情况以及drbd主从状态
[root@data1 heartbeat]$ ip addr | grep "192.168.1"
inet 192.168.1.100/24 brd 192.168.1.255 scope global noprefixroute eth1
inet 192.168.1.200/24 brd 192.168.1.255 scope global secondary eth1:4
[root@data2 heartbeat]$ ip addr | grep "192.168.1"
inet 192.168.1.101/24 brd 192.168.1.255 scope global eth1
[root@data1 heartbeat]$ drbdadm status
data role:Primary
disk:UpToDate
peer role:Secondary
replication:Established peer-disk:UpToDate
[root@data2 heartbeat]$drbdadm status
data role:Secondary
disk:UpToDate
peer role:Primary
replication:Established peer-disk:UpToDate
4. 磁盘挂载情况如下图
WeiyiGeek.启动heartbeat
5.进行测试VIP漂移
#测试漂移脚本
[root@data1 heartbeat]$ /usr/local/heartbeat/share/heartbeat/hb_standby
#资源接管回data1主节点 (漂移回来后如上面的图)
[root@data1 heartbeat]$ /usr/local/heartbeat/share/heartbeat/hb_takeover local
[root@data1 heartbeat]$ service mysqld status
SUCCESS! MySQL running (5669)
[root@data2 heartbeat]$ service mysqld status
ERROR! MySQL is not running
WeiyiGeek.测试VIP故障漂移到从机
安装多实例环境即可,注意路径的一致性,在主节点安装即可;本例实现主库VIP进行漂移而从库从能正常连接到VIP地址上服务器进行主从同步;
WeiyiGeek.主从同步架构
Step 1. 主从多实例安装:
#主库建立
mkdir -vp /data/{{3306,3307}/{data,tmp,binlog,innodb_ts,innodb_log,logs},backup,scripts}
touch /data/{3306,3307}/tmp/mysql-error.log
touch /data/{3306,3307}/my.cnf
cat > /data/3306/my.cnf<<EOF
#主库
[client]
default-character-set=utf8 # 设置mysql客户端默认字符集
port = 3306
socket = /data/3306/mysql.sock
# The MySQL server
[mysqld]
port = 3306
mysqlx_port = 33060
user = mysql
server-id = 3306
socket = /data/3306/mysql.sock
mysqlx_socket=/data/3306/mysqlx.sock
pid-file = /data/3306/mysql.pid
basedir = /usr/local/mysql
datadir = /data/3306/data
tmpdir = /data/3306/tmp #非必须
log-bin = /data/3306/binlog/mysql-bin
log-error = /data/3306/mysql-error.log
explicit_defaults_for_timestamp
character-set-server=utf8 #服务端默认字符集
EOF
sed 's/3306/3307/g' /data/3306/my.cnf > /data/3307/my.cnf
chown -R mysql:mysql /data/
/usr/local/mysql/bin/mysqld --defaults-file=/data/3307/my.cnf --initialize --user=mysql --basedir=/usr/local/mysql && /usr/local/mysql/bin/mysqld --defaults-file=/data/3306/my.cnf --initialize --user=mysql --basedir=/usr/local/mysql
#获取root密码
echo -n "主3306端口mysql的root密码:" && grep "password" /data/3306/mysql-error.log | awk '{print $13}' && echo -n "从3307端口mysql的root密码:"&& grep "password" /data/3307/mysql-error.log | awk '{print $13}'
pass3306=`grep "password" /data/3306/mysql-error.log | awk '{print $13}'`
pass3307=`grep "password" /data/3307/mysql-error.log | awk '{print $13}'`
nohup /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/3306/my.cnf --user=mysql &
sleep 10
nohup /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/3307/my.cnf --user=mysql &
sleep 10
echo $pass3306 $pass3307
#登录数据库以及密码更改:
mysql -u root -p$pass3306 -S /data/3306/mysql.sock -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'System123@';" --connect-expired-password
mysql -u root -p$pass3307 -S /data/3307/mysql.sock -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'System123@';" --connect-expired-password
#显示更改成功
mysql -uroot -pSystem123@ -S /data/3306/mysql.sock -e "status;"
mysql -uroot -pSystem123@ -S /data/3307/mysql.sock -e "status;"
Step 2. 一键主从脚本搭建:
#!/bin/bash
MYUSER="root"
MYPASS="System123@"
MYSOCK=/data/3306/mysql.sock
BAKPATH=/data/backup
LOGFILE=${BAKPATH}/mysql_Binlogs_`date +%F`.log
DATAFILE=${BAKPATH}/mysql_Backup_`date +%F`.sql
MYSQL_HOME=/usr/local/mysql/bin
MYSQL_CMD="${MYSQL_HOME}/mysql -u$MYUSER -p$MYPASS -S $MYSOCK"
MYSQL_DUMP="${MYSQL_HOME}/mysqldump -u$MYUSER -p$MYPASS -S$MYSOCK -A -B --flush-logs --events --master-data=2"
[ `$MYSQL_CMD -e "select user,host from mysql.user"|grep rep|wc -l` -ne 1 ] && $MYSQL_CMD -e "CREATE USER 'rep'@'%' IDENTIFIED BY 'System123@';" &&
$MYSQL_CMD -e "grant replication slave on *.* to 'rep'@'%';" && $MYSQL_CMD -e "flush privileges;" || echo "用户已建立"
$MYSQL_CMD -e "flush tables with read lock;"
sleep 2
echo "--------show Master status---------------" > $LOGFILE
$MYSQL_CMD -e "show master status;">> $LOGFILE
$MYSQL_DUMP > $DATAFILE
sleep 5
$MYSQL_CMD -e "unlock tables;"
cat $LOGFILE
#MAIL -e "mysql slave log" 3333@qq.com < $LOG_FILE
Step 3. 从库配置建立:
#从库建立
#!/bin/bash
MYUSER="root"
MYPASS="System123@"
MYSOCK=/data/3307/mysql.sock
BAKPATH=/data/backup
MYSQL_HOME=/usr/local/mysql/bin
LOGFILE=${BAKPATH}/mysql_Binlogs_`date +%F`.log
DATAFILE=${BAKPATH}/mysql_Backup_`date +%F`.sql
MYSQL_CMD="${MYSQL_HOME}/mysql -u$MYUSER -p$MYPASS -S $MYSOCK"
#导入数据到从库中
cd /data/backup && ${MYSQL_CMD} < ${DATAFILE}
#config slave
#下面为了测试填写的管理IP实际填写VIP
$MYSQL_CMD -e "CHANGE MASTER TO MASTER_HOST='192.168.1.200',MASTER_PORT=3306,MASTER_USER='rep',MASTER_PASSWORD='System123@',MASTER_LOG_FILE='mysql-bin.000010',MASTER_LOG_POS=815;"
$MYSQL_CMD -e "start slave;"
sleep 5
$MYSQL_CMD -e "show slave status\G;" | egrep "IO_Running|SQL_Running" >> $LOG_FILE
WeiyiGeek.主从同步搭建成功
Step 4. 配置heartbeatmysql的启动脚本 vim /usr/local/heartbeat/etc/ha.d/resource.d/Mastermysqld
#!/bin/bash
port=3306 #必须
mysql_user="root" #需求更改
mysql_pwd="System123@" #需求更改
CmdPath="/usr/local/mysql/bin"
mysql_sock="/data/${port}/mysql.sock"
#startup function
function_start_mysql()
{
if [ ! -e "$mysql_sock" ];then
printf "Starting MySQL...\n"
/bin/sh ${CmdPath}/mysqld_safe --defaults-file=/data/${port}/my.cnf 2>&1 > /dev/null &
else
printf "MySQL is running...\n"
exit
fi
}
#stop function
function_stop_mysql()
{
if [ ! -e "$mysql_sock" ];then
printf "MySQL is stopped...\n"
exit
else
printf "Stoping MySQL...\n"
${CmdPath}/mysqladmin -u ${mysql_user} -p${mysql_pwd} -S /data/${port}/mysql.sock shutdown
fi
}
#restart function
function_restart_mysql()
{
function_stop_mysql
sleep 2
function_start_mysql
}
case $1 in
start)
function_start_mysql
;;
stop)
function_stop_mysql
;;
restart)
function_restart_mysql
;;
*)
esac
# heartbeat haresources 资源文件
data1 IPaddr::192.168.1.200/24/eth1 drbddisk::data Filesystem::/dev/drbd0::/data/3306/::ext4 Mastermysqld
Step 5. 拷贝资源配置文件到热备服务器中
[root@data1 ha.d]$ scp haresources root@192.168.1.101:/usr/local/heartbeat/etc/ha.d/haresources
haresources
[root@data1 ha.d]$ scp ./resource.d/Mastermysqld root@192.168.1.101:/usr/local/heartbeat/etc/ha.d/resource.d/Mastermysqld
Mastermysqld
[root@data2 ha.d]$ chown -R root:haclient /usr/local/heartbeat/etc/ha.d/
#热备建立mysql目录(来存储drbd0中的数据)
$ chown mysql:mysql /data/mysql
```
<br>
Step 6.主从高可用的启动流程
```bash
#注意事项:
#(1)启动前查看文件权限是否正常
# (2) 停止主库3306的服务,从库一般不做高可用
#(3)主节点先启动heartbeat服务
#data1 和 data2 分别执行
[data1/data2]$ service drbd start
#将主库的文件移动到其他地方(带权限复制)
[root@data1 data]$ cp -PR 3306/ 3308/
#启动data1 heartbeat ,data2 延迟启动
service heartbeat start
$ssh root@192.168.1.101 "service heartbeat start"
Starting heartbeat (via systemctl): [ OK ]
#data1查看drbd 磁盘挂载
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
/dev/drbd0 262M 166M 76M 69% /data/3306
#主从同步状态
data1 - 3306 - mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000012 | 349 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
#从库的状态(没问题OK)
data1 - 3307 - mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.200
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000012
Read_Master_Log_Pos: 349
Relay_Log_File: data1-relay-bin.000005
Relay_Log_Pos: 516
Relay_Master_Log_File: mysql-bin.000012
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Step 7. 模拟测试机器宕机
[root@data1 ~]# /usr/local/heartbeat/share/heartbeat/hb_standby
Going standby [all].
# /usr/local/heartbeat/share/heartbeat/hb_takeover local #重新接管回来
[root@data2 ~]# df -h
/dev/drbd0 262M 166M 76M 69% /data/3306
[root@data2 ~]# ls /data/3306/
binlog/ data/ innodb_log/ innodb_ts/ logs/ mysql.sock mysqlx.sock tmp/
#此时备服务器的drbd变成主
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
#mysql服务也正常启动
[root@data2 ~]# lsof -i:3306
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 9844 mysql 39u IPv6 33224 0t0 TCP *:mysql (LISTEN)
#测试登陆插入一些数据
[root@data2 ~]$ mysql -uroot -pSystem123@ -S /data/3306/mysql.sock
mysql> status;
--------------
mysql Ver 8.0.16 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)
Step 8. 登陆从库查看通过状态
#不能同步的时候采用rep 进行远程登陆
#** VIP -DATA2 - 3306 **
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000015 | 441 | | | |
+------------------+----------+--------------+------------------+-------------------+
#** 从库 -DATA1 - 3307 **
mysql> show slave status\G;
Master_Log_File: mysql-bin.000015
Read_Master_Log_Pos: 441
Relay_Log_File: data1-relay-bin.000009
Relay_Log_Pos: 655
Relay_Master_Log_File: mysql-bin.000015
Slave_IO_Running: Yes
WeiyiGeek.VIP漂移后的主从同步
总括: 描述:HeartBeat是基于服务器级别的,如果要检测某项服务是否宕掉然后进行切换,就必须自己编写脚本来实时监控进程状态;然后来进行heartbeat切换;
1. drbddisk 脚本 $ vi /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
#!/bin/bash
#
# This script is inteded to be used as resource script by heartbeat
#
# Copright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
if [ -f $DEFAULTFILE ]; then
. $DEFAULTFILE
fi
if [ "$#" -eq 2 ]; then
RES="$1"
CMD="$2"
else
RES="all"
CMD="$1"
fi
## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
# http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
# http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####
drbd_set_role_from_proc_drbd()
{
local out
if ! test -e /proc/drbd; then
ROLE="Unconfigured"
return
fi
dev=$( $DRBDADM sh-dev $RES )
minor=${dev#/dev/drbd}
if [[ $minor = *[!0-9]* ]] ; then
# sh-minor is only supported since drbd 8.3.1
minor=$( $DRBDADM sh-minor $RES )
fi
if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
ROLE=Unknown
return
fi
if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
set -- $out
ROLE=${5%/**}
: ${ROLE:=Unconfigured} # if it does not show up
else
ROLE=Unknown
fi
}
case "$CMD" in
start)
# try several times, in case heartbeat deadtime
# was smaller than drbd ping time
try=6
while true; do
$DRBDADM primary $RES && break
let "--try" || exit 1 # LSB generic error
sleep 1
done
;;
stop)
# heartbeat (haresources mode) will retry failed stop
# for a number of times in addition to this internal retry.
try=3
while true; do
$DRBDADM secondary $RES && break
# We used to lie here, and pretend success for anything != 11,
# to avoid the reboot on failed stop recovery for "simple
# config errors" and such. But that is incorrect.
# Don't lie to your cluster manager.
# And don't do config errors...
let --try || exit 1 # LSB generic error
sleep 1
done
;;
status)
if [ "$RES" = "all" ]; then
echo "A resource name is required for status inquiries."
exit 10
fi
ST=$( $DRBDADM role $RES )
ROLE=${ST%/**}
case $ROLE in
Primary|Secondary|Unconfigured)
# expected
;;
*)
# unexpected. whatever...
# If we are unsure about the state of a resource, we need to
# report it as possibly running, so heartbeat can, after failed
# stop, do a recovery by reboot.
# drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
# suddenly readonly. So we retry by parsing /proc/drbd.
drbd_set_role_from_proc_drbd
esac
case $ROLE in
Primary)
echo "running (Primary)"
exit 0 # LSB status "service is OK"
;;
Secondary|Unconfigured)
echo "stopped ($ROLE)"
exit 3 # LSB status "service is not running"
;;
*)
# NOTE the "running" in below message.
# this is a "heartbeat" resource script,
# the exit code is _ignored_.
echo "cannot determine status, may be running ($ROLE)"
exit 4 # LSB status "service status is unknown"
;;
esac
;;
*)
echo "Usage: drbddisk [resource] {start|stop|status}"
exit 1
;;
esac
exit 0
(0)脚本权限问题导致heartbeat不能正常执行
#仅限源码安装的
chown root:haclient /usr/lib/ocf/resource.d/heartbeat/Filesystem
chown root:haclient /usr/lib/ocf/resource.d/heartbeat/IPaddr
chown -R root:haclient /etc/ha.d/resource.d/
usermod -g haclient -G root hacluster #hacluster加入到附属组 (更加实际需求)
(1)高可用ha-log日志Filesystem报错 问题:找不到fuser命令
ResourceManager(default)[17749]: 2019/04/29_22:41:11 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext4 stop
Filesystem(Filesystem_/dev/drbd0)[18665]: 2019/04/29_22:41:11 ERROR: Setup problem: couldn't find command: fuser
解决方法:
yum install psmisc
(2) 普通用户无法执行mount
usermod -g haclient -G root hacluster #hacluster加入到附属组 (更加实际需求)
(3) 高可用主从同步 IO_Running NO 错误
(1) 有时候需要等待60s进行重试连接(如果这种方法不行就手动)
(2) 停止slave再启动slave即可
(3) 先利用rep登陆mysql然后再退出记可用正常同步