MHA是众多使用MySQL数据库企业高可用的不二选择,它简单易用,功能强大,实现了基于MySQL replication架构的自动主从故障转移,本文主要描述了MHA自动切换的步骤,对切换过程做了演示以及进行了适当的分析,供大家参考和理解MHA以及MySQL的原理。
1、MHA自动切换的步骤 a、MHA manager启动时的校验阶段 根据配置文件校验复制配置以及识别当前的master 导致监控终止情形:复制配置异常,存在的异常slave,一些需要的脚本脚本异常 MHA manager启动前可以通过masterha_check_ssh以及masterha_check_repl检测
b、监控master server阶段 MHA manager启动成功后,输出日志[info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. MHA会定期ping master,缺省的间隔为1秒,直到master死掉 MHA manager不会监控slave的状态,任意的Stopping/Restarting/Adding/Removing slaves不影响当前MHA监控 当添加或移除slave节点时,建议修改配置文件以及重启MHA manager
c、侦测到master server异常阶段 3次连续ping master失败,参数secondary_check_script可用于double check
d、开启failover阶段 MHA再次读取配置文件(防止主从配置已更改),校验master以及当前masetr的从库 MHA也会校验诸如复制异常以及是否存在一些从库有不同的主 启动failover(排除上次failover失败或者failover时间间隔太短)
e、隔离master server阶段(可选)
如果配置文件定义了IP漂移脚本则此时会关闭master IP 如果配置文件定义了关闭master脚本,则调用脚本关闭master以避免脑裂
f、恢复一个新的master 从slave节点获取最新的end_log_pos (Read_Master_Log_Pos)位置 从上一步获取的位置开始,读取及保存crashed master全部二进制日志文件(ssh可达) 决定新的master(根据配置文件定义的优先级别) 产生差量的binary/relay log events并且apply到新的master
g、激活一个新的master 如果配置文件定义了IP漂移脚本则此时会将IP漂移到新master
i、恢复剩余的slave 对所有从库并行地产生差量binary/relay log events 并用apply差量binary/relay log events到各从库 启动复制 j、通知切换结果(可选) 邮件发送 备份作业,管理工具调整等
2、MHA配置信息 [root@vdbsrv4 ~]$ more /etc/masterha/app1.cnf [server default] manager_workdir=/var/log/masterha/app1 manager_log=/var/log/masterha/app1/manager.log
user=mha password=xxx ssh_user=root repl_user=repl repl_password=repl ping_interval=1 shutdown_script="" master_ip_online_change_script="" report_script="" master_ip_failover_script=/tmp/master_ip_failover [server1] hostname=vdbsrv1 master_binlog_dir=/data/mysqldata
[server2] hostname=vdbsrv2 master_binlog_dir=/data/mysqldata
[server3] hostname=vdbsrv3 master_binlog_dir=/data/mysqldata/ #candidate_master=1
[root@vdbsrv1 ~]# more /etc/hosts 127.0.0.1 localhost.localdomain localhost 192.168.1.6 vdbsrv1 #master 192.168.1.7 vdbsrv2 #slave1 192.168.1.8 vdbsrv3 #slave2 192.168.1.12 vdbsrv4 #manager
mysql> show slave hosts; +-----------+---------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+---------+------+-----------+--------------------------------------+ | 1001 | vdbsrv2 | 3306 | 1 | 75bef614-e342-11e4-921d-000c295fb2eb | | 1002 | vdbsrv3 | 3306 | 1 | 091f79b8-e386-11e4-93d5-000c2943c830 | +-----------+---------+------+-----------+--------------------------------------+ 2 rows in set (0.00 sec)
3、演示切换 ###开启主库VIP [root@vdbsrv4 ~]# ssh vdbsrv1 "/sbin/ifconfig eth0:0 192.168.1.13 netmask 255.255.255.0 up"
###开启MHA manager [root@vdbsrv4 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
###使用sysbench产生事务 [root@vdbsrv1 ~]# sysbench --test=oltp \ > --oltp-table-size=5000000 \ > --oltp-read-only=off \ > --init-rng=on \ > --num-threads=16 \ > --max-requests=0 \ > --oltp-dist-type=uniform \ > --max-time=180 \ > --mysql-user=root \ > --mysql-socket=/tmp/mysql.sock \ > --mysql-password='' \ > --db-driver=mysql \ > --mysql-table-engine=innodb \ > --oltp-test-mode=complex \ > --mysql-db=replicatedb run sysbench 0.4.12: multi-threaded system evaluation benchmark
###从库vdbsrv2停止io进程以模拟io滞后 [root@vdbsrv2 ~]# mysql -e 'stop slave io_thread' [root@vdbsrv2 ~]# mysql -e 'show slave status\G' | egrep 'Slave_IO|Slave_SQL' Slave_IO_State: Slave_IO_Running: No ###Author : Leshami Slave_SQL_Running: Yes ###Blog : http://blog.csdn.net/leshami Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
###2分钟后从库vdbsrv2启动io进程 [root@vdbsrv2 ~]# mysql -e 'start slave io_thread'
###从库vdbsrv3停止io进程以模拟io滞后 [root@vdbsrv3 ~]# mysql -e 'stop slave io_thread'
###模拟主库宕机 [root@vdbsrv4 ~]# ssh vdbsrv1 "killall -r mysqld"
###主库的sysbench被强制断开且伴随下列错误提示 [root@vdbsrv1 ~]# ....... Doing OLTP test. Running mixed OLTP test Using Uniform distribution Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! ALERT: failed to execute mysql_stmt_execute(): Err1317 Query execution was interrupted FATAL: database error, exiting... ALERT: failed to execute mysql_stmt_execute(): Err1317 Query execution was interrupted FATAL: database error, exiting...
###查看VIP漂移情况 [root@vdbsrv4 app1]# grep VIP /var/log/masterha/app1/manager.log Disabling the VIP on old master: vdbsrv1 Enabling the VIP - 192.168.1.13/24 on the new master - vdbsrv2
###登陆到主机vdbsrv2 mysql mysql> show variables like '%hostname%'; +---------------+---------+ | Variable_name | Value | +---------------+---------+ | hostname | vdbsrv2 | +---------------+---------+ 1 row in set (0.00 sec)
###此时vdbsrv2已提升为master,vdbsrv3指向了vdbsrv2 mysql> show slave hosts; +-----------+---------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+---------+------+-----------+--------------------------------------+ | 1002 | vdbsrv3 | 3306 | 1001 | 091f79b8-e386-11e4-93d5-000c2943c830 | +-----------+---------+------+-----------+--------------------------------------+ 1 row in set (0.00 sec)
4、MHA日志初步剖析
[root@vdbsrv4 ~]# ls -hltr /var/log/masterha/* total 32K -rw-r--r-- 1 root root 143 Apr 20 14:30 saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog -rw-r--r-- 1 root root 26K Apr 20 14:30 manager.log -rw-r--r-- 1 root root 0 Apr 20 14:30 app1.failover.complete
###具体日志信息manager.log
Mon Apr 20 14:27:41 2015 - [info] MHA::MasterMonitor version 0.56. Mon Apr 20 14:27:42 2015 - [info] GTID failover mode = 0 Mon Apr 20 14:27:42 2015 - [info] Dead Servers: Mon Apr 20 14:27:42 2015 - [info] Alive Servers: Mon Apr 20 14:27:42 2015 - [info] vdbsrv1(192.168.1.6:3306) Mon Apr 20 14:27:42 2015 - [info] vdbsrv2(192.168.1.7:3306) Mon Apr 20 14:27:42 2015 - [info] vdbsrv3(192.168.1.8:3306) Mon Apr 20 14:27:42 2015 - [info] Alive Slaves: Mon Apr 20 14:27:42 2015 - [info] vdbsrv2(192.168.1.7:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:27:42 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:27:42 2015 - [info] vdbsrv3(192.168.1.8:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:27:42 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:27:42 2015 - [info] Current Alive Master: vdbsrv1(192.168.1.6:3306) Mon Apr 20 14:27:42 2015 - [info] Checking slave configurations.. Mon Apr 20 14:27:42 2015 - [info] read_only=1 is not set on slave vdbsrv2(192.168.1.7:3306). Mon Apr 20 14:27:42 2015 - [info] Checking replication filtering settings.. Mon Apr 20 14:27:42 2015 - [info] binlog_do_db= , binlog_ignore_db= Mon Apr 20 14:27:42 2015 - [info] Replication filtering check ok. Mon Apr 20 14:27:42 2015 - [info] GTID (with auto-pos) is not supported Mon Apr 20 14:27:42 2015 - [info] Starting SSH connection tests.. Mon Apr 20 14:27:43 2015 - [info] All SSH connection tests passed successfully. Mon Apr 20 14:27:43 2015 - [info] Checking MHA Node version.. Mon Apr 20 14:27:44 2015 - [info] Version check ok. Mon Apr 20 14:27:44 2015 - [info] Checking SSH publickey authentication settings on the current master.. Mon Apr 20 14:27:44 2015 - [info] HealthCheck: SSH to vdbsrv1 is reachable. Mon Apr 20 14:27:44 2015 - [info] Master MHA Node version is 0.56. Mon Apr 20 14:27:44 2015 - [info] Checking recovery script configurations on vdbsrv1(192.168.1.6:3306).. Mon Apr 20 14:27:44 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000020 Mon Apr 20 14:27:44 2015 - [info] Connecting to root@192.168.1.6(vdbsrv1:22).. Creating /var/tmp if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mysqldata, up to mysql-bin.000020 Mon Apr 20 14:27:44 2015 - [info] Binlog setting check done. Mon Apr 20 14:27:44 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Mon Apr 20 14:27:44 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=vdbsrv2 --slave_ip=192.168.1.7 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.22-log --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info --relay_dir=/data/mysqldata/ --slave_pass=xxx Mon Apr 20 14:27:44 2015 - [info] Connecting to root@192.168.1.7(vdbsrv2:22).. Checking slave recovery environment settings.. Opening /data/mysqldata/relay-log.info ... ok. Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000002 Temporary relay log file is /data/mysqldata/vdbsrv2-relay-bin.000002 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Mon Apr 20 14:27:45 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=vdbsrv3 --slave_ip=192.168.1.8 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.22-log --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info --relay_dir=/data/mysqldata/ --slave_pass=xxx Mon Apr 20 14:27:45 2015 - [info] Connecting to root@192.168.1.8(vdbsrv3:22).. Checking slave recovery environment settings.. Opening /data/mysqldata/relay-log.info ... ok. Relay log found at /data/mysqldata, up to vdbsrv3-relay-bin.000002 Temporary relay log file is /data/mysqldata/vdbsrv3-relay-bin.000002 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Mon Apr 20 14:27:45 2015 - [info] Slaves settings check done. Mon Apr 20 14:27:45 2015 - [info] vdbsrv1(192.168.1.6:3306) (current master) +--vdbsrv2(192.168.1.7:3306) +--vdbsrv3(192.168.1.8:3306)
Mon Apr 20 14:27:45 2015 - [info] Checking master_ip_failover_script status: Mon Apr 20 14:27:45 2015 - [info] /tmp/master_ip_failover --command=status --ssh_user=root --orig_master_host=vdbsrv1 --orig_master_ip=192.168.1.6 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===
Checking the Status of the script.. OK Mon Apr 20 14:27:45 2015 - [info] OK. Mon Apr 20 14:27:45 2015 - [warning] shutdown_script is not defined. Mon Apr 20 14:27:45 2015 - [info] Set master ping interval 1 seconds. Mon Apr 20 14:27:45 2015 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Mon Apr 20 14:27:45 2015 - [info] Starting ping health check on vdbsrv1(192.168.1.6:3306).. Mon Apr 20 14:27:45 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. Mon Apr 20 14:30:19 2015 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query) ###侦测到错误 Mon Apr 20 14:30:19 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin Mon Apr 20 14:30:19 2015 - [info] HealthCheck: SSH to vdbsrv1 is reachable. Mon Apr 20 14:30:20 2015 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.6' (111)) Mon Apr 20 14:30:20 2015 - [warning] Connection failed 2 time(s).. Mon Apr 20 14:30:21 2015 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.6' (111)) Mon Apr 20 14:30:21 2015 - [warning] Connection failed 3 time(s).. Mon Apr 20 14:30:22 2015 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.6' (111)) Mon Apr 20 14:30:22 2015 - [warning] Connection failed 4 time(s).. Mon Apr 20 14:30:22 2015 - [warning] Master is not reachable from health checker! Mon Apr 20 14:30:22 2015 - [warning] Master vdbsrv1(192.168.1.6:3306) is not reachable! Mon Apr 20 14:30:22 2015 - [warning] SSH is reachable. Mon Apr 20 14:30:22 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Mon Apr 20 14:30:22 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Mon Apr 20 14:30:22 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Mon Apr 20 14:30:22 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Mon Apr 20 14:30:22 2015 - [info] GTID failover mode = 0 Mon Apr 20 14:30:22 2015 - [info] Dead Servers: Mon Apr 20 14:30:22 2015 - [info] vdbsrv1(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] Alive Servers: Mon Apr 20 14:30:22 2015 - [info] vdbsrv2(192.168.1.7:3306) Mon Apr 20 14:30:22 2015 - [info] vdbsrv3(192.168.1.8:3306) Mon Apr 20 14:30:22 2015 - [info] Alive Slaves: Mon Apr 20 14:30:22 2015 - [info] vdbsrv2(192.168.1.7:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] vdbsrv3(192.168.1.8:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] Checking slave configurations.. Mon Apr 20 14:30:22 2015 - [info] read_only=1 is not set on slave vdbsrv2(192.168.1.7:3306). Mon Apr 20 14:30:22 2015 - [info] Checking replication filtering settings.. Mon Apr 20 14:30:22 2015 - [info] Replication filtering check ok. Mon Apr 20 14:30:22 2015 - [info] Master is down! Mon Apr 20 14:30:22 2015 - [info] Terminating monitoring script. Mon Apr 20 14:30:22 2015 - [info] Got exit code 20 (Master dead). Mon Apr 20 14:30:22 2015 - [info] MHA::MasterFailover version 0.56. Mon Apr 20 14:30:22 2015 - [info] Starting master failover. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] * Phase 1: Configuration Check Phase.. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] GTID failover mode = 0 Mon Apr 20 14:30:22 2015 - [info] Dead Servers: Mon Apr 20 14:30:22 2015 - [info] vdbsrv1(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] Checking master reachability via MySQL(double check)... Mon Apr 20 14:30:22 2015 - [info] ok. Mon Apr 20 14:30:22 2015 - [info] Alive Servers: Mon Apr 20 14:30:22 2015 - [info] vdbsrv2(192.168.1.7:3306) Mon Apr 20 14:30:22 2015 - [info] vdbsrv3(192.168.1.8:3306) Mon Apr 20 14:30:22 2015 - [info] Alive Slaves: Mon Apr 20 14:30:22 2015 - [info] vdbsrv2(192.168.1.7:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] vdbsrv3(192.168.1.8:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] Starting Non-GTID based failover. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] ** Phase 1: Configuration Check Phase completed. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] * Phase 2: Dead Master Shutdown Phase.. ###隔离master server Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] Forcing shutdown so that applications never connect to the current master.. Mon Apr 20 14:30:22 2015 - [info] Executing master IP deactivation script: Mon Apr 20 14:30:22 2015 - [info] /tmp/master_ip_failover --orig_master_host=vdbsrv1 --orig_master_ip=192.168.1.6 --orig_master_port=3306 --command=stopssh --ssh_user=root IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===
Disabling the VIP on old master: vdbsrv1 Mon Apr 20 14:30:22 2015 - [info] done. Mon Apr 20 14:30:22 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Mon Apr 20 14:30:22 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] * Phase 3: Master Recovery Phase.. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000020:315514388 Mon Apr 20 14:30:22 2015 - [info] Latest slaves (Slaves that received relay log files to the latest): Mon Apr 20 14:30:22 2015 - [info] vdbsrv2(192.168.1.7:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000020:313962621 Mon Apr 20 14:30:22 2015 - [info] Oldest slaves: Mon Apr 20 14:30:22 2015 - [info] vdbsrv3(192.168.1.8:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled Mon Apr 20 14:30:22 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306) Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:22 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Mon Apr 20 14:30:22 2015 - [info] Mon Apr 20 14:30:23 2015 - [info] Fetching dead master's binary logs.. ###从dead master提取binlog Mon Apr 20 14:30:23 2015 - [info] Executing command on the dead master vdbsrv1(192.168.1.6:3306): save_binary_logs --command=save --start_file=mysql-bin.000020 --start_pos=315514388 --binlog_dir=/data/mysqldata --output_file=/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000020 pos 315514388 to mysql-bin.000020 EOF into /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 120.. ok. Dumping effective binlog data from /data/mysqldata/mysql-bin.000020 position 315514388 to tail(315514411).. ok. Binlog Checksum enabled Concat succeeded. Mon Apr 20 14:30:23 2015 - [info] scp from root@192.168.1.6:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded. ###复制dump的binlog到监控节点 Mon Apr 20 14:30:23 2015 - [info] HealthCheck: SSH to vdbsrv2 is reachable. Mon Apr 20 14:30:24 2015 - [info] HealthCheck: SSH to vdbsrv3 is reachable. Mon Apr 20 14:30:24 2015 - [info] Mon Apr 20 14:30:24 2015 - [info] * Phase 3.3: Determining New Master Phase.. Mon Apr 20 14:30:24 2015 - [info] Mon Apr 20 14:30:24 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. ###寻找最近的slave中继日志 Mon Apr 20 14:30:24 2015 - [info] Checking whether vdbsrv2 has relay logs from the oldest position.. ###用于恢复其他slave Mon Apr 20 14:30:24 2015 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000020 --latest_rmlp=315514388 --target_mlf=mysql-bin.000020 --target_rmlp=313962621 --server_id=1001 --workdir=/var/tmp --timestamp=20150420143022 --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info --relay_dir=/data/mysqldata/ Opening /data/mysqldata/relay-log.info ... ok. Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000003 Fast relay log position search succeeded. Target relay log file/position found. start_file:vdbsrv2-relay-bin.000003, start_pos:6265486. Target relay log FOUND! Mon Apr 20 14:30:24 2015 - [info] OK. vdbsrv2 has all relay logs. Mon Apr 20 14:30:24 2015 - [info] Searching new master from slaves.. ###寻找及决定新的master Mon Apr 20 14:30:24 2015 - [info] Candidate masters from the configuration file: Mon Apr 20 14:30:24 2015 - [info] Non-candidate masters: Mon Apr 20 14:30:24 2015 - [info] New master is vdbsrv2(192.168.1.7:3306) Mon Apr 20 14:30:24 2015 - [info] Starting master failover.. ###执行failover Mon Apr 20 14:30:24 2015 - [info] From: vdbsrv1(192.168.1.6:3306) (current master) +--vdbsrv2(192.168.1.7:3306) +--vdbsrv3(192.168.1.8:3306)
To: vdbsrv2(192.168.1.7:3306) (new master) +--vdbsrv3(192.168.1.8:3306) Mon Apr 20 14:30:24 2015 - [info] Mon Apr 20 14:30:24 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Mon Apr 20 14:30:24 2015 - [info] Mon Apr 20 14:30:24 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Apr 20 14:30:24 2015 - [info] Sending binlog.. Mon Apr 20 14:30:25 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to root@vdbsrv2:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded. Mon Apr 20 14:30:25 2015 - [info] Mon Apr 20 14:30:25 2015 - [info] * Phase 3.4: Master Log Apply Phase.. Mon Apr 20 14:30:25 2015 - [info] Mon Apr 20 14:30:25 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Mon Apr 20 14:30:25 2015 - [info] Starting recovery on vdbsrv2(192.168.1.7:3306).. Mon Apr 20 14:30:25 2015 - [info] Generating diffs succeeded. Mon Apr 20 14:30:25 2015 - [info] Waiting until all relay logs are applied. Mon Apr 20 14:30:36 2015 - [info] done. Mon Apr 20 14:30:36 2015 - [info] Getting slave status.. Mon Apr 20 14:30:36 2015 - [info] This slave(vdbsrv2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:315514388). No need to recover from Exec_Master_Log_Pos. Mon Apr 20 14:30:36 2015 - [info] Connecting to the target slave host vdbsrv2, running recover script.. ###基于新主库Apply差量日志 Mon Apr 20 14:30:36 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=vdbsrv2 --slave_ip=192.168.1.7 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog --workdir=/var/tmp --target_version=5.6.22-log --timestamp=20150420143022 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Apr 20 14:30:36 2015 - [info] Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog on vdbsrv2:3306. This may take long time... Applying log files succeeded. Mon Apr 20 14:30:36 2015 - [info] All relay logs were successfully applied. Mon Apr 20 14:30:36 2015 - [info] Getting new master's binlog name and position.. Mon Apr 20 14:30:36 2015 - [info] mysql-bin.000011:241926689 Mon Apr 20 14:30:36 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='vdbsrv2 or 192.168.1.7', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=241926689, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Mon Apr 20 14:30:36 2015 - [info] Executing master IP activate script: ###IP漂移 Mon Apr 20 14:30:36 2015 - [info] /tmp/master_ip_failover --command=start --ssh_user=root --orig_master_host=vdbsrv1 --orig_master_ip=192.168.1.6 --orig_master_port=3306 --new_master_host=vdbsrv2 --new_master_ip=192.168.1.7 --new_master_port=3306 --new_master_user='mha' --new_master_password='xxx' Unknown option: new_master_user Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 192.168.1.13/24===
Enabling the VIP - 192.168.1.13/24 on the new master - vdbsrv2 Mon Apr 20 14:30:36 2015 - [info] OK. Mon Apr 20 14:30:36 2015 - [info] ** Finished master recovery successfully. Mon Apr 20 14:30:36 2015 - [info] * Phase 3: Master Recovery Phase completed. Mon Apr 20 14:30:36 2015 - [info] Mon Apr 20 14:30:36 2015 - [info] * Phase 4: Slaves Recovery Phase.. Mon Apr 20 14:30:36 2015 - [info] Mon Apr 20 14:30:36 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Mon Apr 20 14:30:36 2015 - [info] Mon Apr 20 14:30:36 2015 - [info] -- Slave diff file generation on host vdbsrv3(192.168.1.8:3306) started, pid: 7726. Check tmp log /var/log/masterha/app1/vdbsrv3_3306_20150420143022.log if it takes time.. Mon Apr 20 14:30:37 2015 - [info] Mon Apr 20 14:30:37 2015 - [info] Log messages from vdbsrv3 ... Mon Apr 20 14:30:37 2015 - [info] Mon Apr 20 14:30:36 2015 - [info] Server vdbsrv3 received relay logs up to: mysql-bin.000020:313962621 Mon Apr 20 14:30:36 2015 - [info] Need to get diffs from the latest slave(vdbsrv2) up to: mysql-bin.000020:315514388 (using the latest slave's relay logs) ###此时vdbsrv3需要从vdbsrv2获取差量日志 Mon Apr 20 14:30:36 2015 - [info] Connecting to the latest slave host vdbsrv2, generating diff relay log files.. Mon Apr 20 14:30:36 2015 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.1.8 --latest_mlf=mysql-bin.000020 --latest_rmlp=315514388 --target_mlf=mysql-bin.000020 --target_rmlp=313962621 --server_id=1001 --diff_file_readtolatest=/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog --workdir=/var/tmp --timestamp=20150420143022 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/data/mysqldata/relay-log.info --relay_dir=/data/mysqldata/ Mon Apr 20 14:30:37 2015 - [info] Opening /data/mysqldata/relay-log.info ... ok. Relay log found at /data/mysqldata, up to vdbsrv2-relay-bin.000003 Fast relay log position search succeeded. Target relay log file/position found. start_file:vdbsrv2-relay-bin.000003, start_pos:6265486. Concat binary/relay logs from vdbsrv2-relay-bin.000003 pos 6265486 to vdbsrv2-relay-bin.000003 EOF into /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog .. ###注,此时是从vdbsrv2中继日志获取 Binlog Checksum enabled Binlog Checksum enabled Dumping binlog format description event, from position 0 to 283.. ok. Dumping effective binlog data from /data/mysqldata/vdbsrv2-relay-bin.000003 position 6265486 to tail(7817253).. ok. Binlog Checksum enabled Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog . scp vdbsrv2:/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog to root@192.168.1.8(22) succeeded. Mon Apr 20 14:30:37 2015 - [info] Generating diff files succeeded. Mon Apr 20 14:30:37 2015 - [info] End of log messages from vdbsrv3. Mon Apr 20 14:30:37 2015 - [info] -- Slave diff log generation on host vdbsrv3(192.168.1.8:3306) succeeded. Mon Apr 20 14:30:37 2015 - [info] Generating relay diff files from the latest slave succeeded. Mon Apr 20 14:30:37 2015 - [info] Mon Apr 20 14:30:37 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Apr 20 14:30:37 2015 - [info] Mon Apr 20 14:30:37 2015 - [info] -- Slave recovery on host vdbsrv3(192.168.1.8:3306) started, pid: 7735. Check tmp log /var/log/masterha/app1/vdbsrv3_3306_20150420143022.log if it takes time.. Mon Apr 20 14:30:43 2015 - [info] Mon Apr 20 14:30:43 2015 - [info] Log messages from vdbsrv3 ... Mon Apr 20 14:30:43 2015 - [info] Mon Apr 20 14:30:37 2015 - [info] Sending binlog.. ###这部分则是在原master宕机后未发送的binlog部分 Mon Apr 20 14:30:38 2015 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog to root@vdbsrv3:/var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog succeeded. Mon Apr 20 14:30:38 2015 - [info] Starting recovery on vdbsrv3(192.168.1.8:3306).. Mon Apr 20 14:30:38 2015 - [info] Generating diffs succeeded. Mon Apr 20 14:30:38 2015 - [info] Waiting until all relay logs are applied. Mon Apr 20 14:30:38 2015 - [info] done. Mon Apr 20 14:30:38 2015 - [info] Getting slave status.. Mon Apr 20 14:30:38 2015 - [info] This slave(vdbsrv3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000020:313962621). No need to recover from Exec_Master_Log_Pos. Mon Apr 20 14:30:38 2015 - [info] Connecting to the target slave host vdbsrv3, running recover script.. Mon Apr 20 14:30:38 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=vdbsrv3 --slave_ip=192.168.1.8 --slave_port=3306 --apply_files=/var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog, /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog --workdir=/var/tmp --target_version=5.6.22-log --timestamp=20150420143022 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Apr 20 14:30:43 2015 - [info] Concat all apply files to /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog .. Copying the first binlog file /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog to /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog.. Binlog Checksum enabled dumped up to pos 120. ok. /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog has effective binlog events from pos 120. Dumping effective binlog data from /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog position 120 to tail(143).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/tmp/total_binlog_for_vdbsrv3_3306.20150420143022.binlog . Applying differential binary/relay log files /var/tmp/relay_from_read_to_latest_vdbsrv3_3306_20150420143022.binlog, /var/tmp/saved_master_binlog_from_vdbsrv1_3306_20150420143022.binlog on vdbsrv3:3306. This may take long time... Applying log files succeeded. Mon Apr 20 14:30:43 2015 - [info] All relay logs were successfully applied. Mon Apr 20 14:30:43 2015 - [info] Resetting slave vdbsrv3(192.168.1.8:3306) and starting replication from the new master vdbsrv2(192.168.1.7:3306).. Mon Apr 20 14:30:43 2015 - [info] Executed CHANGE MASTER. Mon Apr 20 14:30:43 2015 - [info] Slave started. Mon Apr 20 14:30:43 2015 - [info] End of log messages from vdbsrv3. Mon Apr 20 14:30:43 2015 - [info] -- Slave recovery on host vdbsrv3(192.168.1.8:3306) succeeded. Mon Apr 20 14:30:43 2015 - [info] All new slave servers recovered successfully. Mon Apr 20 14:30:43 2015 - [info] Mon Apr 20 14:30:43 2015 - [info] * Phase 5: New master cleanup phase.. Mon Apr 20 14:30:43 2015 - [info] Mon Apr 20 14:30:43 2015 - [info] Resetting slave info on the new master.. Mon Apr 20 14:30:43 2015 - [info] vdbsrv2: Resetting slave info succeeded. Mon Apr 20 14:30:43 2015 - [info] Master failover to vdbsrv2(192.168.1.7:3306) completed successfully. Mon Apr 20 14:30:43 2015 - [info]
----- Failover Report -----
app1: MySQL Master failover vdbsrv1(192.168.1.6:3306) to vdbsrv2(192.168.1.7:3306) succeeded
Master vdbsrv1(192.168.1.6:3306) is down! ###整个failover总时间14:30:43-14:30:19=24s
Check MHA Manager logs at vdbsrv4:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover. Invalidated master IP address on vdbsrv1(192.168.1.6:3306) The latest slave vdbsrv2(192.168.1.7:3306) has all relay logs for recovery. Selected vdbsrv2(192.168.1.7:3306) as a new master. vdbsrv2(192.168.1.7:3306): OK: Applying all logs succeeded. vdbsrv2(192.168.1.7:3306): OK: Activated master IP address. vdbsrv3(192.168.1.8:3306): Generating differential relay logs up to vdbsrv2(192.168.1.7:3306)succeeded. Generating relay diff files from the latest slave succeeded. vdbsrv3(192.168.1.8:3306): OK: Applying all logs succeeded. Slave started, replicating from vdbsrv2(192.168.1.7:3306) vdbsrv2(192.168.1.7:3306): Resetting slave info succeeded. Master failover to vdbsrv2(192.168.1.7:3306) completed successfully.