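While testing manual failover and online master switching with MHA, I ran into two rather puzzling problems. Whenever the hosts were specified by IP address, neither test succeeded. They failed with "Detected dead master xxx does not match with specified dead master" and "xxx is not alive" respectively. Both errors and their solution are described below.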
1. MHA configuration file

    [root@vdbsrv4 ~]# more /etc/masterha/app1.cnf
    [server default]
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/manager.log

    user=mha
    password=xxx
    ssh_user=root
    repl_user=repl
    repl_password=repl
    ping_interval=1
    shutdown_script=""
    master_ip_online_change_script=""
    report_script=""
    #master_ip_failover_script=/usr/bin/master_ip_failover
    master_ip_failover_script=/tmp/master_ip_failover

    [server1]
    hostname=vdbsrv1
    master_binlog_dir=/data/mysqldata

    [server2]
    hostname=vdbsrv2
    master_binlog_dir=/data/mysqldata

    [server3]
    hostname=vdbsrv3
    master_binlog_dir=/data/mysqldata/
    #candidate_master=1
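The sketch below parses the configuration with Python's standard configparser module and lists the hostname= value of each server section. It assumes a trimmed copy of app1.cnf embedded as a string, with the password and script keys dropped. The `configured_hosts` helper is hypothetical. The result contains only the names vdbsrv1 to vdbsrv3. None of the IP addresses used on the command lines below appear anywhere in the configuration.

```python
import configparser

# Trimmed copy of the app1.cnf shown above (assumption: only these keys matter here).
APP1_CNF = """
[server default]
user=mha
ssh_user=root
repl_user=repl
ping_interval=1

[server1]
hostname=vdbsrv1
master_binlog_dir=/data/mysqldata

[server2]
hostname=vdbsrv2
master_binlog_dir=/data/mysqldata

[server3]
hostname=vdbsrv3
master_binlog_dir=/data/mysqldata/
#candidate_master=1
"""


def configured_hosts(cnf_text: str) -> dict[str, str]:
    """Map each [serverN] section to the literal hostname= value it declares."""
    parser = configparser.ConfigParser()
    parser.read_string(cnf_text)
    return {
        section: parser[section]["hostname"]
        for section in parser.sections()
        # "[server default]" is an ordinary section for configparser, so skip it.
        if section != "server default" and "hostname" in parser[section]
    }


if __name__ == "__main__":
    print(configured_hosts(APP1_CNF))
```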
2. Error during manual failover

    [root@vdbsrv4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.6 \
    > --dead_master_port=3306 --new_master_host=192.168.1.8 --new_master_port=3306 --ignore_last_failover
    --dead_master_ip=<dead_master_ip> is not set. Using 192.168.1.6.
    Wed Apr 21 09:08:30 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Wed Apr 21 09:08:30 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Wed Apr 21 09:08:30 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Wed Apr 21 09:08:30 2015 - [info] MHA::MasterFailover version 0.56.
    Wed Apr 21 09:08:30 2015 - [info] Starting master failover.
    Wed Apr 21 09:08:30 2015 - [info]
    Wed Apr 21 09:08:30 2015 - [info] * Phase 1: Configuration Check Phase..
    Wed Apr 21 09:08:30 2015 - [info]
    Wed Apr 21 09:08:31 2015 - [info] GTID failover mode = 0
    Wed Apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln2083] Detected dead master vdbsrv1(192.168.1.6:3306) does not match with specified dead master 192.168.1.6(192.168.1.6:3306)!
    Wed Apr 21 09:08:31 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln2151] Got ERROR: at /usr/bin/masterha_master_switch line 53
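The error text suggests that MHA compares two things: the dead master it detected from app1.cnf, and the one built from the command-line arguments. That comparison appears to include the literal host string. The Python sketch below models the check under one assumption: hostname, IP and port must all be equal. It is a simplified illustration, not MHA's actual Perl code in MasterFailover.pm, and the `Server` type and `check_dead_master` function are hypothetical.

```python
from typing import NamedTuple


class Server(NamedTuple):
    hostname: str  # literal hostname= value, or the --dead_master_host argument
    ip: str
    port: int

    def __str__(self) -> str:
        # Same "host(ip:port)" shape that the MHA log lines use.
        return f"{self.hostname}({self.ip}:{self.port})"


def check_dead_master(detected: Server, specified: Server) -> None:
    """Hypothetical simplification: every field must match exactly."""
    if detected != specified:
        raise ValueError(
            f"Detected dead master {detected} does not match "
            f"with specified dead master {specified}!"
        )


detected = Server("vdbsrv1", "192.168.1.6", 3306)  # from [server1] in app1.cnf
by_ip = Server("192.168.1.6", "192.168.1.6", 3306)  # --dead_master_host=192.168.1.6
by_name = Server("vdbsrv1", "192.168.1.6", 3306)  # --dead_master_host=vdbsrv1

check_dead_master(detected, by_name)  # same hostname string: no error
try:
    check_dead_master(detected, by_ip)  # IP in place of hostname: mismatch
except ValueError as err:
    print(err)
```

The IP and port agree in both cases. Only the host string differs, which is consistent with the fix being "use the hostname".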
3. Error during online switch

    [root@vdbsrv4 ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.1.8 \
    > --orig_master_is_new_slave --running_updates_limit=10000
    Tue Apr 21 11:50:14 2015 - [info] MHA::MasterRotate version 0.56.
    Tue Apr 21 11:50:14 2015 - [info] Starting online master switch..
    Tue Apr 21 11:50:14 2015 - [info]
    Tue Apr 21 11:50:14 2015 - [info] * Phase 1: Configuration Check Phase..
    Tue Apr 21 11:50:14 2015 - [info]
    Tue Apr 21 11:50:14 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Tue Apr 21 11:50:14 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
    Tue Apr 21 11:50:14 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
    Tue Apr 21 11:50:14 2015 - [info] GTID failover mode = 0
    Tue Apr 21 11:50:14 2015 - [info] Current Alive Master: vdbsrv1(192.168.1.6:3306)
    Tue Apr 21 11:50:14 2015 - [info] Alive Slaves:
    Tue Apr 21 11:50:14 2015 - [info] vdbsrv2(192.168.1.7:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
    Tue Apr 21 11:50:14 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306)
    Tue Apr 21 11:50:14 2015 - [info] vdbsrv3(192.168.1.8:3306) Version=5.6.22-log (oldest major version between slaves) log-bin:enabled
    Tue Apr 21 11:50:14 2015 - [info] Replicating from 192.168.1.6(192.168.1.6:3306)

    It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on vdbsrv1(192.168.1.6:3306)? (YES/no): yes
    Tue Apr 21 11:50:41 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
    Tue Apr 21 11:50:41 2015 - [info] ok.
    Tue Apr 21 11:50:41 2015 - [info] Checking MHA is not monitoring or doing failover..
    Tue Apr 21 11:50:41 2015 - [info] Checking replication health on vdbsrv2..
    Tue Apr 21 11:50:41 2015 - [info] ok.
    Tue Apr 21 11:50:41 2015 - [info] Checking replication health on vdbsrv3..
    Tue Apr 21 11:50:41 2015 - [info] ok.
    Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln228] 192.168.1.8 is not alive!
    Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln613] Failed to get new master!
    Tue Apr 21 11:50:41 2015 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterRotate.pm, ln652] Got ERROR: at /usr/bin/masterha_master_switch line 53
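The second error can be modelled the same way. This hedged sketch assumes that MasterRotate looks up the new master among the alive servers by comparing the hostname string and port, not the resolved IP. `Server`, `ALIVE` and `find_alive_server` are hypothetical names, and the server list is copied from the log above. Under that assumption, 192.168.1.8 matches no entry and the lookup reports "is not alive". This happens even though vdbsrv3, at that very address, just passed the replication health check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Server:
    hostname: str  # literal hostname= value from app1.cnf
    ip: str
    port: int


# Alive servers as reported in the log above (hypothetical in-memory model).
ALIVE = [
    Server("vdbsrv1", "192.168.1.6", 3306),
    Server("vdbsrv2", "192.168.1.7", 3306),
    Server("vdbsrv3", "192.168.1.8", 3306),
]


def find_alive_server(host: str, port: int = 3306) -> Optional[Server]:
    """Assumed behaviour: match on the hostname string and port, never on IP."""
    for server in ALIVE:
        if server.hostname == host and server.port == port:
            return server
    return None


for arg in ("192.168.1.8", "vdbsrv3"):
    found = find_alive_server(arg)
    print(arg, "->", found if found else f"{arg} is not alive!")
```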
4. Solution

Replacing the IP addresses with hostnames fixes both problems, so this is not demonstrated again here.
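Since the fix is simply to pass hostnames, a wrapper script could translate IP addresses into the hostname= values from app1.cnf before calling masterha_master_switch. The helper `ip_to_configured_host` below is hypothetical and is not part of MHA. It assumes each configured name resolves (through /etc/hosts or DNS) to the address you pass in. The example injects a dictionary as the resolver so it runs without name resolution. When no configured name matches, the helper returns the IP unchanged. That also covers a configuration that already uses IP addresses as hostname= values.

```python
import socket
from typing import Callable, Iterable


def ip_to_configured_host(
    ip: str,
    configured_hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Return the configured hostname whose address is `ip`, else `ip` itself."""
    for host in configured_hosts:
        if host == ip:
            return host  # config already uses the IP as hostname=
        try:
            addr = resolve(host)
        except (OSError, KeyError):
            continue  # name does not resolve here; skip it
        if addr == ip:
            return host
    return ip


# Hypothetical resolver standing in for /etc/hosts or DNS in this example.
HOSTS = {"vdbsrv1": "192.168.1.6", "vdbsrv2": "192.168.1.7", "vdbsrv3": "192.168.1.8"}

dead = ip_to_configured_host("192.168.1.6", HOSTS, HOSTS.__getitem__)
new = ip_to_configured_host("192.168.1.8", HOSTS, HOSTS.__getitem__)
cmd = [
    "masterha_master_switch", "--master_state=dead",
    "--conf=/etc/masterha/app1.cnf",
    f"--dead_master_host={dead}", "--dead_master_port=3306",
    f"--new_master_host={new}", "--new_master_port=3306",
    "--ignore_last_failover",
]
print(" ".join(cmd))
```

Printing the command instead of running it keeps the sketch free of side effects. A real wrapper would pass `cmd` to `subprocess.run`.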
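According to the official documentation, the parameter is --dead_master_host=(hostname). In other words, it expects a hostname rather than an IP address. The documentation also notes: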
If these parameters are not set, --dead_master_ip will be the result of gethostbyname(dead_master_host), and --dead_master_port will be 3306.
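The quoted defaults can be written out as a small function. This is a hypothetical Python rendering of the documented behaviour, not MHA's code. `dead_master_defaults` uses socket.gethostbyname, which returns an IPv4 literal unchanged, and falls back to port 3306. It also suggests why the first error occurs. With --dead_master_host=192.168.1.6, the IP and port are filled in correctly, matching the "Using 192.168.1.6" line in the log. However, the host string itself stays 192.168.1.6. Judging by the error message, that string is then compared with the configured hostname vdbsrv1.

```python
import socket
from typing import Optional, Tuple


def dead_master_defaults(
    dead_master_host: str,
    dead_master_ip: Optional[str] = None,
    dead_master_port: Optional[int] = None,
) -> Tuple[str, str, int]:
    """Fill in --dead_master_ip / --dead_master_port as the quoted docs describe."""
    if dead_master_ip is None:
        # gethostbyname(dead_master_host): a real hostname needs DNS or
        # /etc/hosts, while an IPv4 literal is returned as-is.
        dead_master_ip = socket.gethostbyname(dead_master_host)
    if dead_master_port is None:
        dead_master_port = 3306
    # Note that the host string is passed through untouched.
    return dead_master_host, dead_master_ip, dead_master_port


print(dead_master_defaults("192.168.1.6"))
```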
Addendum: if hostname= in the configuration file is set to an IP address, then using the IP address when switching also works. @20150522