MHA快速搭建

很早之前写过MHA的文章,但是常常在技术群看到有同学问MHA搭建的问题,不是权限问题就是配置问题,我在这里就再次一写下配置过程以及快速的搭建。如果想知道更多的细节与原理,请参考:MySQL高可用架构之MHA

环境:

1主1从,manager放在从库。

主库:192.168.0.10

从库:192.168.0.20

两台机器的mysql安装完成初始化以后进行复制搭建,首先登录主库(192.168.0.10),查看pos点:

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |      154 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

然后在主库(192.168.0.10) 添加复制账号以及mha用的账号

mysql> grant replication slave on *.* to 'repl'@'192.168.0.10'  identified by 'repl'; 
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant replication slave on *.* to 'repl'@'192.168.0.20'  identified by 'repl'; 
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all on *.* to 'root'@'192.168.0.20'  identified by '123';                     
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all on *.* to 'root'@'192.168.0.10'  identified by '123';  
Query OK, 0 rows affected, 1 warning (0.01 sec)

从库(192.168.0.10 )change到mysql-bin.000001,pos点154

CHANGE MASTER TO  MASTER_HOST='192.168.0.10',MASTER_USER='repl',MASTER_PASSWORD='repl',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=154;
mysql> start slave;
Query OK, 0 rows affected (0.15 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 1344
               Relay_Log_File: relaylog.000002
                Relay_Log_Pos: 1510
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
复制到这来就搭建完成了,然后配置192.168.0.10和192.168.0.20 ssh互信。(两台机器都执行)
ssh-keygen -t rsa
ssh-copy-id -i  /root/.ssh/id_rsa.pub  '-p 22 192.168.0.10'
ssh-copy-id -i  /root/.ssh/id_rsa.pub  '-p 22 192.168.0.20'

安装MHA软件,首先安装epel源。(2台机器)

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

安装依赖包

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

安装MHA软件(两台机器)

tar xf mha4mysql-node-0.56.tar.gz 
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install

tar xf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make && make install

在从库(192.168.0.20)创建目录:

mkdir /data/mha/3306/log
cd /data/mha/3306/touch mha.cnf

mha.cnf配置文件内容如下:

[server default]
client_bindir=/usr/local/mysql/bin/
manager_log=/data/mha/3306/log/manager.log
manager_workdir=/data/mha/3306/log
master_binlog_dir=/data/mysql/3306/binlog/
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
report_script=/usr/local/bin/send_report
init_conf_load_script=/usr/local/bin/load_cnf
remote_workdir=/data/mysql/3306/mysqltmp
#secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.0.30 -s 192.168.0.40
user=root
ping_interval=3
repl_user=repl
ssh_port=22
ssh_user=root
max_ping_errors=40

[server1]
hostname=192.168.0.10
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.0.20
port=3306

编辑文件 /usr/local/bin/load_cnf 里面的密码修改成对应的密码

#!/usr/bin/perl

  print "password=123\n";
  print "repl_password=repl\n";

执行chek命令查看复制是否正常:

[[email protected] 3306]# masterha_check_repl --conf=/data/mha/3306/mha.cnf 
Mon Mar 13 21:36:54 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 13 21:36:54 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..
Mon Mar 13 21:36:54 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..
Mon Mar 13 21:36:54 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..
Mon Mar 13 21:36:54 2017 - [info] Setting max_ping_errors to 40, ping_interval to 3.
Mon Mar 13 21:36:54 2017 - [info] MHA::MasterMonitor version 0.56.
Mon Mar 13 21:36:54 2017 - [info] GTID failover mode = 0
Mon Mar 13 21:36:54 2017 - [info] Dead Servers:
Mon Mar 13 21:36:54 2017 - [info] Alive Servers:
Mon Mar 13 21:36:54 2017 - [info]   192.168.0.10(192.168.0.10:3306)
Mon Mar 13 21:36:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)
Mon Mar 13 21:36:54 2017 - [info] Alive Slaves:
Mon Mar 13 21:36:54 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Mar 13 21:36:54 2017 - [info]     Replicating from 192.168.0.10(192.168.0.10:3306)
Mon Mar 13 21:36:54 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Mar 13 21:36:54 2017 - [info] Current Alive Master: 192.168.0.10(192.168.0.10:3306)
Mon Mar 13 21:36:54 2017 - [info] Checking slave configurations..
Mon Mar 13 21:36:54 2017 - [info]  read_only=1 is not set on slave 192.168.0.20(192.168.0.20:3306).
Mon Mar 13 21:36:54 2017 - [warning]  relay_log_purge=0 is not set on slave 192.168.0.20(192.168.0.20:3306).
Mon Mar 13 21:36:54 2017 - [info] Checking replication filtering settings..
Mon Mar 13 21:36:54 2017 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Mar 13 21:36:54 2017 - [info]  Replication filtering check ok.
Mon Mar 13 21:36:54 2017 - [info] GTID (with auto-pos) is not supported
Mon Mar 13 21:36:54 2017 - [info] Starting SSH connection tests..
Mon Mar 13 21:36:55 2017 - [info] All SSH connection tests passed successfully.
Mon Mar 13 21:36:55 2017 - [info] Checking MHA Node version..
Mon Mar 13 21:36:55 2017 - [info]  Version check ok.
Mon Mar 13 21:36:55 2017 - [info] Checking SSH publickey authentication settings on the current master..
Mon Mar 13 21:36:55 2017 - [info] HealthCheck: SSH to 192.168.0.10 is reachable.
Mon Mar 13 21:36:56 2017 - [info] Master MHA Node version is 0.56.
Mon Mar 13 21:36:56 2017 - [info] Checking recovery script configurations on 192.168.0.10(192.168.0.10:3306)..
Mon Mar 13 21:36:56 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/3306/binlog/ --output_file=/data/mysql/3306/mysqltmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001 
Mon Mar 13 21:36:56 2017 - [info]   Connecting to [email protected]192.168.0.10(192.168.0.10:22).. 
  Creating /data/mysql/3306/mysqltmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/3306/binlog/, up to mysql-bin.000001
Mon Mar 13 21:36:56 2017 - [info] Binlog setting check done.
Mon Mar 13 21:36:56 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Mar 13 21:36:56 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.0.20 --slave_ip=192.168.0.20 --slave_port=3306 --workdir=/data/mysql/3306/mysqltmp --target_version=5.7.17-log --manager_version=0.56 --client_bindir=/usr/local/mysql/bin/ --relay_dir=/data/mysql/3306/relaylog --current_relay_log=relaylog.000002  --slave_pass=xxx
Mon Mar 13 21:36:56 2017 - [info]   Connecting to [email protected]192.168.0.20(192.168.0.20:22).. 
  Checking slave recovery environment settings..
    Relay log found at /data/mysql/3306/relaylog, up to relaylog.000002
    Temporary relay log file is /data/mysql/3306/relaylog/relaylog.000002
    Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Mon Mar 13 21:36:56 2017 - [info] Slaves settings check done.
Mon Mar 13 21:36:56 2017 - [info] 
192.168.0.10(192.168.0.10:3306) (current master)
 +--192.168.0.20(192.168.0.20:3306)

Mon Mar 13 21:36:56 2017 - [info] Checking replication health on 192.168.0.20..
Mon Mar 13 21:36:56 2017 - [info]  ok.
Mon Mar 13 21:36:56 2017 - [info] Checking master_ip_failover_script status:
Mon Mar 13 21:36:56 2017 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 192.168.0.88/16===

Checking the Status of the script.. OK 
Mon Mar 13 21:36:56 2017 - [info]  OK.
Mon Mar 13 21:36:56 2017 - [warning] shutdown_script is not defined.
Mon Mar 13 21:36:56 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

主库(192.168.0.10)执行命令,启动vip:

/sbin/ifconfig eth1:1 192.168.0.88/16

在线切换,把主库切到192.168.0.20

masterha_master_switch --master_state=alive --conf=/data/mha/3306/mha.cnf --new_master_host=192.168.0.20 --new_master_port=3306 --orig_master_is_new_slave

输出如下:

Mon Mar 13 22:15:08 2017 - [info] MHA::MasterRotate version 0.56.
Mon Mar 13 22:15:08 2017 - [info] Starting online master switch..
Mon Mar 13 22:15:08 2017 - [info] 
Mon Mar 13 22:15:08 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Mar 13 22:15:08 2017 - [info] 
Mon Mar 13 22:15:08 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Mar 13 22:15:08 2017 - [info] Reading application default configuration from /data/mha/3306/mha.cnf..
Mon Mar 13 22:15:08 2017 - [info] Updating application default configuration from /usr/local/bin/load_cnf..
Mon Mar 13 22:15:08 2017 - [info] Reading server configuration from /data/mha/3306/mha.cnf..
Mon Mar 13 22:15:08 2017 - [info] Setting max_ping_errors to 40, ping_interval to 3.
Mon Mar 13 22:15:08 2017 - [info] GTID failover mode = 0
Mon Mar 13 22:15:08 2017 - [info] Current Alive Master: 192.168.0.10(192.168.0.10:3306)
Mon Mar 13 22:15:08 2017 - [info] Alive Slaves:
Mon Mar 13 22:15:08 2017 - [info]   192.168.0.20(192.168.0.20:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Mar 13 22:15:08 2017 - [info]     Replicating from 192.168.0.10(192.168.0.10:3306)
Mon Mar 13 22:15:08 2017 - [info]     Primary candidate for the new Master (candidate_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.0.10(192.168.0.10:3306)? (YES/no): yes
Mon Mar 13 22:15:09 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Mar 13 22:15:09 2017 - [info]  ok.
Mon Mar 13 22:15:09 2017 - [info] Checking MHA is not monitoring or doing failover..
Mon Mar 13 22:15:09 2017 - [info] Checking replication health on 192.168.0.20..
Mon Mar 13 22:15:09 2017 - [info]  ok.
Mon Mar 13 22:15:09 2017 - [info] 192.168.0.20 can be new master.
Mon Mar 13 22:15:09 2017 - [info] 
From:
192.168.0.10(192.168.0.10:3306) (current master)
 +--192.168.0.20(192.168.0.20:3306)

To:
192.168.0.20(192.168.0.20:3306) (new master)
 +--192.168.0.10(192.168.0.10:3306)

Starting master switch from 192.168.0.10(192.168.0.10:3306) to 192.168.0.20(192.168.0.20:3306)? (yes/NO): yes
Mon Mar 13 22:15:10 2017 - [info] Checking whether 192.168.0.20(192.168.0.20:3306) is ok for the new master..
Mon Mar 13 22:15:10 2017 - [info]  ok.
Mon Mar 13 22:15:10 2017 - [info] 192.168.0.10(192.168.0.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Mar 13 22:15:11 2017 - [info] 192.168.0.10(192.168.0.10:3306): Resetting slave pointing to the dummy host.
Mon Mar 13 22:15:11 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Mar 13 22:15:11 2017 - [info] 
Mon Mar 13 22:15:11 2017 - [info] * Phase 2: Rejecting updates Phase..
Mon Mar 13 22:15:11 2017 - [info] 
Mon Mar 13 22:15:11 2017 - [info] Executing master ip online change script to disable write on the current master:
Mon Mar 13 22:15:11 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123' --new_master_host=192.168.0.20 --new_master_ip=192.168.0.20 --new_master_port=3306 --new_master_user='root' --new_master_password='123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Mon Mar 13 22:15:11 2017 173907 Set read_only on the new master.. ok.
Mon Mar 13 22:15:11 2017 177320 Drpping app user on the orig master..
Mon Mar 13 22:15:11 2017 177844 Set read_only=1 on the orig master.. ok.
Mon Mar 13 22:15:11 2017 179299 Killing all application threads..
Mon Mar 13 22:15:11 2017 179318 done.
Mon Mar 13 22:15:11 2017 - [info]  ok.
Mon Mar 13 22:15:11 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Mar 13 22:15:11 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Mar 13 22:15:11 2017 - [info]  ok.
Mon Mar 13 22:15:11 2017 - [info] Orig master binlog:pos is mysql-bin.000001:2265.
Mon Mar 13 22:15:11 2017 - [info]  Waiting to execute all relay logs on 192.168.0.20(192.168.0.20:3306)..
Mon Mar 13 22:15:11 2017 - [info]  master_pos_wait(mysql-bin.000001:2265) completed on 192.168.0.20(192.168.0.20:3306). Executed 0 events.
Mon Mar 13 22:15:11 2017 - [info]   done.
Mon Mar 13 22:15:11 2017 - [info] Getting new master's binlog name and position..
Mon Mar 13 22:15:11 2017 - [info]  mysql-bin.000001:2265
Mon Mar 13 22:15:11 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=2265, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Mar 13 22:15:11 2017 - [info] Executing master ip online change script to allow write on the new master:
Mon Mar 13 22:15:11 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.0.10 --orig_master_ip=192.168.0.10 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123' --new_master_host=192.168.0.20 --new_master_ip=192.168.0.20 --new_master_port=3306 --new_master_user='root' --new_master_password='123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Mon Mar 13 22:15:11 2017 493612 Set read_only=0 on the new master.
Mon Mar 13 22:15:11 2017 493874 Creating app user on the new master..
Mon Mar 13 22:15:32 2017 - [info]  ok.
Mon Mar 13 22:15:32 2017 - [info] 
Mon Mar 13 22:15:32 2017 - [info] * Switching slaves in parallel..
Mon Mar 13 22:15:32 2017 - [info] 
Mon Mar 13 22:15:32 2017 - [info] Unlocking all tables on the orig master:
Mon Mar 13 22:15:32 2017 - [info] Executing UNLOCK TABLES..
Mon Mar 13 22:15:32 2017 - [info]  ok.
Mon Mar 13 22:15:32 2017 - [info] Starting orig master as a new slave..
Mon Mar 13 22:15:32 2017 - [info]  Resetting slave 192.168.0.10(192.168.0.10:3306) and starting replication from the new master 192.168.0.20(192.168.0.20:3306)..
Mon Mar 13 22:15:32 2017 - [info]  Executed CHANGE MASTER.
Mon Mar 13 22:15:32 2017 - [info]  Slave started.
Mon Mar 13 22:15:32 2017 - [info] All new slave servers switched successfully.
Mon Mar 13 22:15:32 2017 - [info] 
Mon Mar 13 22:15:32 2017 - [info] * Phase 5: New master cleanup phase..
Mon Mar 13 22:15:32 2017 - [info] 
Mon Mar 13 22:15:32 2017 - [info]  192.168.0.20: Resetting slave info succeeded.
Mon Mar 13 22:15:32 2017 - [info] Switching master to 192.168.0.20(192.168.0.20:3306) completed successfully.

关于配置文件中的参数: max_ping_errors=40,这个是修改了源码,增加了检测次数的定义,默认是3次,太容易误切换。

启动管理进程:

/usr/bin/nohup /usr/local/bin/masterha_manager --conf=/data/mha/3306/mha.cnf --ignore_last_failover > /data/mha/3306/log/manager.log 2>&1 &

在主库(192.168.0.10)封掉ip,可以看到日志输出。

iptables -I INPUT -s 192.168.0.20 -j DROP
Mon Mar 13 22:22:50 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:22:50 2017 - [warning] Connection failed 3 time(s)..
Mon Mar 13 22:22:53 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:22:53 2017 - [warning] Connection failed 4 time(s)..
Mon Mar 13 22:22:56 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:22:56 2017 - [warning] Connection failed 5 time(s)..
Mon Mar 13 22:22:59 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:22:59 2017 - [warning] Connection failed 6 time(s)..
Mon Mar 13 22:23:02 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:23:02 2017 - [warning] Connection failed 7 time(s)..
Mon Mar 13 22:23:05 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:23:05 2017 - [warning] Connection failed 8 time(s)..
Mon Mar 13 22:23:08 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:23:08 2017 - [warning] Connection failed 9 time(s)..
Mon Mar 13 22:23:11 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:23:11 2017 - [warning] Connection failed 10 time(s)..
Mon Mar 13 22:23:14 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))
Mon Mar 13 22:23:14 2017 - [warning] Connection failed 11 time(s)..
Mon Mar 13 22:23:17 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.10' (4))

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏一个会写诗的程序员的博客

Gradle 环境安装Installation

The current Gradle release is 4.10.2. You can download binaries and view docs fo...

20230
来自专栏Netkiller

Phalcon VS Spring 用法对照手册

Phalcon VS Spring 摘要 Phalcon VS Spring 用法对照表 ---- 目录 1. Install 1.1. Phalcon 1.2...

43660
来自专栏Netkiller

Spring boot with Thymeleaf

本文节选自电子书《Netkiller Java 手札》 5.19. Spring boot with Thymeleaf 5.19.1. Maven <dep...

373130
来自专栏用户2442861的专栏

java SLF4J 使用其他的 log框架

http://saltnlight5.blogspot.com/2013/08/how-to-configure-slf4j-with-different.ht...

15710
来自专栏乐沙弥的世界

MHA 自动故障转移步骤及过程剖析

    MHA是众多使用MySQL数据库企业高可用的不二选择,它简单易用,功能强大,实现了基于MySQL replication架构的自动主从故障转移,本文主要...

12330
来自专栏Java帮帮-微信公众号-技术文章全总结

Java操作数据库Spring(2)

pom.xml <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www...

40570
来自专栏Netkiller

Spring Cloud Config

摘要: 本文节选自《Netkiller Java 手札》 Spring Cloud Config 本文节选自《Netkiller Java 手札》 https:...

40070
来自专栏增长技术

deploy to heroku

12410
来自专栏云知识学习

kubernetes 基础集群排障

在排错过程中,kubectl 是最重要的工具,通常也是定位错误的起点。这里也列出一些常用的命令,在后续的各种排错过程中都会经常用到。

1.6K120
来自专栏一个会写诗的程序员的博客

$ spring init --list

11920

扫码关注云+社区

领取腾讯云代金券