
Troubleshooting a PXC cluster whose third node could not join

Author: 雪人
Published: 2022-10-13 09:53:34
Column: DataOps

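A PXC 8.0.23 cluster stopped serving requests after an operation carried out for a project. Clients received one of the following errors: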

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

or

2013 - Lost connection to MySQL server during query

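Logging in to each node showed that wsrep_cluster_size was 0 everywhere and that wsrep_cluster_status was not Primary on any node (as far as I remember, it was something like "not connected"). In the grastate.dat files, node 3 had safe_to_bootstrap set to 1.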
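The safe_to_bootstrap check can be scripted when there are several nodes to inspect. The following is a minimal sketch, assuming grastate.dat uses the usual `key: value` layout with a `#` comment header. The function names and the sample contents (including the uuid and seqno) are made up for illustration and are not taken from the affected cluster.

```python
def parse_grastate(text: str) -> dict:
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# GALERA saved state" comment header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def is_safe_to_bootstrap(text: str) -> bool:
    """Return True when the file marks this node as safe to bootstrap from."""
    return parse_grastate(text).get("safe_to_bootstrap") == "1"


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   -1
safe_to_bootstrap: 1
"""
```

On a real node the text would come from the data directory, for example `is_safe_to_bootstrap(open("/mysql/pxc/data/grastate.dat").read())`, using the datadir that appears in the logs of this post.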

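I therefore shut down all nodes and bootstrapped the cluster from node 3. Node 2 then joined without trouble, but joining node 1 failed with the following errors: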

2022-01-12T11:12:43.552286Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2022-01-12T11:20:32.979860Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (16448) with SIGKILL after stalling for 120 seconds
2022-01-12T11:20:33.010860Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: 行 183: 16450 已杀死               socat -u openssl-listen:4444,reuseaddr,cert=/mysql/pxc/data//server-cert.pem,key=/mysql/pxc/data//server-key.pem,cafile=/mysql/pxc/data//ca.pem,verify=1,retry=30 stdio
2022-01-12T11:20:33.010931Z 0 [Note] [MY-000000] [WSREP-SST]      16451                       | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2022-01-12T11:20:33.011525Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2022-01-12T11:20:33.011676Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 137 137
2022-01-12T11:20:33.011756Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1268
2022-01-12T11:20:33.011874Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2022-01-12T11:20:33.012861Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2022-01-12T11:20:33.210760Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.222.50.101' --datadir '/mysql/pxc/data/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '15908' --mysqld-version '8.0.23-14.1'   '' : 32 (Broken pipe)
2022-01-12T11:20:33.210898Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-01-12T11:20:33.210973Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2022-01-12T11:20:33.211182Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-01-12T11:20:33.211268Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2022-01-12T11:20:33.211352Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
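A note on reading the log above: shells report a process killed by signal N with exit status 128 + N, so "exit codes: 137 137" corresponds to signal 9 (SIGKILL). That matches the earlier "Killing SST ... with SIGKILL after stalling for 120 seconds" line. It suggests that socat and xbstream were killed by the SST script itself after no data arrived, which makes 137 a symptom, not the cause. This reading is my interpretation of the log, not a statement from the original post. A small sketch for decoding such codes on a POSIX system (the function name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell exit status, decoding the 128 + N signal convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"exit status {code} (no matching signal {signum})"
    return f"exit status {code}"
```

For example, `describe_exit_code(137)` names SIGKILL, while `describe_exit_code(32)` is reported as a plain exit status.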

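Articles found online gave all sorts of advice. I tried several of them and none helped. Because the error log contained --address '10.222.50.101', I suspected for a while that wsrep_node_address had to be set explicitly, since it was commented out by default on every node. After setting it explicitly, the join still failed: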

2022-01-13T08:03:32.978322Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2022-01-13T08:03:33.036563Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2022-01-13T08:12:38.715388Z 0 [Note] [MY-000000] [Galera] Created page /mysql/pxc/data/gcache.page.000000 of size 592621440 bytes
2022-01-13T08:12:51.193262Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (27632) with SIGKILL after stalling for 120 seconds
2022-01-13T08:12:51.217686Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 183: 27634 killed               socat -u openssl-listen:4444,reuseaddr,cert=/mysql/pxc/data//server-cert.pem,key=/mysql/pxc/data//server-key.pem,cafile=/mysql/pxc/data//ca.pem,verify=1,retry=30 stdio
2022-01-13T08:12:51.217754Z 0 [Note] [MY-000000] [WSREP-SST]      27635                       | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2022-01-13T08:12:51.218372Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR ********************** 
2022-01-13T08:12:51.218550Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node:  exit codes: 137 137
2022-01-13T08:12:51.218628Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1268
2022-01-13T08:12:51.218722Z 0 [ERROR] [MY-000000] [WSREP-SST] ****************************************************** 
2022-01-13T08:12:51.219631Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2022-01-13T08:12:51.431617Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.230.245.214' --datadir '/mysql/pxc/data/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '27097' --mysqld-version '8.0.23-14.1'   '' : 32 (Broken pipe)
2022-01-13T08:12:51.431820Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-01-13T08:12:51.431892Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2022-01-13T08:12:51.432257Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-01-13T08:12:51.432372Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2022-01-13T08:12:51.432458Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.

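I also suspected the firewall configuration. I removed all of its configuration and turned the firewall off, but the error was unchanged.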

To limit the impact on the business, I restored service on two nodes for the time being.

At the same time I posted the problem on the official Percona forum and received the following replies:

【matthewb Percona】
Your log indicates that port 4444 is not open TCP/UDP to all hosts. Make sure all necessary ports (3306, 4444, 4567, 4568) are open between all nodes.

【liking】
Thanks for your reply, but I am sure I have closed firewall between all nodes. Maybe there is some other issues?

【Evgeniy_Patlan Percona】
"while getting data from donor node: exit codes: 137 137"
Such issue appeared once it is not possible to connect to the needed port. So please recheck your firewall options

【matthewb Percona】
"I am sure I have closed firewall between all nodes"
That’s your problem. You need to OPEN the firewall between nodes, not close it. Use socat or nc to test connectivity between nodes on the ports I mentioned.

【liking】
Many thanks to you all, I will do this according to your suggest
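The reply suggests socat or nc for the connectivity test. As an alternative, here is a minimal Python sketch that tries a TCP connection to each port named in the reply. The function name and the result labels are my own. The classification is only a hint: a timeout often points to packets being dropped, while a refusal can mean either that nothing is listening or that a firewall rule rejected the connection. Also note that, according to the logs above, the joiner starts its socat listener on port 4444 only for the duration of an SST, so a refusal on 4444 at other times is expected.

```python
import socket

# Ports named in the Percona reply: MySQL, SST, group communication, IST.
PXC_PORTS = (3306, 4444, 4567, 4568)


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but no listener accepted (or a rule rejected it).
        return "refused"
    except socket.timeout:
        # No answer within the timeout; often a sign of dropped packets.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run from each node against every other node, for example `{p: check_port("192.0.2.11", p) for p in PXC_PORTS}`, where 192.0.2.11 is a placeholder address and not one of the real nodes.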

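As the replies show, the Percona staff were confident that the cause was the network port configuration. Working on the network was not convenient at that moment, so I waited for a suitable time to try again.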

A few days later I tried again and then posted this reply on the official forum:

It is ok now.

According to your suggest, I modified the netfilter rules on all nodes like this:

  1. Accept all input
  2. Clear all netfilter rules

Now the cluster works fine.

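These are the exact steps (the final iptables-save only prints the resulting rule set; it does not persist it):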

[root@db-1 ~]#  iptables -P INPUT ACCEPT
[root@db-1 ~]#  iptables -F
[root@db-1 ~]#  iptables -X
[root@db-1 ~]#  iptables -Z
[root@db-1 ~]#  iptables -A INPUT -i lo -j ACCEPT
[root@db-1 ~]#  iptables-save
#Generated by iptables-save v1.4.21 on Mon Jan 24 11:33:23 2022
*filter
:INPUT ACCEPT [884:105489]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [685:162312]
-A INPUT -i lo -j ACCEPT
COMMIT
#Completed on Mon Jan 24 11:33:23 2022
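The steps above accept all inbound traffic. If that is too permissive for a given environment, a narrower rule set would allow only the four PXC ports from the peer nodes, in line with the Percona reply. The sketch below only builds the command strings and runs nothing. It was not part of the original fix and has not been tested on this cluster. The function name and the peer addresses are placeholders.

```python
# Ports named in the Percona reply: MySQL, SST, group communication, IST.
PXC_PORTS = (3306, 4444, 4567, 4568)


def build_accept_rules(peers, ports=PXC_PORTS):
    """Return iptables commands accepting TCP from each peer on the given ports."""
    port_list = ",".join(str(p) for p in ports)
    return [
        # One rule per peer; the multiport match covers all ports at once.
        f"iptables -A INPUT -p tcp -s {peer} -m multiport --dports {port_list} -j ACCEPT"
        for peer in peers
    ]
```

Calling `build_accept_rules(["192.0.2.11", "192.0.2.12"])` returns one command per peer, which can be reviewed before being applied by hand on each node.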

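Original-content notice: the author has authorized the Tencent Cloud Developer Community to publish this article. Do not repost it without permission.

In case of infringement, contact cloudcommunity@tencent.com to have it removed.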
