前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >MySQL 8.0 MGR网络抖动一例

MySQL 8.0 MGR网络抖动一例

作者头像
AsiaYe
发布2022-04-01 19:29:23
1.7K0
发布2022-04-01 19:29:23
举报
文章被收录于专栏:DBA随笔

MySQL 8.0 MGR网络抖动怎么办?

今天中午,线上一个MySQL8.0的MGR失联了一阵,之前其实没有遇到过这个场景,觉得挺新鲜,就记录了下当时的状态。

01

背景

首先介绍下我们这套环境,这套环境是由4个MySQL 8.0.20节点组成的MGR集群(建议配置奇数个节点,这套环境比较特殊),配置的是MGR的多主环境。

线上某个业务反馈连接MGR有连接报错,报错内容如下:

exec insert into check_job failed: Error 1290: The MySQL server is running with the --super-read-only option so it cannot execute this statement

报错原因:业务向MGR中写入数据的时候,报错MGR开启了--super-read-only参数,写不进去。

正常运行的MGR是不会将节点设置为--super-read-only的,于是使用SQL查看了下MGR集群的状态,发现果然掉了一个节点。

正常节点

代码语言:javascript
复制
11:55:29> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 18142f04-95fe-11ec-aff9-246e96be2d00 | 10.xx.xx.184 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 32a10c3f-ce69-11eb-bd13-fa163e311dfa | 10.xxx.1.236 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 63a8b9e3-95f9-11ec-bfc3-fa163e06c313 | 10.xxx.1.129 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)

可以看到,正常节点无法和异常的节点通信,当前组里面只有3个节点了。

异常节点:

代码语言:javascript
复制
select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | ca12c7a7-95fd-11ec-9979-6c92bf982df2 | 10.xx.xx.21 |        5562 | ERROR        |             | 8.0.20         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

可以看到,异常的节点无法和其他3个正常节点通信,而且自己的状态是ERROR状态。

02

排查过程

首先查看坏掉节点的MySQL日志,报错如下,为了方便理解,我对日志做了一些注释:

代码语言:javascript
复制
2022-03-30T08:35:12.689967+08:00 26 [Warning] [MY-010957] [Server] The replication timestamps have returned to normal values.
----------这部分日志说明它连接不上其他节点了----------------
2022-03-30T11:23:59.451419+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.236:5562 has become unreachable.'
2022-03-30T11:23:59.466959+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.17.37:5562 has become unreachable.'
2022-03-30T11:23:59.467008+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.129:5562 has become unreachable.'
2022-03-30T11:23:59.467020+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.xx.xx.184:5562 has become unreachable.'

---------无法连接其他节点,停止所有更新,并提示可以使用参数group_replication_force_members重开一个组--------------
2022-03-30T11:23:59.467218+08:00 0 [ERROR] [MY-011495] [Repl] Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This
server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to forc
e a new group membership.'

----------这部分日志说明它网络恢复了----------------
2022-03-30T11:24:02.141493+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.236:5562 is reachable again.'
2022-03-30T11:24:03.219209+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.17.37:5562 is reachable again.'
2022-03-30T11:24:03.245901+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xxx.1.129:5562 is reachable again.'
2022-03-30T11:24:03.245941+08:00 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address 10.xx.xx.184:5562 is reachable again.'
2022-03-30T11:24:03.245956+08:00 0 [Warning] [MY-011498] [Repl] Plugin group_replication reported: 'The member has resumed contact with a majority of the members in the gro
up. Regular operation is restored and transactions are unblocked.'

--------由于网络原因,它被从MGR复制组中踢出了,状态变更成ERROR-------
2022-03-30T11:24:06.907099+08:00 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing membe
r status to ERROR.'

--------一些事务将会回滚--------------
2022-03-30T11:24:06.922962+08:00 0 [Warning] [MY-011630] [Repl] Plugin group_replication reported: 'Due to a plugin error, some transactions were unable to be certified and
will now rollback.'

--------MySQL被自动设置成read only模式-----------
2022-03-30T11:24:06.923035+08:00 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was de
tected.'

--------等待冲突检测,执行before_commit函数失败---------
2022-03-30T11:24:06.923078+08:00 312718417 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on
session 312718417'
2022-03-30T11:24:06.923081+08:00 397259854 [ERROR] [MY-011615] [Repl] Plugin group_replication reported: 'Error while waiting for conflict detection procedure to finish on
session 397259854'
2022-03-30T11:24:06.923165+08:00 312718417 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed
2022-03-30T11:24:06.923163+08:00 397259854 [ERROR] [MY-010207] [Repl] Run function 'before_commit' in plugin 'group_replication' failed

-------由于自身状态ERROR无法加入组,提示修复错误或者重启MGR------
2022-03-30T11:24:08.587817+08:00 397259854 [ERROR] [MY-011601] [Repl] Plugin group_replication reported: 'Transaction cannot be executed while Group Replication is on ERROR
state. Check for errors and restart the plugin'

得了,那就按照提示,重启下MGR吧:

代码语言:javascript
复制
12:00:38 >stop group_replication;
Query OK, 0 rows affected (1.01 sec)

12:00:46 >start group_replication;                                  
Query OK, 0 rows affected (3.36 sec)


12:00:52 >select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 18142f04-95fe-11ec-aff9-246e96be2d00 | 10.xx.xx.184 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 32a10c3f-ce69-11eb-bd13-fa163e311dfa | 10.xxx.1.236 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 63a8b9e3-95f9-11ec-bfc3-fa163e06c313 | 10.xxx.1.129 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | ca12c7a7-95fd-11ec-9979-6c92bf982df2 | 10.xx.xx.21  |        5562 | RECOVERING   | PRIMARY     | 8.0.20         |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
4 rows in set (0.00 sec)

select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 18142f04-95fe-11ec-aff9-246e96be2d00 | 10.xx.xx.184 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 32a10c3f-ce69-11eb-bd13-fa163e311dfa | 10.xxx.1.236 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | 63a8b9e3-95f9-11ec-bfc3-fa163e06c313 | 10.xxx.1.129 |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
| group_replication_applier | ca12c7a7-95fd-11ec-9979-6c92bf982df2 | 10.xx.xx.21  |        5562 | ONLINE       | PRIMARY     | 8.0.20         |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
4 rows in set (0.00 sec)

重启之后,可以看到,ERROR的状态变成了Recovering,说明节点正在执行恢复操作。不一会儿,就都变成Online状态了。

03

总结

MGR本身的自愈功能还是比较强的,多数派的协议保证了事务的强一致。

从运维经验上讲,一般情况下,MGR中比较棘手的问题就是认证冲突,报错内容形如:

ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.

目前遇到认证冲突这种问题,可以通过重试的办法来解决。

如果是MGR深度用户,提几个建议:

1、使用MySQL8.0 社区版本的MGR,最好8.0.22以后

2、最好使用MGR单主模式,冲突会少些。如果想要体验多主模式,又遇到了不可解决的问题,可以退而求其次,多主模式下,集中在一个节点进行写入,利用它的故障自愈。

3、建议使用GreatSQL 替代社区版本MGR,修复了官方MGR的很多Bug。项目链接:https://gitee.com/GreatSQL/GreatSQL-Doc

今天内容就到这里吧。

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2022-03-30,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 DBA随笔 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • MySQL 8.0 MGR网络抖动怎么办?
相关产品与服务
云数据库 SQL Server
腾讯云数据库 SQL Server (TencentDB for SQL Server)是业界最常用的商用数据库之一,对基于 Windows 架构的应用程序具有完美的支持。TencentDB for SQL Server 拥有微软正版授权,可持续为用户提供最新的功能,避免未授权使用软件的风险。具有即开即用、稳定可靠、安全运行、弹性扩缩等特点。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档