
Redis Advanced: Handling a Redis Cluster Fail State Caused by an Unexpected Power Outage

Author: 小小工匠
Published 2021-08-17 11:36:23
Column: 小工匠聊架构

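Background

This is a test environment running a pseudo-cluster.

Host 101: three nodes on ports 7001, 7002, 7003
Host 102: three nodes on ports 7004, 7005, 7006

The server room lost power unexpectedly and the hosts went down.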


Symptoms

Redis Cluster was unavailable, and the application could not start normally.

The cluster info looked like this:

172.168.15.101:7001> CLUSTER INFO
cluster_state:fail
cluster_slots_assigned:16354
cluster_slots_ok:16354
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1666
cluster_stats_messages_pong_sent:1063
cluster_stats_messages_sent:2729
cluster_stats_messages_ping_received:1063
cluster_stats_messages_pong_received:1026
cluster_stats_messages_received:2089

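The key lines are cluster_state:fail and cluster_slots_assigned:16354. The cluster state is fail, and only 16354 of the 16384 slots are assigned, so 30 slots are missing and the cluster is unavailable.

To protect cluster integrity, by default the whole cluster becomes unavailable as soon as any one of the 16384 slots is not assigned to a node (this is the default setting, cluster-require-full-coverage yes). It is a safeguard that ensures every slot is served by an online node.

Since some slots are unassigned, reassigning them is the key to fixing the problem.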
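The arithmetic above can be checked mechanically. The following is a minimal sketch, assuming the CLUSTER INFO output has been captured as plain text; the helper names `parse_cluster_info` and `missing_slot_count` are illustrative and not part of any Redis client.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def parse_cluster_info(text):
    """Turn the 'key:value' lines of CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blanks and anything that is not key:value
        key, _, value = line.partition(":")
        info[key] = value
    return info


def missing_slot_count(info):
    """Number of slots that are not assigned to any node."""
    return TOTAL_SLOTS - int(info["cluster_slots_assigned"])


# First lines of the output shown above.
sample = """\
cluster_state:fail
cluster_slots_assigned:16354
cluster_slots_ok:16354
"""

info = parse_cluster_info(sample)
print(info["cluster_state"], missing_slot_count(info))  # prints: fail 30
```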


Finding the unassigned slots

Option 1: CLUSTER SLOTS

172.168.15.101:7001> CLUSTER SLOTS
 1) 1) (integer) 5461
    2) (integer) 5591
    3) 1) "172.168.15.101"
    ....
    ...
    ....
    33) 1) (integer) 0
    2) (integer) 5460
    3) 1) "172.168.15.101"
       2) (integer) 7001
       3) "40b3ab3eb00e0107ea702e96231694016fb5c25f"
    4) 1) "172.168.15.102"
       2) (integer) 7006
       3) "b2392a54bc1ed255d9f86ce5315b3c66177bc54c"
172.168.15.101:7001>

The output is long and hard to tally in this form, so the second option is recommended.


Option 2: CLUSTER NODES

172.168.15.101:7001> cluster nodes
f434df4b2a8e8262e91b192fdd4329ac7eaba257 172.168.15.101:7003@17003 master - 0 1589854185127 7 connected 5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922
8c27d256907bd17ceed4b0bfc8474eb90e7cf71e 172.168.15.102:7004@17004 slave f434df4b2a8e8262e91b192fdd4329ac7eaba257 0 1589854187127 7 connected
8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 172.168.15.101:7002@17002 master - 0 1589854186127 2 connected 10923-16383
b2392a54bc1ed255d9f86ce5315b3c66177bc54c 172.168.15.102:7006@17006 slave 40b3ab3eb00e0107ea702e96231694016fb5c25f 0 1589854185000 6 connected
40b3ab3eb00e0107ea702e96231694016fb5c25f 172.168.15.101:7001@17001 myself,master - 0 1589854184000 1 connected 0-5460 [5592-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [5784-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [5914-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6158-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6265-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6291-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6312-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6402-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6964-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7229-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7567-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7648-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7863-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8200-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8694-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8806-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8833-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9230-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9306-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9354-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9478-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9697-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9762-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9856-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10242-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10266-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10311-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10349-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10530-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10670-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10973-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11020-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11140-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11144-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11200-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11624-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11802-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] 
[12201-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12301-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12681-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12685-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13365-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13676-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13969-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13989-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [14395-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [14412-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15149-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15611-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15654-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15758-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15778-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15899-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16100-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16105-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16147-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d]
6d8f2f251fa2d881cae91012088e1d5eb653ebb4 172.168.15.102:7005@17005 slave 8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 0 1589854186000 5 connected

7002 : 10923-16383

7001: 0-5460

7003 : 5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922

From these ranges you can work out which slots are missing.

The format of the CLUSTER NODES output deserves a closer look later.
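As a rough aid for reading that output, here is a sketch of a parser for a single CLUSTER NODES line. It assumes the documented field order (node id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slots); the function name `parse_node_line` and the dictionary keys are illustrative choices, not an official API.

```python
def parse_node_line(line):
    """Split one CLUSTER NODES line into its fields."""
    parts = line.split()
    node_id, addr, flags, master_id = parts[0], parts[1], parts[2], parts[3]
    host_port = addr.split("@")[0]            # drop the cluster bus port
    host, _, port = host_port.rpartition(":")
    slots, transitions = [], []
    for token in parts[8:]:
        if token.startswith("["):
            transitions.append(token)          # importing/migrating entries
        elif "-" in token:
            start, end = token.split("-")
            slots.append((int(start), int(end)))
        else:
            slots.append((int(token), int(token)))  # a single slot
    return {
        "id": node_id,
        "host": host,
        "port": int(port),
        "flags": flags.split(","),
        "master_id": None if master_id == "-" else master_id,
        "config_epoch": int(parts[6]),
        "link_state": parts[7],
        "slots": slots,
        "transitions": transitions,
    }


# The 7002 master line from the output above.
line_7002 = ("8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d "
             "172.168.15.101:7002@17002 master - 0 1589854186127 2 "
             "connected 10923-16383")
node = parse_node_line(line_7002)
print(node["port"], node["flags"], node["slots"])
```

Entries in square brackets, such as `[5592-<-8c27...]`, are kept separately in `transitions` because they mark slots in an importing or migrating state rather than slots the node owns.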


Computing the unassigned slots and adding them back

Look at the slot ranges listed after master 7003:

5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922

Missing: 5592 5784 5914 6158 6265 6291 6312 6402 6964 7229 7567 7648 7863 8200 8694 8806 8833 9230 9306 9354 9478 9697 9762 9856 10242 10266 10311 10349 10530 10670
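Rather than spotting the gaps by eye, they can be derived from the ranges. This is a minimal sketch that takes the slot ranges of the three masters exactly as printed by CLUSTER NODES above; `find_missing_slots` is a hypothetical helper name.

```python
def find_missing_slots(ranges_text, first=0, last=16383):
    """Return the slots in [first, last] not covered by the given ranges."""
    covered = set()
    for token in ranges_text.split():
        start, _, end = token.partition("-")
        end = end or start                    # a lone slot such as "5592"
        covered.update(range(int(start), int(end) + 1))
    return [s for s in range(first, last + 1) if s not in covered]


# Slot ranges per master, copied from the CLUSTER NODES output.
MASTER_RANGES = {
    "7001": "0-5460",
    "7003": (
        "5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 "
        "6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 "
        "7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 "
        "9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 "
        "9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 "
        "10531-10669 10671-10922"
    ),
    "7002": "10923-16383",
}

missing = find_missing_slots(" ".join(MASTER_RANGES.values()))
print(len(missing))                  # prints: 30
print(" ".join(map(str, missing)))   # 5592 5784 5914 ... 10530 10670
```

The count matches the 30 slots that CLUSTER INFO reported as unassigned (16384 - 16354).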

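Reassign them. Note that CLUSTER ADDSLOTS assigns the slots to the node that receives the command, which is 7001 here: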

172.168.15.101:7001> CLUSTER ADDSLOTS 5592 5784 5914 6158 6265  6291 6312 6402 6964 7229 7567 7648 7863 8200 8694 8806 8833 9230 9306 9354 9478 9697 9762 9856 10242 10266 10311 10349 10530 10670 
OK
172.168.15.101:7001>
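Typing 30 slot numbers by hand is error-prone. The sketch below builds the equivalent redis-cli invocation from a list of slots; it only prints the command and does not run it. It assumes `redis-cli` is on the PATH, and `addslots_argv` is an illustrative helper. The slot list is shortened to three entries for brevity.

```python
import shlex


def addslots_argv(host, port, slots):
    """Build a redis-cli argument list for CLUSTER ADDSLOTS."""
    if not slots:
        raise ValueError("no slots to add")
    return ["redis-cli", "-h", host, "-p", str(port),
            "CLUSTER", "ADDSLOTS", *map(str, slots)]


# Shortened example; in the incident above the list held all 30 slots.
argv = addslots_argv("172.168.15.101", 7001, [5592, 5784, 5914])
print(shlex.join(argv))
```

The list form can be passed to `subprocess.run` without a shell once the printed command has been reviewed.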

After a short wait, check again:

172.168.15.101:7001> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:2108
cluster_stats_messages_pong_sent:1508
cluster_stats_messages_sent:3616
cluster_stats_messages_ping_received:1508
cluster_stats_messages_pong_received:1468
cluster_stats_messages_update_received:19
cluster_stats_messages_received:2995
172.168.15.101:7001>

The cluster state is back to ok.


Redisson initialization failure (Not all slots are covered! Only 10923 slots are avaliable + Failed to add master: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail)

Redisson was configured with the cluster addresses, and its startup log showed:

[2020-05-19 10:44:33,539] INFO [localhost-startStop-1] RedissonManager.<clinit>(27) | redisson client begin to init....
[2020-05-19 10:44:36,365] ERROR [localhost-startStop-1] RedissonManager.<clinit>(52) | org.redisson.client.RedisConnectionException: Not all slots are covered! Only 10923 slots are avaliable
        at org.redisson.cluster.ClusterConnectionManager.<init>(ClusterConnectionManager.java:167)
        at org.redisson.config.ConfigSupport.createConnectionManager(ConfigSupport.java:198)
        at org.redisson.Redisson.<init>(Redisson.java:122)
        at org.redisson.Redisson.create(Redisson.java:159)

       .......
         .......
           .......
Caused by: org.redisson.client.RedisException: Failed to add master: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail
        at org.redisson.cluster.ClusterConnectionManager$1$1.operationComplete(ClusterConnectionManager.java:223)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)

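The cause is clear: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail

Connect to port 7002 (be sure to check on 7002 itself, and do not rely on node information viewed from another port), then repeat the steps above.

The nodes were restarted a few times along the way, and the fault was resolved.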
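Because each node reports its own view of the cluster state, it can help to ask every node rather than only one. The following is a sketch under stated assumptions: `redis-cli` is available on the PATH, and the names `redis_cli_cluster_info` and `nodes_not_ok` are hypothetical. The demo at the bottom uses canned responses instead of a live cluster.

```python
import subprocess


def redis_cli_cluster_info(host, port):
    """Run `redis-cli -h HOST -p PORT cluster info` and return its stdout."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "cluster", "info"],
        capture_output=True, text=True, check=True, timeout=5,
    )
    return result.stdout


def nodes_not_ok(nodes, fetch=redis_cli_cluster_info):
    """Return {(host, port): state} for nodes whose own view is not 'ok'."""
    bad = {}
    for host, port in nodes:
        state = "unknown"
        for line in fetch(host, port).splitlines():
            if line.startswith("cluster_state:"):
                state = line.split(":", 1)[1].strip()
        if state != "ok":
            bad[(host, port)] = state
    return bad


# Demo with canned responses (no live cluster needed): 7001 has recovered,
# 7002 still reports fail, as in the Redisson error above.
canned = {7001: "cluster_state:ok\r\n", 7002: "cluster_state:fail\r\n"}
report = nodes_not_ok(
    [("172.168.15.101", 7001), ("172.168.15.101", 7002)],
    fetch=lambda host, port: canned[port],
)
print(report)  # prints: {('172.168.15.101', 7002): 'fail'}
```

Against a real cluster, omit the `fetch` argument so the default redis-cli call is used.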

Originally published 2020/05/19 on the author's personal site/blog.
