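Redis containerization verification

The previous article covered the development of the redis operator. This one verifies a number of its features. The source code address is at the end of the article; stars are welcome, and so is a like.

Previous article: Writing an operator to extend Kubernetes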

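This article verifies the following:

1. Creating a 3-master/3-slave cluster, and a 5-master/5-slave cluster;

2. Upgrading a 3-master/3-slave cluster, and a 5-master/5-slave cluster;

3. Master/slave placement (two masters must not share a host, and a master and its slave must not share a host);

4. Operator high availability;

5. Operator failure scenarios.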

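Master/slave IP allocation verification

Test points

1. Two master instances should, as far as possible, not be on the same host;

2. A master and its slave should, as far as possible, not be on the same host.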
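The two rules can be checked mechanically once the pod IPs and their hosts are known. The following is a minimal sketch, not code from the operator: check_placement is a hypothetical helper, the host names are made up, and the IP-to-host mapping (one pod subnet per host) is an assumption. It also assumes that the master and slave lists are paired by index, which matches the logs in this article. The IP lists are those of the 2-node run.

```python
from itertools import combinations


def check_placement(master_ips, slave_ips, host_of):
    """Return the violations of the two placement rules.

    master_ips[i] and slave_ips[i] are assumed to form one master-slave pair.
    host_of maps a pod IP to the name of the host it runs on.
    """
    # Rule 1: pairs of masters that ended up on the same host.
    masters_sharing_host = [
        (a, b)
        for a, b in combinations(master_ips, 2)
        if host_of[a] == host_of[b]
    ]
    # Rule 2: master-slave pairs that ended up on the same host.
    pairs_sharing_host = [
        (m, s)
        for m, s in zip(master_ips, slave_ips)
        if host_of[m] == host_of[s]
    ]
    return masters_sharing_host, pairs_sharing_host


# IP lists from the 2-node run; the host names are hypothetical.
masters = ["10.168.167.118", "10.168.250.243", "10.168.167.119"]
slaves = ["10.168.250.250", "10.168.167.120", "10.168.250.236"]
host_of = {
    ip: "host-a" if ip.startswith("10.168.167.") else "host-b"
    for ip in masters + slaves
}

masters_sharing_host, pairs_sharing_host = check_placement(masters, slaves, host_of)
print(masters_sharing_host)
print(pairs_sharing_host)
```

With only two hosts and three masters, one shared host among masters is unavoidable, which is why the rules are worded "as far as possible"; the pair rule can still be met in full.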

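Creating a 3-master/3-slave cluster

2 nodes

When the 6 redis pod instances are spread over 2 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.167.118 10.168.250.243 10.168.167.119]

slaveInstanceIPs: [10.168.250.250 10.168.167.120 10.168.250.236]

The first two masters are on two different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: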

779f4cb6e2ebf0bcef86d9b3ac024d295cc9b1cc 10.168.167.118:6379 master - 0 1555688720967 1 connected 0-5460
24f9fba7063dc9a862f829b4f441b4e0f437bb18 10.168.250.250:6379 slave 779f4cb6e2ebf0bcef86d9b3ac024d295cc9b1cc 0 1555688721970 1 connected
5bf92e4ee029c6108e41599ddb99b25c249cdbdd 10.168.250.236:6379 slave 5dbb0468b397920f3df8eb3e3a43b8cc9d70b2f3 0 1555688725001 3 connected
f01b4fc660f64f8d61bd831b128e39ad9050a4bb 10.168.167.120:6379 slave 1ae5b3ae3f4361464c8cf353ae9d29cbc3bce60b 0 1555688726008 2 connected
1ae5b3ae3f4361464c8cf353ae9d29cbc3bce60b 10.168.250.243:6379 myself,master - 0 0 2 connected 5461-10922
5dbb0468b397920f3df8eb3e3a43b8cc9d70b2f3 10.168.167.119:6379 master - 0 1555688723988 3 connected 10923-16383
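Each entry in this file has eight fixed fields (node ID, address, flags, master ID, ping-sent, pong-received, config epoch, link state), followed by the slot ranges for masters. The sketch below is one way to read such a dump and recover which slave replicates which master. ClusterNode, parse_nodes and slave_to_master are hypothetical names chosen for this example and are not taken from the operator's source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterNode:
    node_id: str
    addr: str
    flags: List[str]
    master_id: str  # "-" for a master
    slots: List[str]


def parse_nodes(text):
    """Parse node.conf / CLUSTER NODES style lines into ClusterNode records."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # not a node entry
        nodes.append(
            ClusterNode(parts[0], parts[1], parts[2].split(","), parts[3], parts[8:])
        )
    return nodes


def slave_to_master(nodes):
    """Map each slave address to the address of the master it replicates."""
    addr_by_id = {n.node_id: n.addr for n in nodes}
    return {n.addr: addr_by_id[n.master_id] for n in nodes if "slave" in n.flags}


SAMPLE = """\
779f4cb6e2ebf0bcef86d9b3ac024d295cc9b1cc 10.168.167.118:6379 master - 0 1555688720967 1 connected 0-5460
24f9fba7063dc9a862f829b4f441b4e0f437bb18 10.168.250.250:6379 slave 779f4cb6e2ebf0bcef86d9b3ac024d295cc9b1cc 0 1555688721970 1 connected
5bf92e4ee029c6108e41599ddb99b25c249cdbdd 10.168.250.236:6379 slave 5dbb0468b397920f3df8eb3e3a43b8cc9d70b2f3 0 1555688725001 3 connected
f01b4fc660f64f8d61bd831b128e39ad9050a4bb 10.168.167.120:6379 slave 1ae5b3ae3f4361464c8cf353ae9d29cbc3bce60b 0 1555688726008 2 connected
1ae5b3ae3f4361464c8cf353ae9d29cbc3bce60b 10.168.250.243:6379 myself,master - 0 0 2 connected 5461-10922
5dbb0468b397920f3df8eb3e3a43b8cc9d70b2f3 10.168.167.119:6379 master - 0 1555688723988 3 connected 10923-16383
"""

nodes = parse_nodes(SAMPLE)
for slave, master in slave_to_master(nodes).items():
    print(slave, "replicates", master)
```

The resulting pairs agree with the masterInstanceIPs and slaveInstanceIPs lists: the i-th slave replicates the i-th master.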

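3 nodes

When the 6 redis pod instances are spread over 3 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.187 10.168.167.77 10.168.250.198]

slaveInstanceIPs: [10.168.167.78 10.168.250.232 10.168.178.177]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: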

c070d90e2c4fe01c0e34dd33477c06da0a3ca878 10.168.250.232:6379 slave 3aff200019418390971c97f5a1c75c4aaf3e1977 0 1555692402290 2 connected
3aff200019418390971c97f5a1c75c4aaf3e1977 10.168.167.77:6379 master - 0 1555692402790 2 connected 5461-10922
6665002017b4b8260e3f36976d190789ec42f136 10.168.167.78:6379 slave 6625980a48eb92ae168e6c52a25c05ae42851bba 0 1555692399269 1 connected
6381b59880a42cb93df1995e1157d4af2f615437 10.168.250.198:6379 master - 0 1555692398267 3 connected 10923-16383
8146a32ea6ae37b0126eb64a2871a1e6b8fbfc32 10.168.178.177:6379 slave 6381b59880a42cb93df1995e1157d4af2f615437 0 1555692401275 3 connected
6625980a48eb92ae168e6c52a25c05ae42851bba 10.168.178.187:6379 myself,master - 0 0 1 connected 0-5460

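4 nodes

When the 6 redis pod instances are spread over 4 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.151 10.168.167.79 10.168.246.105]

slaveInstanceIPs: [10.168.250.249 10.168.178.144 10.168.250.251]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: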

94a43a81710d089af67306d9bcd029fe28a8befa 10.168.250.251:6379 slave b6dce9a15372f7fe366643800662a489b527d8a7 0 1555722632475 3 connected
d267eed81c40addaa08008835415e409a133993f 10.168.167.79:6379 master - 0 1555722634476 2 connected 5461-10922
912b274f7f94a599e8e3b689a2a8ca0e0a1d1ae1 10.168.178.144:6379 slave d267eed81c40addaa08008835415e409a133993f 0 1555722636502 2 connected
b6dce9a15372f7fe366643800662a489b527d8a7 10.168.246.105:6379 master - 0 1555722635481 3 connected 10923-16383
5a5d0b43157810ccc9ddb251500f02549416faec 10.168.250.249:6379 slave b1305ef94692bb782c7ca5624904bb52a092e882 0 1555722637503 1 connected
b1305ef94692bb782c7ca5624904bb52a092e882 10.168.178.151:6379 myself,master - 0 0 1 connected 0-5460

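5 nodes

When the 6 redis pod instances are spread over 5 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.150 10.168.167.80 10.168.246.80]

slaveInstanceIPs: [10.168.250.192 10.168.178.135 10.168.214.216]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: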

2ad5408e096c00de9a2d6b39920aa1a9025e15e1 10.168.178.135:6379 slave 5fcdb2a0de24231f6c6fdf1b68d8bfa0bd268627 0 1555723082762 2 connected
2e03c84d7216eab6c500e6776191274e0b968ad0 10.168.246.80:6379 master - 0 1555723083765 3 connected 10923-16383
2cd8dbe370833fbc430013b0523261cf05da64a5 10.168.214.216:6379 slave 2e03c84d7216eab6c500e6776191274e0b968ad0 0 1555723080733 3 connected
bcaf33f016d4c60ce67055eca15399c3f3c07a18 10.168.250.192:6379 myself,slave 0e6704cce8de78905906e7a966760a777aafb9ab 0 0 0 connected
5fcdb2a0de24231f6c6fdf1b68d8bfa0bd268627 10.168.167.80:6379 master - 0 1555723085777 2 connected 5461-10922
0e6704cce8de78905906e7a966760a777aafb9ab 10.168.178.150:6379 master - 0 1555723084770 1 connected 0-5460

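6 nodes

When the 6 redis pod instances are spread over 6 nodes, the allocation is as follows:

master IP: [10.168.178.133 10.168.167.115 10.168.131.82]

slave IP: [10.168.214.225 10.168.250.205 10.168.246.84]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: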

7e43a73fa1cd232c010b58e0c8d7075f111cf488 10.168.167.115:6379 master - 0 1555686504327 2 connected 5461-10922
6a699e423c8e31422c8e8224e765ec6bcbd9ee2e 10.168.246.84:6379 slave 960cc0c207747ec2fd39ae130d8a6e25f2c5a1dd 0 1555686502321 3 connected
e77bd6f8e36d18a3b604bae8411e0ddbdac9ef4d 10.168.178.133:6379 master - 0 1555686503320 1 connected 0-5460
ae1d15ef634cc08563ff6de7aa3fe38af04095bf 10.168.250.205:6379 slave 7e43a73fa1cd232c010b58e0c8d7075f111cf488 0 1555686501304 2 connected
960cc0c207747ec2fd39ae130d8a6e25f2c5a1dd 10.168.131.82:6379 myself,master - 0 0 3 connected 10923-16383
e63cc485885c265cdfc79c484b67d8b31c2a4cc1 10.168.214.225:6379 slave e77bd6f8e36d18a3b604bae8411e0ddbdac9ef4d 0 1555686504741 1 connected

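Upgrading a 3-master/3-slave cluster

2 nodes before the upgrade, 4 nodes after

With two nodes available for scheduling, create a 3-master/3-slave redis cluster. The 6 redis pod instances are spread over the 2 nodes.

The master and slave IPs are allocated as follows:

masterInstanceIPs: [10.168.167.101 10.168.250.250 10.168.167.102]

slaveInstanceIPs: [10.168.250.226 10.168.167.103 10.168.250.194]

The masters are spread over both hosts, and none of the 3 master-slave pairs shares a host. The redis cluster's node.conf is as follows: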

3610c0798c98d49d8f3e94a3911a76e1d63b1397 10.168.167.103:6379 slave 9c8b07a38dcd87afa764418726d80c71f10241e9 0 1555745305921 2 connected
c5960ac6f438ea9e98ed99daa9cf06000c376521 10.168.167.101:6379 myself,master - 0 0 1 connected 0-5460
9c8b07a38dcd87afa764418726d80c71f10241e9 10.168.250.250:6379 master - 0 1555745304925 2 connected 5461-10922
ec39a4510420f480870de6d9aa7f1553e7f83d41 10.168.250.226:6379 slave c5960ac6f438ea9e98ed99daa9cf06000c376521 0 1555745306924 1 connected
251b8f8839ed2ae627029ee563c8c1a0a2bb99f4 10.168.250.194:6379 slave 5b9bcb75663c1c84898a3805d26c4188e2a06e2e 0 1555745308935 3 connected
5b9bcb75663c1c84898a3805d26c4188e2a06e2e 10.168.167.102:6379 master - 0 1555745307930 3 connected 10923-16383

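Automatic scale-out: upgrade from 3 masters/3 slaves to 5 masters/5 slaves, adding two new nodes for scheduling. After the upgrade the 10 redis pod instances are spread over 4 nodes.

The new master and slave IPs are allocated as follows:

willAddClusterMasterIPs: [10.168.178.142 10.168.246.86]

willAddClusterSlaveIPs: [10.168.246.112 10.168.178.167]

The two newly added masters are on different hosts, and neither shares a host with an old master. None of the 5 master-slave pairs shares a host.

The redis cluster's node.conf is as follows: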

50c32a8cc9d8ae3eec2b25f2fb67b8ccd65ea663 10.168.178.167:6379 slave a114f33154186797304e8cd293deeee0dc630cb1 0 1555750930897 8 connected
4cb332421893e4d1efe5a0ddc0e6e7871f5fd1bd 10.168.246.112:6379 slave e3b802a9570f11c74716172d66a4163153495c70 0 1555750925827 7 connected
ec39a4510420f480870de6d9aa7f1553e7f83d41 10.168.250.226:6379 slave c5960ac6f438ea9e98ed99daa9cf06000c376521 0 1555750924821 1 connected
5b9bcb75663c1c84898a3805d26c4188e2a06e2e 10.168.167.102:6379 master - 0 1555750929889 3 connected 13108-16383
251b8f8839ed2ae627029ee563c8c1a0a2bb99f4 10.168.250.194:6379 slave 5b9bcb75663c1c84898a3805d26c4188e2a06e2e 0 1555750927335 3 connected
9c8b07a38dcd87afa764418726d80c71f10241e9 10.168.250.250:6379 myself,master - 0 0 2 connected 7647-10922
3610c0798c98d49d8f3e94a3911a76e1d63b1397 10.168.167.103:6379 slave 9c8b07a38dcd87afa764418726d80c71f10241e9 0 1555750926329 2 connected
e3b802a9570f11c74716172d66a4163153495c70 10.168.178.142:6379 master - 0 1555750928887 7 connected 0-1091 5461-7646
a114f33154186797304e8cd293deeee0dc630cb1 10.168.246.86:6379 master - 0 1555750927876 8 connected 1092-2184 10923-13107
c5960ac6f438ea9e98ed99daa9cf06000c376521 10.168.167.101:6379 master - 0 1555750931398 1 connected 2185-5460
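To see how evenly the 16384 slots are spread after the scale-out, the slot ranges in this dump can be counted per master. The sketch below is only an illustration: slot_count is a hypothetical helper, and it handles plain ranges and single slots only, not the bracketed markers that appear while a slot is migrating.

```python
TOTAL_SLOTS = 16384


def slot_count(ranges):
    """Count the slots covered by entries such as "0-1091" or a single "42"."""
    total = 0
    for item in ranges:
        start, _, end = item.partition("-")
        total += int(end or start) - int(start) + 1
    return total


# Slot ranges per master, copied from the node.conf dump above.
slots_by_master = {
    "10.168.178.142": ["0-1091", "5461-7646"],     # new master
    "10.168.246.86": ["1092-2184", "10923-13107"],  # new master
    "10.168.167.101": ["2185-5460"],
    "10.168.250.250": ["7647-10922"],
    "10.168.167.102": ["13108-16383"],
}

counts = {ip: slot_count(r) for ip, r in slots_by_master.items()}
for ip, n in counts.items():
    print(ip, n)
print("total", sum(counts.values()))
```

Each new master ends up with 3278 slots and each old master with 3276, which adds up to all 16384 slots.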

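3 nodes before the upgrade, 5 nodes after

With three nodes available for scheduling, create a 3-master/3-slave redis cluster. The 6 redis pod instances are spread over the 3 nodes.

The masters and slaves are allocated as follows:

masterInstanceIPs: [10.168.167.112 10.168.178.186 10.168.250.223]

slaveInstanceIPs: [10.168.250.241 10.168.167.113 10.168.178.180]

The 3 masters are all on different hosts, and none of the 3 master-slave pairs shares a host. The redis cluster's node.conf is as follows: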

87a6b8ab7f6c90f707e8929fab73f573ce44146d 10.168.167.112:6379 myself,master - 0 0 2 connected 5461-10922
d8dbebd3fc2cd68bbcd783fff0830eeadbf00818 10.168.167.113:6379 slave 5f736d89a8236531de191fea79d46050321aa02f 0 1555761224102 1 connected
5f736d89a8236531de191fea79d46050321aa02f 10.168.178.186:6379 master - 0 1555761227114 1 connected 0-5460
c2b87cbceed314541235ed992767b3eb28e1ab2f 10.168.250.223:6379 master - 0 1555761226110 3 connected 10923-16383
e21c4efd9250f89f661881cb927f991b7823af1d 10.168.178.180:6379 slave c2b87cbceed314541235ed992767b3eb28e1ab2f 0 1555761225105 3 connected
bc3777d03dcc6cffd292fa55ba56bbc601d54e33 10.168.250.241:6379 slave 87a6b8ab7f6c90f707e8929fab73f573ce44146d 0 1555761222094 2 connected

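Add two new schedulable nodes and upgrade the cluster to 5 masters/5 slaves using automatic scale-out.

The new masters and slaves are allocated as follows:

willAddClusterMasterIPs: [10.168.214.235 10.168.246.68]

willAddClusterSlaveIPs: [10.168.246.66 10.168.214.220]

The 5 masters are all on different hosts, and none of the 5 master-slave pairs shares a host. The redis cluster's node.conf is as follows: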

2028ce667f834257eafb04a3e03a15d6721919bf 10.168.214.235:6379 master - 0 1555762207343 6 connected 0-1091 5461-7646
41f2e1a00ff445f53066dbdb6249a99c042a1827 10.168.246.68:6379 master - 0 1555762211358 7 connected 1092-2184 10923-13107
67d97381c0fecf7fcabefffdc641918f428398e1 10.168.214.220:6379 slave 41f2e1a00ff445f53066dbdb6249a99c042a1827 0 1555762210356 7 connected
87a6b8ab7f6c90f707e8929fab73f573ce44146d 10.168.167.112:6379 myself,master - 0 0 2 connected 7647-10922
c2b87cbceed314541235ed992767b3eb28e1ab2f 10.168.250.223:6379 master - 0 1555762210356 3 connected 13108-16383
e21c4efd9250f89f661881cb927f991b7823af1d 10.168.178.180:6379 slave c2b87cbceed314541235ed992767b3eb28e1ab2f 0 1555762212862 3 connected
d8dbebd3fc2cd68bbcd783fff0830eeadbf00818 10.168.167.113:6379 slave 5f736d89a8236531de191fea79d46050321aa02f 0 1555762212364 1 connected
5f736d89a8236531de191fea79d46050321aa02f 10.168.178.186:6379 master - 0 1555762211357 1 connected 2185-5460
6628fd47b105a655f883e12b295055fada0ae87a 10.168.246.66:6379 slave 2028ce667f834257eafb04a3e03a15d6721919bf 0 1555762207344 6 connected
bc3777d03dcc6cffd292fa55ba56bbc601d54e33 10.168.250.241:6379 slave 87a6b8ab7f6c90f707e8929fab73f573ce44146d 0 1555762213365 2 connected

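4 nodes before the upgrade, 5 nodes after

Upgrade from 3 masters/3 slaves to 5 masters/5 slaves. Before the upgrade, with the 6 redis pod instances spread over 4 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.152 10.168.167.81 10.168.246.100]

slaveInstanceIPs: [10.168.250.221 10.168.178.131 10.168.250.222]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: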

1b37538580090ee83ec6ab62eba07e1264ea1606 10.168.250.222:6379 myself,slave b90207e51e809c1c525edf6bdd4cbc9ba695c261 0 0 0 connected
bf976ccfb0b6c592d2f99ce80032d92e2cda8829 10.168.167.81:6379 master - 0 1555724682133 2 connected 5461-10922
d1eb3f04698cdf4fd1aa60ae228642dd4ca79a9a 10.168.250.221:6379 slave f9e9fb4c83bef67cf76c49d4f98e0de996f2e80d 0 1555724680109 1 connected
16a68a73556de518150b5e59d4e489d4e04f5707 10.168.178.131:6379 slave bf976ccfb0b6c592d2f99ce80032d92e2cda8829 0 1555724683130 2 connected
f9e9fb4c83bef67cf76c49d4f98e0de996f2e80d 10.168.178.152:6379 master - 0 1555724681118 1 connected 0-5460
b90207e51e809c1c525edf6bdd4cbc9ba695c261 10.168.246.100:6379 master - 0 1555724683130 3 connected 10923-16383

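Automatic scale-out: after the upgrade to 5 masters/5 slaves there are 5 nodes, and the new instances are allocated as follows:

willAddClusterMasterIPs: [10.168.214.204 10.168.246.64]

willAddClusterSlaveIPs: [10.168.167.66 10.168.214.192]

Two masters and two slaves were added, but only one schedulable node was added. The master-slave relationships that existed before the upgrade are unchanged. Of the 2 new masters, one (10.168.214.204) does not share a node with any of the 3 pre-upgrade masters, while the other (10.168.246.64) is on the same node (10.10.103.61-slave) as one of the pre-upgrade masters (10.168.246.100). The node.conf is as follows: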

40317506bc7c31c6e06fa18807ff08fb487ecb21 10.168.214.204:6379 master - 0 1555725642125 6 connected 0-1091 5461-7646
8e3e6d6aa184f2e8bc2d8b63f720abd66e5f7cf3 10.168.167.66:6379 slave 40317506bc7c31c6e06fa18807ff08fb487ecb21 0 1555725641112 6 connected
1b37538580090ee83ec6ab62eba07e1264ea1606 10.168.250.222:6379 myself,slave b90207e51e809c1c525edf6bdd4cbc9ba695c261 0 0 0 connected
b90207e51e809c1c525edf6bdd4cbc9ba695c261 10.168.246.100:6379 master - 0 1555725642643 3 connected 13108-16383
16a68a73556de518150b5e59d4e489d4e04f5707 10.168.178.131:6379 slave bf976ccfb0b6c592d2f99ce80032d92e2cda8829 0 1555725636084 2 connected
d1eb3f04698cdf4fd1aa60ae228642dd4ca79a9a 10.168.250.221:6379 slave f9e9fb4c83bef67cf76c49d4f98e0de996f2e80d 0 1555725639103 1 connected
bf976ccfb0b6c592d2f99ce80032d92e2cda8829 10.168.167.81:6379 master - 0 1555725643136 2 connected 7647-10922
48b913496d98372be0ee2420657e11acf2352077 10.168.214.192:6379 slave 773e37732962f00eb2e641fd171d42f359cd8d6f 0 1555725638095 7 connected
f9e9fb4c83bef67cf76c49d4f98e0de996f2e80d 10.168.178.152:6379 master - 0 1555725640111 1 connected 2185-5460
773e37732962f00eb2e641fd171d42f359cd8d6f 10.168.246.64:6379 master - 0 1555725640611 7 connected 1092-2184 10923-13107

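The recommendation is therefore: when creating a 3-master/3-slave cluster, provide exactly 3 schedulable nodes; when upgrading to 5 masters/5 slaves (that is, adding two masters), add two new schedulable nodes. Only then can the master/slave placement rules be guaranteed.

Alternatively, handle it in the scheduler's policy: for a 3-master/3-slave cluster, select only the 3 best nodes for the 6 instances even if 4 schedulable nodes are available; or, when upgrading, prefer the nodes that hosted slaves before the upgrade, in addition to the new nodes.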
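The effect of the second suggestion can be illustrated with a small sketch. This is not the operator's or the scheduler's actual logic: pick_new_masters is a hypothetical helper, and every host name except 10.10.103.61-slave is made up for the example. It prefers new pods whose host runs no master yet and falls back to a shared host only when it has to.

```python
def pick_new_masters(existing_master_hosts, new_pods, count):
    """Choose `count` of the new pods to become masters.

    new_pods is a list of (pod_ip, host_name) tuples.
    """
    hosts_with_master = set(existing_master_hosts)
    chosen = []
    # First pass: only take pods whose host does not run a master yet.
    for ip, host in new_pods:
        if len(chosen) == count:
            break
        if host not in hosts_with_master:
            chosen.append(ip)
            hosts_with_master.add(host)
    # Second pass: if that was not enough, accept a shared host.
    for ip, host in new_pods:
        if len(chosen) == count:
            break
        if ip not in chosen:
            chosen.append(ip)
    return chosen


# Hosts of the three pre-upgrade masters (names partly hypothetical).
existing = ["host-178", "host-167", "10.10.103.61-slave"]

# New pods from the 4-to-5-node run; the host assignment is assumed.
new_pods = [
    ("10.168.214.204", "host-214"),
    ("10.168.246.64", "10.10.103.61-slave"),
    ("10.168.167.66", "host-167"),
    ("10.168.214.192", "host-214"),
]
picked = pick_new_masters(existing, new_pods, 2)

# Hypothetical variant: one new pod lands on host-250, which ran only slaves.
variant = [
    ("pod-a", "host-214"),
    ("pod-b", "host-250"),
    ("pod-c", "host-167"),
    ("pod-d", "host-214"),
]
picked_variant = pick_new_masters(existing, variant, 2)

print(picked)
print(picked_variant)
```

With the pods from the 4-to-5-node run the fallback is needed, so one new master shares a host with an old one. In the variant, where a new pod lands on the host that previously ran only slaves, both new masters get a host of their own.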

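5 nodes both before and after the upgrade

Upgrade from 3 masters/3 slaves to 5 masters/5 slaves. Before the upgrade, with the 6 redis pod instances spread over 5 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.143 10.168.167.82 10.168.246.124]

slaveInstanceIPs: [10.168.250.215 10.168.178.137 10.168.214.207]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: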

870d7a12dd24f9d3eaf046698144e3ef72f5f0a8 10.168.178.137:6379 slave fa935ddfe14874c1f1f0838e75b477bdf83d4fcc 0 1555726310056 2 connected
f4fb098370e9f4a9ae04cc4affd88423152e2cc3 10.168.178.143:6379 myself,master - 0 0 1 connected 0-5460
7135f67b63a47d5c544a25c5ddb74a77e94ef788 10.168.250.215:6379 slave f4fb098370e9f4a9ae04cc4affd88423152e2cc3 0 1555726316268 1 connected
db06c2399ce36ed3ac30c234fdf2a316f891efa0 10.168.246.124:6379 master - 0 1555726314141 3 connected 10923-16383
c2a5059ef9efa8a7dfd1540606c872d9d4327f47 10.168.214.207:6379 slave db06c2399ce36ed3ac30c234fdf2a316f891efa0 0 1555726311061 3 connected
fa935ddfe14874c1f1f0838e75b477bdf83d4fcc 10.168.167.82:6379 master - 0 1555726315262 2 connected 5461-10922

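Manual scale-out:

After the upgrade to 5 masters/5 slaves, the allocation is as follows.

Of the 4 new instances, two were scheduled to nodes that hosted slaves before the upgrade and two to nodes that hosted masters, so the master and slave IPs of the 4 new instances are allocated as follows:

willAddClusterMasterIPs: [10.168.214.196 10.168.250.247]

willAddClusterSlaveIPs: [10.168.246.110 10.168.167.86]

The 5 masters are all on different nodes, and none of the 5 master-slave pairs shares a node.

The redis cluster's node.conf is as follows: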

9ac4428fe725dd7c85ed281365fed0f0feea06a9 10.168.246.110:6379 slave ecb8edcef3579f746551da1bdcdec6ef02182abc 0 1555731069605 7 connected
f4fb098370e9f4a9ae04cc4affd88423152e2cc3 10.168.178.143:6379 myself,master - 0 0 1 connected 666-5460
870d7a12dd24f9d3eaf046698144e3ef72f5f0a8 10.168.178.137:6379 slave fa935ddfe14874c1f1f0838e75b477bdf83d4fcc 0 1555731073128 2 connected
db06c2399ce36ed3ac30c234fdf2a316f891efa0 10.168.246.124:6379 master - 0 1555731072628 3 connected 12089-16383
6f203d39ed42a8feb20e6eaab9ef27beb18c8702 10.168.167.86:6379 slave f63658ed38a46aed883e060d0d067caa9294e2be 0 1555731067591 8 connected
7135f67b63a47d5c544a25c5ddb74a77e94ef788 10.168.250.215:6379 slave f4fb098370e9f4a9ae04cc4affd88423152e2cc3 0 1555731074134 1 connected
ecb8edcef3579f746551da1bdcdec6ef02182abc 10.168.214.196:6379 master - 0 1555731071622 7 connected 0-665 5461-6127 10923-11588
f63658ed38a46aed883e060d0d067caa9294e2be 10.168.250.247:6379 master - 0 1555731073631 8 connected 6128-6627 11589-12088
fa935ddfe14874c1f1f0838e75b477bdf83d4fcc 10.168.167.82:6379 master - 0 1555731069097 2 connected 6628-10922
c2a5059ef9efa8a7dfd1540606c872d9d4327f47 10.168.214.207:6379 slave db06c2399ce36ed3ac30c234fdf2a316f891efa0 0 1555731072124 3 connected

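6 nodes both before and after the upgrade

Upgrade from 3 masters/3 slaves to 5 masters/5 slaves. Before the upgrade, with the 6 redis pod instances spread over 6 nodes, the allocation is as follows:

masterInstanceIPs: [10.168.178.162 10.168.167.84 10.168.131.79]

slaveInstanceIPs: [10.168.250.238 10.168.131.80 10.168.214.255]

The three masters are on three different hosts, and none of the 3 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: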

23ea3630dc8208ca4700dcd9200fca1cd07a9ecf 10.168.214.255:6379 slave f41a5c85475654062f681fc06fb3a60a71778d61 0 1555733790931 3 connected
f74788e37d0c8a4db324c691764248e1ecf5e20f 10.168.167.84:6379 master - 0 1555733788879 2 connected 5461-10922
f41a5c85475654062f681fc06fb3a60a71778d61 10.168.131.79:6379 myself,master - 0 0 3 connected 10923-16383
8ebd8ac3c9edb6643484f63fee18a5dd2d200d09 10.168.178.162:6379 master - 0 1555733787862 1 connected 0-5460
a95b3acff14475f7f24bdd7b8b5fa83e010dbdc1 10.168.131.80:6379 slave f74788e37d0c8a4db324c691764248e1ecf5e20f 0 1555733789287 2 connected
02493d0efdc34a8810996e1af6d14fc283a9f70b 10.168.250.238:6379 slave 8ebd8ac3c9edb6643484f63fee18a5dd2d200d09 0 1555733789893 1 connected

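Automatic scale-out: after the upgrade to 5 masters/5 slaves, the new instances are allocated as follows:

willAddClusterMasterIPs: [10.168.214.219 10.168.250.195]

willAddClusterSlaveIPs: [10.168.246.79 10.168.131.87]

The 5 masters are on 5 different hosts, and none of the 5 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: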

25ed57f6596e463be58c21c025c4bda33889eaec 10.168.131.87:6379 slave 83b977142d4bc456fa820238e78ec9cb2c8e8d5a 0 1555734547578 8 connected
23ea3630dc8208ca4700dcd9200fca1cd07a9ecf 10.168.214.255:6379 slave f41a5c85475654062f681fc06fb3a60a71778d61 0 1555734550023 3 connected
9bdcb3c5ef29f540f955a57a8afbe8403b0209c4 10.168.214.219:6379 master - 0 1555734547076 7 connected 0-1091 5461-7646
8ebd8ac3c9edb6643484f63fee18a5dd2d200d09 10.168.178.162:6379 master - 0 1555734549027 1 connected 2185-5460
a95b3acff14475f7f24bdd7b8b5fa83e010dbdc1 10.168.131.80:6379 slave f74788e37d0c8a4db324c691764248e1ecf5e20f 0 1555734548600 2 connected
1e2dba5e3b0d0abd3bdb6d898be32b7c6bf394b2 10.168.246.79:6379 slave 9bdcb3c5ef29f540f955a57a8afbe8403b0209c4 0 1555734543025 7 connected
83b977142d4bc456fa820238e78ec9cb2c8e8d5a 10.168.250.195:6379 master - 0 1555734545560 8 connected 1092-2184 10923-13107
f74788e37d0c8a4db324c691764248e1ecf5e20f 10.168.167.84:6379 master - 0 1555734546579 2 connected 7647-10922
f41a5c85475654062f681fc06fb3a60a71778d61 10.168.131.79:6379 myself,master - 0 0 3 connected 13108-16383
02493d0efdc34a8810996e1af6d14fc283a9f70b 10.168.250.238:6379 slave 8ebd8ac3c9edb6643484f63fee18a5dd2d200d09 0 1555734549615 1 connected

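Creating a 5-master/5-slave cluster

Create a 5-master/5-slave cluster with 5 nodes available for scheduling.

The masters and slaves are allocated as follows:

masterInstanceIPs: [10.168.178.145 10.168.167.89 10.168.246.126 10.168.214.209 10.168.250.204]

slaveInstanceIPs: [10.168.167.91 10.168.246.119 10.168.214.197 10.168.250.253 10.168.178.164]

The 5 masters are on 5 different hosts, and none of the 5 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: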

7dc5084f2af4191ab93da7a2315770ac6218be87 10.168.178.164:6379 slave bed7cec1b7b8e7d2dd2ac14348728a0d6651101f 0 1555737697783 5 connected
09ba0807f798550b37a96039aec55501b3aa13c4 10.168.250.253:6379 slave b3105a048563aba317dbc8e2310f1a1b53e2794a 0 1555737696274 4 connected
2c11b5d320de3e0e17751dd1173d3ff749ab1956 10.168.178.145:6379 master - 0 1555737693755 1 connected 0-3276
bed7cec1b7b8e7d2dd2ac14348728a0d6651101f 10.168.250.204:6379 myself,master - 0 0 5 connected 13107-16383
3095e0263805ea607a2b099ee7d7c668ba387dee 10.168.246.126:6379 master - 0 1555737696778 3 connected 6554-9829
2ba26f29bc61c99b13091199cdfb31a6df2154eb 10.168.167.89:6379 master - 0 1555737691744 2 connected 3277-6553
6a3f82c770950ba3b573d860fc63486302c4db1e 10.168.167.91:6379 slave 2c11b5d320de3e0e17751dd1173d3ff749ab1956 0 1555737695774 1 connected
da307d9f7397146823df4db80465217e822b6095 10.168.246.119:6379 slave 2ba26f29bc61c99b13091199cdfb31a6df2154eb 0 1555737698284 2 connected
b3105a048563aba317dbc8e2310f1a1b53e2794a 10.168.214.209:6379 master - 0 1555737692245 4 connected 9830-13106
f4bb5b08807b679d48877a65d8b187a22993c08e 10.168.214.197:6379 slave 3095e0263805ea607a2b099ee7d7c668ba387dee 0 1555737698785 3 connected

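Upgrading a 5-master/5-slave cluster

Upgrade the 5-master/5-slave cluster to 6 masters/6 slaves (slots assigned manually):

There were 5 nodes before the upgrade, and one node is added for the upgrade.

The new instances are allocated as follows:

willAddClusterMasterIPs: [10.168.131.85]

willAddClusterSlaveIPs: [10.168.167.92]

The 6 masters are on 6 different hosts, and none of the 6 master-slave pairs shares a host. The node.conf of a redis instance contains the following node information, which shows that the cluster meets the requirements: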

fb184cb0e271e27262595525796d510fb292b7dc 10.168.167.92:6379 slave 39f9cd4053ba1c12f1c421aaced767ced665f112 0 1555738234326 6 connected
7dc5084f2af4191ab93da7a2315770ac6218be87 10.168.178.164:6379 slave bed7cec1b7b8e7d2dd2ac14348728a0d6651101f 0 1555738234830 5 connected
09ba0807f798550b37a96039aec55501b3aa13c4 10.168.250.253:6379 slave b3105a048563aba317dbc8e2310f1a1b53e2794a 0 1555738234828 4 connected
2c11b5d320de3e0e17751dd1173d3ff749ab1956 10.168.178.145:6379 master - 0 1555738235843 1 connected 0-3276
bed7cec1b7b8e7d2dd2ac14348728a0d6651101f 10.168.250.204:6379 myself,master - 0 0 5 connected 13107-16383
3095e0263805ea607a2b099ee7d7c668ba387dee 10.168.246.126:6379 master - 0 1555738236346 3 connected 6554-9829
2ba26f29bc61c99b13091199cdfb31a6df2154eb 10.168.167.89:6379 master - 0 1555738233804 2 connected 3777-6553
6a3f82c770950ba3b573d860fc63486302c4db1e 10.168.167.91:6379 slave 2c11b5d320de3e0e17751dd1173d3ff749ab1956 0 1555738235336 1 connected
da307d9f7397146823df4db80465217e822b6095 10.168.246.119:6379 slave 2ba26f29bc61c99b13091199cdfb31a6df2154eb 0 1555738237854 2 connected
b3105a048563aba317dbc8e2310f1a1b53e2794a 10.168.214.209:6379 master - 0 1555738232798 4 connected 10330-13106
f4bb5b08807b679d48877a65d8b187a22993c08e 10.168.214.197:6379 slave 3095e0263805ea607a2b099ee7d7c668ba387dee 0 1555738236847 3 connected
39f9cd4053ba1c12f1c421aaced767ced665f112 10.168.131.85:6379 master - 0 1555738235843 6 connected 3277-3776 9830-10329

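Failure scenario tests

An operator slave instance goes down

The redis operator is normally deployed as multiple instances, one of which is the master. If a non-master operator instance goes down because of some fault while a redis cluster is being created, the master instance keeps running and the operator's functionality is not affected.

Two operator-manager instances were started.

A leader election mechanism was enabled in the startup arguments.

The logs show that operator-manager-86d785b5fc-ls8ts is the slave instance: it keeps trying to acquire the lock, but the lock is held by the master, operator-manager-86d785b5fc-x7qv5.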
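The behavior seen in these logs is typical of lease-based leader election: the leader keeps renewing a lock record, and a standby can take it over only after the leader has stopped renewing it for longer than the lease duration. The following is an in-memory simulation of that idea, not the operator's code and not the Kubernetes client library; the Lease class and the 15-second duration are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

LEADER = "operator-manager-86d785b5fc-x7qv5"
STANDBY = "operator-manager-86d785b5fc-ls8ts"


@dataclass
class Lease:
    """A simplified lock record shared by all operator instances."""

    holder: Optional[str] = None
    renewed_at: float = 0.0
    duration: float = 15.0  # seconds; illustrative value

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire or renew the lease; return True if candidate holds it."""
        expired = now - self.renewed_at > self.duration
        if self.holder is None or self.holder == candidate or expired:
            self.holder = candidate
            self.renewed_at = now
            return True
        return False


lease = Lease()
results = [
    lease.try_acquire(LEADER, now=0.0),    # leader takes the free lease
    lease.try_acquire(STANDBY, now=5.0),   # standby is refused
    lease.try_acquire(LEADER, now=10.0),   # leader renews
    lease.try_acquire(STANDBY, now=20.0),  # still refused: renewed 10 s ago
    # The leader is paused here and stops renewing.
    lease.try_acquire(STANDBY, now=30.0),  # lease expired, standby takes over
]
print(results, lease.holder)
```

In this model, pausing the standby has no effect on the leader, while pausing the leader lets the standby take over once the lease has expired.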

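The log of operator-manager-86d785b5fc-x7qv5 was then checked.

It shows that this instance is syncing the redisCluster.

Now log in to the host running the operator slave and suspend the slave instance with docker pause 726d1b4c6f98.

The master's log shows that it is still running normally and syncing redisCluster events; the operator's functionality is not affected.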

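The operator master instance goes down

Creating a cluster

If the redis operator master instance goes down because of some fault, the slave instance is promoted to master. The new master has to check, based on the redisCluster resource object, whether the statefulset exists. If it does not, the new master creates it and starts the normal creation and initialization flow. If it does, the new master waits until all pods are Ready and then continues creation and initialization according to the current cluster state. The following situations can typically occur.

No instance has formed a cluster

All instances are standalone and no redis cluster has been formed. In this case cluster_known_nodes is 1 in the cluster information of every instance, and the operator has to initialize the cluster.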
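The check described here can be sketched as follows. The cluster_known_nodes field is part of the output of the Redis CLUSTER INFO command; everything else (the function names, the pod names and the three-way classification) is a hypothetical illustration and not the operator's implementation.

```python
def known_nodes(cluster_info_text):
    """Extract cluster_known_nodes from CLUSTER INFO output."""
    for line in cluster_info_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "cluster_known_nodes":
            return int(value)
    raise ValueError("cluster_known_nodes not found")


def classify(known_by_pod):
    """Decide what a newly elected operator master still has to do.

    known_by_pod maps a pod name to its cluster_known_nodes value.
    An instance that knows only itself (value 1) is standalone.
    """
    standalone = sorted(pod for pod, n in known_by_pod.items() if n == 1)
    if len(standalone) == len(known_by_pod):
        return "initialize", standalone  # nothing has been formed yet
    if standalone:
        return "join", standalone  # add the stragglers to the cluster
    return "formed", []


# Abbreviated CLUSTER INFO output of a standalone instance.
info = "cluster_state:fail\r\ncluster_slots_assigned:0\r\ncluster_known_nodes:1\r\ncluster_size:0\r\n"

# Hypothetical pod names; in the first case every instance is standalone.
all_standalone = {f"pod-{i}": known_nodes(info) for i in range(6)}
partly_formed = {"pod-0": 3, "pod-1": 3, "pod-2": 3, "pod-3": 1, "pod-4": 1, "pod-5": 1}

print(classify(all_standalone))
print(classify(partly_formed))
```

The "join" result corresponds to the case in which only some instances have formed a cluster.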

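Before the cluster is created, the operator master and slave instances are:

operator-manager-86d785b5fc-9v6m5 is the master and operator-manager-86d785b5fc-ljt9q is the slave.

Start creating the cluster. While the master instance operator-manager-86d785b5fc-9v6m5 is creating the 3-master/3-slave cluster and waiting for all pod instances to become Ready, first pause the operator slave instance (to allow screenshots for the test), then pause the operator master instance (using docker pause to suspend the process).

At this point the phase in the redisCluster object's status is Creating, meaning the cluster is being created.

All pod instances are now Ready, but none of them has formed a cluster.

Then unpause the operator slave instance. It acquires the lock and becomes the new master (operator-manager-86d785b5fc-ljt9q), handles redisCluster event syncing, and initializes the cluster.

After being promoted to master, the former slave initialized the 6 pod instances: it added the masters and slaves and assigned the slots, forming a 6-instance, 3-master/3-slave cluster whose state is normal.

It also updated the redisCluster object's status to Running; the cluster was repaired successfully.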

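Some instances have formed a cluster

Some instances have already formed a cluster while others are still standalone; the operator has to add the standalone instances to the cluster and assign slots.

Before the cluster is created, the operator master and slave instances are:

operator-manager-86d785b5fc-9v6m5 is the slave and operator-manager-86d785b5fc-ljt9q is the master.

Create a 5-master/5-slave cluster. Once the operator has started creating the cluster (that is, after it has run the redis-trib create 1.1.1.1:6379 command), first pause the slave instance, then pause the master instance.

At this point the 5 master nodes have formed a cluster and the slots are evenly distributed among them.

The phase in the redisCluster status is Creating.

Then unpause the operator slave instance. The new master (operator-manager-86d785b5fc-9v6m5) has to add the remaining 5 instances to the cluster as slaves.

A little later, all slave instances have joined the cluster, forming 5 masters and 5 slaves. The redis-trib.rb info output shows that the cluster state is normal; the cluster was repaired successfully.

The phase in redisCluster is updated to Running.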

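Upgrading a cluster

If the redis operator master goes down while a redis cluster is being upgraded, the new master has to detect the current cluster state and continue the upgrade. The following situations can typically occur.

Only some new instances are Ready

Only some of the new instances are Ready; the operator has to wait until all instances are Ready before performing the upgrade.

First create a 3-master/3-slave cluster.

Before the upgrade, the operator master and slave instances are:

operator-manager-86d785b5fc-9v6m5 is the master and operator-manager-86d785b5fc-ljt9q is the slave.

Upgrade to 5 masters/5 slaves using manual scale-out. Right after the scale-out starts, first unpause the operator slave instance, then pause the operator master instance.

The slave instance is promoted to the new master (operator-manager-86d785b5fc-ljt9q), waits until all pod instances are Ready, and then starts the upgrade and scale-out.

During the upgrade the phase is Upgrading.

Slots are being migrated.

After a while, the node information and the slot distribution show that the cluster state is normal.

The phase in redisCluster is updated to Running.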

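Some instances have joined the cluster

All new instances are Ready and some of them have joined the cluster; the remaining new instances have to be added to the cluster, and slots have to be assigned to the new masters.

First create a 3-master/3-slave cluster.

Before the upgrade, the operator master and slave instances are:

operator-manager-86d785b5fc-9v6m5 is the master and operator-manager-86d785b5fc-ljt9q is the slave.

Upgrade to 5 masters/5 slaves using manual scale-out. After the first new master instance has been added, first pause the operator slave instance, then pause the operator master instance.

At this point the cluster has only four masters.

The redisCluster phase is Upgrading.

Then unpause the operator slave instance. It is promoted to the new master (operator-manager-86d785b5fc-ljt9q) and goes on to add the second master and the two slaves and to assign slots.

Slots are being assigned.

After a while, slot assignment is complete, the upgrade has succeeded, and the cluster is normal.

The redisCluster phase is updated to Running.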

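No new master has been assigned slots

All new instances are Ready and all instances have joined the cluster, but none of the new master instances has been assigned slots; the operator has to assign slots to the new masters.

Starting from a 3-master/3-slave cluster, upgrade to 5 masters/5 slaves.

Before the upgrade, operator-manager-86d785b5fc-9v6m5 is the operator slave instance and operator-manager-86d785b5fc-ljt9q is the operator master instance. Upgrade to 5 masters/5 slaves with manual slot assignment. When all nodes have joined the cluster and slot assignment is about to start, first pause the slave instance (for verification purposes) and at the same time pause the operator master instance.

At this point new master A, new master B and the slaves have all joined the cluster, but no slots have been assigned to the two new masters.

The phase in the redisCluster object is Upgrading.

Then unpause the operator slave instance. This slave is promoted to master; it is the pod instance operator-manager-86d785b5fc-9v6m5.

Slot assignment finishes.

The redisCluster phase is updated to Running, and conditions now contains the details of all 10 instances.

The cluster state is normal.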

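Some new masters have not been assigned slots

All new instances are Ready and all instances have joined the cluster, but some of the new masters have not been assigned slots; the operator has to assign slots to those masters.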
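The upgrade cases in this section differ only in how far the interrupted upgrade had progressed, so the decision a newly elected operator master has to make can be summarized in a few lines. This is a sketch of that reasoning under simplifying assumptions, not the operator's code: Action and next_upgrade_action are hypothetical names, and the inputs are plain counts.

```python
from enum import Enum


class Action(Enum):
    WAIT_FOR_READY = "wait until all new instances are Ready"
    JOIN_INSTANCES = "add the remaining new instances to the cluster"
    ASSIGN_SLOTS = "assign slots to new masters that have none"
    DONE = "mark the redisCluster as Running"


def next_upgrade_action(expected, ready, joined, masters_without_slots):
    """Pick the step at which an interrupted upgrade should resume.

    expected: number of instances the upgraded cluster should have
    ready: number of instances whose pod is Ready
    joined: number of instances that are part of the redis cluster
    masters_without_slots: number of masters that own no slots
    """
    if ready < expected:
        return Action.WAIT_FOR_READY
    if joined < expected:
        return Action.JOIN_INSTANCES
    if masters_without_slots > 0:
        return Action.ASSIGN_SLOTS
    return Action.DONE


# The four 3+3 to 5+5 cases of this section, with illustrative counts.
cases = {
    "only some new instances Ready": next_upgrade_action(10, 8, 6, 0),
    "some instances joined": next_upgrade_action(10, 10, 7, 1),
    "no new master has slots": next_upgrade_action(10, 10, 10, 2),
    "some new masters lack slots": next_upgrade_action(10, 10, 10, 1),
}
for name, action in cases.items():
    print(name, "->", action.name)
```

The last two cases lead to the same action; they differ only in how many masters still need slots.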

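Starting from a 3-master/3-slave cluster, upgrade to 5 masters/5 slaves.

Before the upgrade, operator-manager-86d785b5fc-9v6m5 is the operator slave instance and operator-manager-86d785b5fc-ljt9q is the operator master instance. Upgrade to 5 masters/5 slaves with manual slot assignment. When all nodes have joined the cluster, after slot assignment for new master A has finished and before slot assignment for new master B starts, first pause the slave instance (for verification purposes) and at the same time pause the operator master instance.

At this point new master A, new master B and the slaves have all joined the cluster, but only new master A has been assigned slots; new master B has none yet.

The phase in the redisCluster object is Upgrading.

Then unpause the operator slave instance. This slave is promoted to master; it is the pod instance operator-manager-86d785b5fc-9v6m5.

Slot assignment finishes.

The redisCluster phase is updated to Running.

The cluster state is normal.

Finally, the source code address:

https://github.com/ll837448792/middleware-operator-manager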

Originally published on the WeChat official account 我的小碗汤 (mysmallsoup)

Original publication date: 2019-04-22
