The idea of a cluster should be familiar by now: multiple machines working together to provide one or more services, addressing problems such as storage capacity, query speed, and load.
Redis Cluster is a distributed architecture that scales horizontally. With the LVS+keepalived setup we configured earlier, each machine needed its base environment prepared before it could join the cluster; with distributed Redis, you only need to add the node to the cluster configuration and it starts working automatically.
Redis Cluster design points
redis cluster was designed to be decentralized, with no middleware: every node in the cluster is an equal peer, and each node holds both its own slice of the data and the state of the whole cluster. Every node keeps active connections to all the other nodes, which guarantees that connecting to any single node is enough to reach the data held by the rest.
So how does redis distribute the nodes and data sensibly?
Redis Cluster does not use traditional consistent hashing to distribute data; instead it uses hash slots. A cluster has 16384 slots. When we set a key, Redis hashes it with the CRC16 algorithm and takes the result modulo 16384 to find the slot it belongs to, then stores the key on the node that owns that slot. The formula: CRC16(key) % 16384.
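The slot calculation can be sketched in a few lines of Python. This is a hand-rolled CRC-16/XMODEM (the variant Redis Cluster uses; its standard check value for "123456789" is 0x31C3). The helper names are my own for illustration, not part of any Redis client, and the sketch ignores `{...}` hash tags, which real Redis also honors.

```python
def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots: CRC16(key) % 16384."""
    return crc16(key.encode()) % 16384

print(key_slot("my_name"))  # the article reports 2412 for this key
```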
Note: at least 3 master nodes are required, otherwise cluster creation fails; we will see this in practice later.
So suppose three nodes, A, B and C, already form a cluster. They can be three ports on one machine or three separate servers. Splitting the 16384 slots between them with the hash-slot scheme, the three nodes cover the following slot ranges:
Node A covers 0-5460; node B covers 5461-10922; node C covers 10923-16383, as shown in the figure below:
Now suppose I want to set a key, say my_name:
set my_name zhdya. By the Redis Cluster hash-slot algorithm, CRC16('my_name') % 16384 = 2412, so the key is stored on node A.
Likewise, when I connect to any of the nodes (A, B or C) and request my_name, the same calculation runs and the request is redirected internally to the node that owns slot 2412, node A, to fetch the data.
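The redirection amounts to a lookup in a slot-to-node table. The sketch below is a toy illustration matching the A/B/C example above (the node names and table are invented for illustration; it is not a real cluster client):

```python
# Toy routing table matching the A/B/C example (not a real cluster client)
SLOT_OWNERS = {
    "A": range(0, 5461),       # slots 0-5460
    "B": range(5461, 10923),   # slots 5461-10922
    "C": range(10923, 16384),  # slots 10923-16383
}

def route(slot: int) -> str:
    """Return the node that owns a slot, as a MOVED redirect would."""
    for node, slots in SLOT_OWNERS.items():
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} is uncovered")

print(route(2412))   # slot of my_name per the article, prints 'A'
print(route(9189))   # slot of key1 seen later, prints 'B'
```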
This slot-based allocation has pros and cons. The upside is clarity: if I add a new node D, Redis Cluster takes some slots from the front of each existing node's range and moves them to D (we will try this in a later experiment). The layout then becomes roughly:
Node A covers 1365-5460; node B covers 6827-10922; node C covers 12288-16383; node D covers 0-1364, 5461-6826 and 10923-12287. Removing a node works the same way in reverse: once its slots have been moved away, the node can be removed.
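The arithmetic behind these ranges is an even split of the 16384 slots. A rough sketch follows: it computes even contiguous target ranges for a fresh cluster; note that when a node is added later, redis-trib instead migrates slots from the front of each existing range, producing the non-contiguous layout shown above.

```python
def slot_ranges(n_nodes: int, total_slots: int = 16384):
    """Split the slot space into contiguous, near-equal ranges, one per node."""
    ranges, start = [], 0
    for i in range(n_nodes):
        # spread the remainder across the first few nodes
        size = total_slots // n_nodes + (1 if i < total_slots % n_nodes else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(slot_ranges(3))  # three nodes: 5462 + 5461 + 5461 slots
print(slot_ranges(4))  # four nodes: 4096 slots each
```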
So a Redis cluster takes this shape:
Architecture diagram:
A common misconception:
Many people assume the data in a Redis cluster is identical on every node. Wrong! The data is sharded across nodes: what server A holds, server B may not. It is somewhat like RAID 5: a given write may land on disk A or disk B; you can read it back normally, but you do not know where it is physically stored.
The redis-trib cluster tool requires Ruby, and the redis gem it depends on needs Ruby 2.2 or newer. The Ruby that yum installs is version 2.0, which cannot satisfy our latest Redis tooling, so we need to install Ruby another way.
Environment:
Hostname | IP address |
---|---|
zhdy01 | 192.168.96.129 |
zhdy02 | 192.168.96.135 |
Configuration on zhdy01 (iptables and SELinux must be disabled):
Configure multiple ports (to simulate multiple master Redis machines):
cd /etc
vim redis_7000.conf
port 7000
bind 192.168.96.129
daemonize yes
pidfile /var/run/redis_7000.pid
dir /data/redis_data/7000
cluster-enabled yes
cluster-config-file nodes_7000.conf
cluster-node-timeout 10100
appendonly yes
vim redis_7002.conf
port 7002
bind 192.168.96.129
daemonize yes
pidfile /var/run/redis_7002.pid
dir /data/redis_data/7002
cluster-enabled yes
cluster-config-file nodes_7002.conf
cluster-node-timeout 10100
appendonly yes
vim redis_7004.conf
port 7004
bind 192.168.96.129
daemonize yes
pidfile /var/run/redis_7004.pid
dir /data/redis_data/7004
cluster-enabled yes
cluster-config-file nodes_7004.conf
cluster-node-timeout 10100
appendonly yes
----------------------------------
mkdir /data/redis_data
mkdir /data/redis_data/{7000,7002,7004}
redis-server /etc/redis_7000.conf
redis-server /etc/redis_7002.conf
redis-server /etc/redis_7004.conf
Configuration on zhdy02 (iptables and SELinux must be disabled):
vim redis_7001.conf
port 7001
bind 192.168.96.135
daemonize yes
pidfile /var/run/redis_7001.pid
dir /data/redis_data/7001
cluster-enabled yes
cluster-config-file nodes_7001.conf
cluster-node-timeout 10100
appendonly yes
vim redis_7003.conf
port 7003
bind 192.168.96.135
daemonize yes
pidfile /var/run/redis_7003.pid
dir /data/redis_data/7003
cluster-enabled yes
cluster-config-file nodes_7003.conf
cluster-node-timeout 10100
appendonly yes
vim redis_7005.conf
port 7005
bind 192.168.96.135
daemonize yes
pidfile /var/run/redis_7005.pid
dir /data/redis_data/7005
cluster-enabled yes
cluster-config-file nodes_7005.conf
cluster-node-timeout 10100
appendonly yes
------------------------------------------------
mkdir /data/redis_data
mkdir /data/redis_data/{7001,7003,7005}
redis-server /etc/redis_7001.conf
redis-server /etc/redis_7003.conf
redis-server /etc/redis_7005.conf
Install Ruby 2.2 on zhdy01 (this only needs to run on one machine):
yum -y groupinstall "Development Tools"
yum -y install gdbm-devel libdb4-devel libffi-devel libyaml libyaml-devel ncurses-devel openssl-devel readline-devel tcl-devel
cd /root/
mkdir -p rpmbuild/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
wget http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.3.tar.gz -P rpmbuild/SOURCES
wget https://raw.githubusercontent.com/tjinjin/automate-ruby-rpm/master/ruby22x.spec -P rpmbuild/SPECS
rpmbuild -bb rpmbuild/SPECS/ruby22x.spec
yum -y localinstall rpmbuild/RPMS/x86_64/ruby-2.2.3-1.el7.centos.x86_64.rpm
gem install redis
[root@zhdy01 ~]# ruby -v
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux]
cp /usr/local/src/redis-4.0.1/src/redis-trib.rb /usr/bin/
redis-trib.rb create --replicas 1 192.168.96.129:7000 192.168.96.129:7002 192.168.96.129:7004 192.168.96.135:7001 192.168.96.135:7003 192.168.96.135:7005
Using 3 masters:
192.168.96.129:7000
192.168.96.135:7001
192.168.96.129:7002
Adding replica 192.168.96.135:7003 to 192.168.96.129:7000
Adding replica 192.168.96.129:7004 to 192.168.96.135:7001
Adding replica 192.168.96.135:7005 to 192.168.96.129:7002
M: a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000
slots:0-5460 (5461 slots) master
M: 2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002
slots:10923-16383 (5461 slots) master
S: b95b371907cb5522f7b18b0dc6293f36c211723f 192.168.96.129:7004
replicates ff56e40e852131461a6a018289c49a2cd84cbe0e
M: ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001
slots:5461-10922 (5462 slots) master
S: 31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003
replicates a96e1eab9eb6922078da06558849326a2c15f03b
S: dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005
replicates 2af0c05078876a8e0f6f956592f203fb1a58c5ea
Can I set the above configuration? (type 'yes' to accept): yes
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Seeing the two OKs above means the Redis cluster and its master/replica pairs are configured successfully!
To keep data highly available, redis cluster adds a master/replica model: each master node has one or more replicas. The master serves reads and writes while its replicas pull copies of the data; when the master dies, one of its replicas is elected to take over as master, so the cluster stays up.
In the earlier example the cluster had three masters, A, B and C, with no replicas. If B died, we could not use the whole cluster: even the slots on A and C would stop being served.
So when building a cluster, always give every master a replica. With masters A, B, C and replicas A1, B1, C1, the cluster keeps working correctly even if B dies.
B1 takes B's place: the cluster elects B1 as the new master and continues serving correctly. When B comes back up, it becomes a replica of B1.
Note, however, that if B and B1 fail at the same time, the cluster can no longer serve correctly.
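The promotion logic described above can be sketched as a toy simulation. The names and data structures are invented for illustration; a real cluster elects a replica by voting among the remaining masters, taking replication offsets and epochs into account.

```python
# Toy failover model: each master maps to its list of replicas
replicas = {"A": ["A1"], "B": ["B1"], "C": ["C1"]}
masters = set(replicas)

def fail_master(dead: str):
    """Simulate a master dying: promote one of its replicas if any exist."""
    masters.discard(dead)
    if replicas.get(dead):
        promoted = replicas[dead].pop(0)  # real clusters vote; we take the first
        masters.add(promoted)
        replicas[promoted] = []           # the new master starts with no replicas
        return promoted
    return None  # no replica left: the dead master's slots go unserved

print(fail_master("B"))   # prints 'B1': the replica is promoted
print(fail_master("B1"))  # prints None: B's slot range is now unavailable
```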
Since Redis is now distributed, keys can be created or read from any node.
The -c flag tells redis-cli to connect in cluster mode:
redis-cli -c -h 192.168.96.129 -p 7000
192.168.96.129:7000> set key1 123 //redirected to 96.135:7001
-> Redirected to slot [9189] located at 192.168.96.135:7001
OK
192.168.96.135:7001> set key2 qwe //redirected to 96.129:7000
-> Redirected to slot [4998] located at 192.168.96.129:7000
OK
192.168.96.129:7000> set key2 qweas //stored on the local node
OK
192.168.96.129:7000> set key2 qweasa
OK
192.168.96.129:7000> set key3 sqweasa
OK
192.168.96.129:7000> set key4 asd1 //redirected to 96.129:7002
-> Redirected to slot [13120] located at 192.168.96.129:7002
OK
---------------------------------------------------------------
Reading the stored values:
192.168.96.129:7002> get key4
"asd1"
192.168.96.129:7002> get key2
-> Redirected to slot [4998] located at 192.168.96.129:7000
"qweasa"
192.168.96.129:7000> get key1
-> Redirected to slot [9189] located at 192.168.96.135:7001
"123"
-----------------------------------------------------------------
Checking master/replica status:
[root@zhdy01 data]# redis-trib.rb check 192.168.96.129:7002
>>> Performing Cluster Check (using node 192.168.96.129:7002)
M: 2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005
slots: (0 slots) slave
replicates 2af0c05078876a8e0f6f956592f203fb1a58c5ea
M: a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
M: ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003
slots: (0 slots) slave
replicates a96e1eab9eb6922078da06558849326a2c15f03b
S: b95b371907cb5522f7b18b0dc6293f36c211723f 192.168.96.129:7004
slots: (0 slots) slave
replicates ff56e40e852131461a6a018289c49a2cd84cbe0e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
------------------------------------------------------------------
cluster nodes //list node info (the output below shows which nodes are masters and which replica belongs to which master)
192.168.96.129:7000> cluster nodes
2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002@17002 master - 0 1508164558846 2 connected 10923-16383
b95b371907cb5522f7b18b0dc6293f36c211723f 192.168.96.129:7004@17004 slave ff56e40e852131461a6a018289c49a2cd84cbe0e 0 1508164557831 4 connected
a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000@17000 myself,master - 0 1508164558000 1 connected 0-5460
ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001@17001 master - 0 1508164555000 4 connected 5461-10922
31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003@17003 slave a96e1eab9eb6922078da06558849326a2c15f03b 0 1508164557000 5 connected
dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005@17005 slave 2af0c05078876a8e0f6f956592f203fb1a58c5ea 0 1508164556816 6 connected
-------------------------------------------------------------------
cluster info //show cluster info
192.168.96.129:7000> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1290
cluster_stats_messages_pong_sent:1293
cluster_stats_messages_sent:2583
cluster_stats_messages_ping_received:1288
cluster_stats_messages_pong_received:1290
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:2583
-----------------------------------------------------------------------
Sometimes Redis hits a bottleneck and we need to add one or more servers. How does a new machine join the cluster?
For example, create one more config on zhdy02, redis_7007.conf, and start it:
cluster meet 192.168.96.135 7007 //add the node to the cluster
192.168.96.129:7000> cluster nodes //port 7007 has joined, directly with the master role
509d62a47ec57b12297a6427307e916dadbf86a8 192.168.96.135:7007@17007 master - 0 1508165098611 0 connected
Add another node (7006) the same way, then check the state again:
192.168.96.129:7000> cluster nodes
a06bdcef46ec7b373308019711067cf613f3cad0 192.168.96.135:7006@17006 master - 0 1508165286000 0 connected
So we can confirm that newly added nodes all join with the master role by default. That is not always what we want, so how do we change it by hand?
For example, make the 7006 we just added a replica of 7007:
Syntax: cluster replicate node_id //make the node you are connected to a replica of the given node
Since the command applies to the current node, first log in to 7006:
[root@zhdy01 data]# redis-cli -c -h 192.168.96.135 -p 7006
Then run CLUSTER NODES at the 192.168.96.135:7006 prompt, copy 7007's node_id, and pass it to cluster replicate.
Verify:
192.168.96.135:7006> CLUSTER NODES (only the two relevant lines are shown; 7006 now lists 7007's node_id as the master it replicates)
a06bdcef46ec7b373308019711067cf613f3cad0 192.168.96.135:7006@17006 myself,slave 509d62a47ec57b12297a6427307e916dadbf86a8 0 1508165704000 0 connected
509d62a47ec57b12297a6427307e916dadbf86a8 192.168.96.135:7007@17007 master - 0 1508165707000 7 connected
-------------------------------------------------------------------
cluster forget node_id //remove a node from the cluster
Removing a node:
① a node in the master state cannot be removed directly (first demote it to a replica of some master node, then it can be removed);
② when removing a replica, make sure the node you are logged in to is not the one you are removing.
Remove the 7006 replica we just added:
192.168.96.135:7007> CLUSTER FORGET a06bdcef46ec7b373308019711067cf613f3cad0
OK
-----------------------------------------------------------------------
cluster saveconfig //save the cluster config to disk
192.168.96.135:7007> CLUSTER SAVECONFIG
OK
The cluster config data is then visible in each data directory:
[root@zhdy02 etc]# cat /data/redis_data/7001/nodes_7001.conf
Next, let's simulate one of the master servers going down. Let's kill 7004:
[root@zhdy01 ~]# ps -ef|grep redis //find the PIDs
root 2840 1 0 20:46 ? 00:00:01 redis-server 192.168.96.129:7000 [cluster]
root 2845 1 0 20:46 ? 00:00:01 redis-server 192.168.96.129:7002 [cluster]
root 2850 1 0 20:46 ? 00:00:01 redis-server 192.168.96.129:7004 [cluster]
root 2870 2531 0 20:51 pts/0 00:00:00 grep --color=auto redis
[root@zhdy01 ~]# redis-trib.rb check 192.168.96.129:7002 //check the current state
>>> Performing Cluster Check (using node 192.168.96.129:7002)
M: 2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005
slots: (0 slots) slave
replicates 2af0c05078876a8e0f6f956592f203fb1a58c5ea
M: b95b371907cb5522f7b18b0dc6293f36c211723f 192.168.96.129:7004
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003
slots: (0 slots) slave
replicates a96e1eab9eb6922078da06558849326a2c15f03b
S: ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001
slots: (0 slots) slave
replicates b95b371907cb5522f7b18b0dc6293f36c211723f
M: a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@zhdy01 ~]# kill 2850 //kill 7004
[root@zhdy01 ~]# ps -ef|grep redis //only 7000 and 7002 remain
root 2840 1 0 20:46 ? 00:00:02 redis-server 192.168.96.129:7000 [cluster]
root 2845 1 0 20:46 ? 00:00:02 redis-server 192.168.96.129:7002 [cluster]
root 2874 2531 0 20:52 pts/0 00:00:00 grep --color=auto redis
[root@zhdy01 ~]# redis-trib.rb check 192.168.96.129:7002 //check again (three masters, two replicas: 7001, previously a replica, has been promoted to master)
>>> Performing Cluster Check (using node 192.168.96.129:7002)
M: 2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005
slots: (0 slots) slave
replicates 2af0c05078876a8e0f6f956592f203fb1a58c5ea
S: 31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003
slots: (0 slots) slave
replicates a96e1eab9eb6922078da06558849326a2c15f03b
M: ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
As the theory above predicts: master 7004 went down, and since 7001 was its only replica, 7001 was elected master.
OK. Now let's simulate node 7004 restarting. Will it rejoin the cluster automatically, and what role will it take? Let's find out:
Restart node 7004:
[root@zhdy01 ~]# redis-server /etc/redis_7004.conf //start it again
2877:C 17 Oct 20:53:11.215 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2877:C 17 Oct 20:53:11.215 # Redis version=4.0.1, bits=64, commit=00000000, modified=0, pid=2877, just started
2877:C 17 Oct 20:53:11.215 # Configuration loaded
[root@zhdy01 ~]# redis-trib.rb check 192.168.96.129:7002
>>> Performing Cluster Check (using node 192.168.96.129:7002)
M: 2af0c05078876a8e0f6f956592f203fb1a58c5ea 192.168.96.129:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: dcb1b5d2e336812825da8a7460b7503b1bf44c81 192.168.96.135:7005
slots: (0 slots) slave
replicates 2af0c05078876a8e0f6f956592f203fb1a58c5ea
S: b95b371907cb5522f7b18b0dc6293f36c211723f 192.168.96.129:7004
slots: (0 slots) slave
replicates ff56e40e852131461a6a018289c49a2cd84cbe0e
S: 31ef6c7c92adcb92415933aa6134cbb24fe4131d 192.168.96.135:7003
slots: (0 slots) slave
replicates a96e1eab9eb6922078da06558849326a2c15f03b
M: ff56e40e852131461a6a018289c49a2cd84cbe0e 192.168.96.135:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: a96e1eab9eb6922078da06558849326a2c15f03b 192.168.96.129:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
As you can see, node 7004 started back up, but it rejoined as a replica of 7001.