Unveiling How Redis Replication Works

数据和云

Published 2019-09-17 14:49:30

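This article is part of the column: 数据和云

Editor's note (墨墨导读): Using a Redis master-replica environment, this article analyzes the consistency of the data being accessed and takes the mystery out of how Redis replication works.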

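As an in-memory database for unstructured data, Redis has clear advantages in certain scenarios and is widely used in real-world designs. However, most companies run Redis as a single standalone instance with no replication for disaster recovery. Redis's fault-tolerance features go unused, and without persistence the data is at risk of being lost.

This matters most for companies built on microservices, where Redis serves as the hot-data layer and few requests reach the traditional database. Dependence on the cache is therefore high: once the cache goes down, all traffic falls through to the traditional database, and under high concurrency performance degrades badly.

This article therefore uses a Redis master-replica environment to analyze the consistency of the data being accessed and to explain how Redis replication works.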

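Redis Architecture

We start from the Redis architecture, which is the foundation for analyzing Redis data consistency; once the architecture is understood, we analyze how consistency is achieved. This article focuses on comparing Redis Cluster with master-replica replication.

Redis is commonly deployed in one of the following ways:

Ø Single machine, single instance

Ø Master-replica replication with system-level HA

Ø Redis Cluster

Ø Redis Cluster with system-level HA

Figure 1. Redis architecture types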

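Redis Cluster

Many companies do not run a Redis cluster but have at least set up master-replica replication. With replication in place, when the master goes down a replica can take over and the service keeps running. Without it, restoring service means waiting for the master to recover its data and restart, which takes a long time and interrupts the business.

Redis Cluster provides two benefits:

Ø The ability to automatically split data across multiple nodes.

Ø The ability to keep processing commands when some nodes in the cluster fail or cannot communicate.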
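
As an illustration of the first point, here is a minimal sketch of how a key can be mapped to one of the 16384 hash slots that Redis Cluster distributes across nodes (CRC16 of the key, modulo 16384, with hash-tag handling). The function name `key_slot` is made up for this example, and the sketch relies on Python's `binascii.crc_hqx` with an initial value of 0 producing the same CRC16 variant that Redis Cluster uses.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the hash slot for a key: CRC16(key) mod 16384."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty "{...}" section,
    # only the text inside the first such section is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx uses the CRC-CCITT polynomial 0x1021; the initial value 0
    # is an assumption chosen to match the variant used by Redis Cluster.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    for key in ("foo", "kebo", "{user1000}.following", "{user1000}.followers"):
        print(key, key_slot(key))
```

Keys that share a hash tag, such as the two `{user1000}` keys, land in the same slot and therefore on the same node.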

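Codis is one of the clustering solutions for Redis, and it was developed by engineers in China.

Figure 2. Architecture of the Chinese-developed Codis cluster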

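Redis Master-Replica Replication

Master-replica replication exists so that service can fail over quickly when a single node fails and the business keeps running. Redis Cluster uses replication for its nodes: each node in the cluster has 1 to N copies (replicas), one of which is the master while the remaining N-1 are slaves.

Replication solves the problem of keeping multiple copies of the data, but it also raises the problem of keeping those copies consistent. Before looking at that, we build a master-replica setup and examine how it runs.

Setting up master-replica replication is very simple. For the purposes of the demonstration, the steps are outlined below.

· Master IP: 127.0.0.1

· Master port: 6379

· Replica IP: 127.0.0.1

· Replica port: 6380

1) Prepare the conf file

Copy the conf file for the replica to use, which makes the replica's configuration easier to manage later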

[redis@albert redis-5.0.4]$ cp redis.conf redis.conf6380

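2) Configure replication

Note: set this on the replica only. The replica's copy of the file also needs its own port (6380 here), as the startup log in step 3 shows.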

################################# REPLICATION #################################
 
# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
#   +------------------+      +---------------+
#   |      Master      | ---> |    Replica    |
#   | (receive writes) |      |  (exact copy) |
#   +------------------+      +---------------+
#
# 1) Redis replication is asynchronous, but you can configure a master to
#    stop accepting writes if it appears to be not connected with at least
#    a given number of replicas.
# 2) Redis replicas are able to perform a partial resynchronization with the
#    master if the replication link is lost for a relatively small amount of
#    time. You may want to configure the replication backlog size (see the next
#    sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
#    network partition replicas automatically try to reconnect to masters
#    and resynchronize with them.
#
# replicaof <masterip> <masterport>
# slaveof <masterip> <masterport>
  slaveof 127.0.0.1 6379

3) Start the replica instance

[redis@albert src]$ ./redis-server redis.conf6380
18828:C 04 Aug 2019 10:52:27.743 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
18828:C 04 Aug 2019 10:52:27.744 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=18828, just started
18828:C 04 Aug 2019 10:52:27.744 # Configuration loaded
                _._                                                
           _.-``__ ''-._                                           
      _.-``    `.  `_.  ''-._           Redis 5.0.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                 
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6380
 |    `-._   `._    /     _.-'    |     PID: 18828
  `-._    `-._  `-./  _.-'    _.-'                                 
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |           http://redis.io      
  `-._    `-._`-.__.-'_.-'    _.-'                                 
 |`-._`-._    `-.__.-'    _.-'_.-'|                                
 |    `-._`-._        _.-'_.-'    |                                 
  `-._    `-._`-.__.-'_.-'    _.-'                                 
      `-._    `-.__.-'    _.-'                                     
          `-._        _.-'                                         
              `-.__.-'                                             
 
18828:S 04 Aug 2019 10:52:27.746 # Server initialized
18828:S 04 Aug 2019 10:52:27.746 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
18828:S 04 Aug 2019 10:52:27.746 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
18828:S 04 Aug 2019 10:52:27.746 * Ready to accept connections
18828:S 04 Aug 2019 10:52:27.746 * Connecting to MASTER 127.0.0.1:6379
18828:S 04 Aug 2019 10:52:27.746 * MASTER <-> REPLICA sync started
18828:S 04 Aug 2019 10:52:27.746 * Non blocking connect for SYNC fired the event.
18828:S 04 Aug 2019 10:52:27.746 * Master replied to PING, replication can continue...
18828:S 04 Aug 2019 10:52:27.747 * Partial resynchronization not possible (no cached master)
18828:S 04 Aug 2019 10:52:27.748 * Full resync from master: fc71b19242e8145254ba7751d346a8f4bb4c53c6:0
18828:S 04 Aug 2019 10:52:27.788 * MASTER <-> REPLICA sync: receiving 175 bytes from master
18828:S 04 Aug 2019 10:52:27.788 * MASTER <-> REPLICA sync: Flushing old data
18828:S 04 Aug 2019 10:52:27.788 * MASTER <-> REPLICA sync: Loading DB in memory
18828:S 04 Aug 2019 10:52:27.788 * MASTER <-> REPLICA sync: Finished with success

At the same time, the master's log shows the newly attached replica

18661:M 04 Aug 2019 10:52:27.747 * Replica 127.0.0.1:6380 asks for synchronization
18661:M 04 Aug 2019 10:52:27.747 * Full resync requested by replica 127.0.0.1:6380
18661:M 04 Aug 2019 10:52:27.747 * Starting BGSAVE for SYNC with target: disk
18661:M 04 Aug 2019 10:52:27.747 * Background saving started by pid 18832
18832:C 04 Aug 2019 10:52:27.757 * DB saved on disk
18832:C 04 Aug 2019 10:52:27.757 * RDB: 4 MB of memory used by copy-on-write
18661:M 04 Aug 2019 10:52:27.788 * Background saving terminated with success
18661:M 04 Aug 2019 10:52:27.788 * Synchronization with replica 127.0.0.1:6380 succeeded

4) Check the replication status

Replica:

[redis@albert src]$ ./redis-cli -p 6380
127.0.0.1:6380>
127.0.0.1:6380>
127.0.0.1:6380> INFO replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:252
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:fc71b19242e8145254ba7751d346a8f4bb4c53c6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:252
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:252

Master:

[redis@albert src]$ ./redis-cli -p 6379
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=364,lag=0
master_replid:fc71b19242e8145254ba7751d346a8f4bb4c53c6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:364
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:364
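
The offsets in this output can also be compared programmatically. The following is a small sketch, not part of the original setup, that parses the text of `INFO replication` and reports how many bytes each replica is behind the master. The helper names `parse_info_replication` and `replica_lag_bytes` are hypothetical, and `SAMPLE` is a shortened copy of the master output shown above.

```python
def parse_info_replication(text: str) -> dict:
    """Parse the text of `INFO replication` into a dict.

    Plain fields stay strings; slaveN lines become nested dicts.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("slave") and "=" in value:
            # e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=364,lag=0
            info[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            info[key] = value
    return info


def replica_lag_bytes(info: dict) -> dict:
    """Return {"ip:port": bytes behind the master} for each listed replica."""
    master_offset = int(info["master_repl_offset"])
    lags = {}
    for value in info.values():
        if isinstance(value, dict):
            name = f'{value["ip"]}:{value["port"]}'
            lags[name] = master_offset - int(value["offset"])
    return lags


SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=364,lag=0
master_repl_offset:364
"""

if __name__ == "__main__":
    print(replica_lag_bytes(parse_info_replication(SAMPLE)))
```

A difference of zero means the replica has acknowledged everything the master has written to the replication stream so far.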

A quick replication check

Write on the master:

127.0.0.1:6379> set kebo 24
OK

Read on the replica:

127.0.0.1:6380> get kebo
"24"

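Besides master-replica replication, Redis also supports replica-to-replica (cascading) replication. It relieves the master of some synchronization load and is similar to a cascaded standby in Oracle ADG.

The rest of this section explains in detail how Redis implements replication

Scenario 1: data initialization

After a replica starts, it sends a SYNC command to the master (since Redis 2.8 the replica sends PSYNC, which falls back to a full resync when a partial one is not possible). On receiving the command, the master starts a snapshot (persistence) in the background and buffers the write commands it receives while the snapshot is being taken. It then sends the snapshot to the replica, followed by the buffered commands. This process is data initialization. Once it completes, every write command the master receives is propagated to the replica, which establishes basic data consistency.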
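
The initialization flow can be sketched in a few lines of Python. This is a toy in-memory model under simplifying assumptions, not Redis's implementation: the snapshot is a plain dictionary copy instead of an RDB file produced by a forked child, and the class and method names are invented for the example.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.snapshot_in_progress = False
        self.pending = []  # writes accepted while the snapshot is being produced

    def set(self, key, value):
        self.data[key] = value
        if self.snapshot_in_progress:
            self.pending.append(("set", key, value))

    def begin_snapshot(self):
        """Freeze a point-in-time copy (stands in for the background save)."""
        self.snapshot_in_progress = True
        self.pending = []
        return dict(self.data)

    def end_snapshot(self):
        """Stop buffering and hand back the writes made during the snapshot."""
        self.snapshot_in_progress = False
        buffered, self.pending = self.pending, []
        return buffered


class Replica:
    def __init__(self):
        self.data = {"stale": "x"}  # whatever the replica held before syncing

    def full_sync(self, snapshot, buffered):
        self.data = dict(snapshot)       # flush old data, load the snapshot
        for op, key, value in buffered:  # then replay the buffered writes
            if op == "set":
                self.data[key] = value


if __name__ == "__main__":
    master, replica = Master(), Replica()
    master.set("kebo", "24")
    snapshot = master.begin_snapshot()
    master.set("a", "1")                 # arrives while the snapshot is running
    replica.full_sync(snapshot, master.end_snapshot())
    print(replica.data == master.data)
```

The write that arrives during the snapshot is not in the snapshot itself; it reaches the replica only because the master buffered it and sent it afterwards.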

The replica requests synchronization:

33570:S 26 Aug 2019 11:54:48.918 * Ready to accept connections
33570:S 26 Aug 2019 11:54:48.918 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 11:54:48.918 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 11:54:48.918 * Non blocking connect for SYNC fired the event.
33570:S 26 Aug 2019 11:54:48.918 * Master replied to PING, replication can continue...
33570:S 26 Aug 2019 11:54:48.918 * Trying a partial resynchronization (request fc71b19242e8145254ba7751d346a8f4bb4c53c6:2533).
33570:S 26 Aug 2019 11:54:48.920 * Full resync from master: b9e0f41a523e078a6a88ae274f204777775ab4dc:0
33570:S 26 Aug 2019 11:54:48.920 * Discarding previously cached master state.
33570:S 26 Aug 2019 11:54:49.003 * MASTER <-> REPLICA sync: receiving 188 bytes from master
33570:S 26 Aug 2019 11:54:49.003 * MASTER <-> REPLICA sync: Flushing old data
33570:S 26 Aug 2019 11:54:49.003 * MASTER <-> REPLICA sync: Loading DB in memory
33570:S 26 Aug 2019 11:54:49.004 * MASTER <-> REPLICA sync: Finished with success

The master sends the RDB and the buffered commands to the replica:

33565:M 26 Aug 2019 11:54:48.918 * Replica 127.0.0.1:6380 asks for synchronization
33565:M 26 Aug 2019 11:54:48.918 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'fc71b19242e8145254ba7751d346a8f4bb4c53c6', my replication IDs are '1e531f295fc2dcf986a18889e8f8c3b6e6fdc7b6' and '0000000000000000000000000000000000000000')
33565:M 26 Aug 2019 11:54:48.918 * Starting BGSAVE for SYNC with target: disk
33565:M 26 Aug 2019 11:54:48.919 * Background saving started by pid 33574
33574:C 26 Aug 2019 11:54:48.929 * DB saved on disk
33574:C 26 Aug 2019 11:54:48.929 * RDB: 4 MB of memory used by copy-on-write
33565:M 26 Aug 2019 11:54:49.002 * Background saving terminated with success
33565:M 26 Aug 2019 11:54:49.003 * Synchronization with replica 127.0.0.1:6380 succeeded

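Scenario 2: resynchronization after a disconnection

If a failure interrupts a replica's synchronization with the master, then after the replica reconnects the master only needs to send the commands executed during the interruption (kept as a record of commands). Synchronization then resumes and the data stays consistent.

Note: this feature was introduced in Redis 2.8; in Redis 2.6 and earlier, a reconnecting replica always has to be re-initialized.

The process in brief:

1) Simulate a replica failure by killing the process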

[redis@albert src]$ ps -ef | grep redis
redis     33565  33500  0 11:54 pts/4    00:00:01 ./redis-server *:6379
redis     33570  33476  0 11:54 pts/5    00:00:01 ./redis-server 127.0.0.1:6380
redis     33744  33688  0 12:03 pts/0    00:00:00 ./redis-server 127.0.0.1:6382
[redis@albert src]$
[redis@albert src]$ kill -9 33744

2) The master detects the lost connection

33565:M 26 Aug 2019 12:03:17.736 * Replica 127.0.0.1:6382 asks for synchronization
33565:M 26 Aug 2019 12:03:17.736 * Partial resynchronization request from 127.0.0.1:6382 accepted. Sending 714 bytes of backlog starting from offset 1.
33565:M 26 Aug 2019 12:13:43.494 # Connection with replica 127.0.0.1:6382 lost.

3) Resynchronize from the offset

33565:M 26 Aug 2019 12:14:12.019 * Replica 127.0.0.1:6382 asks for synchronization
33565:M 26 Aug 2019 12:14:12.019 * Partial resynchronization request from 127.0.0.1:6382 accepted. Sending 436 bytes of backlog starting from offset 1315.
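
The log line above ("Sending 436 bytes of backlog starting from offset 1315") suggests how the master serves a partial resynchronization from its backlog. Below is a minimal sketch of such a fixed-size backlog. The class `ReplicationBacklog` and its methods are invented for illustration and simplify the real data structure, which is a circular buffer.

```python
class ReplicationBacklog:
    """Fixed-size buffer holding the most recent bytes of the replication stream."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()
        self.first_byte_offset = 1  # offset of the oldest byte still kept
        self.master_offset = 0      # offset of the newest byte written

    def feed(self, payload: bytes) -> None:
        """Append newly propagated bytes, discarding the oldest on overflow."""
        self.buf += payload
        self.master_offset += len(payload)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]
            self.first_byte_offset += overflow

    def read_from(self, offset: int):
        """Return the bytes from `offset` onward, or None if they are gone."""
        if offset < self.first_byte_offset or offset > self.master_offset + 1:
            return None  # the caller would have to fall back to a full resync
        return bytes(self.buf[offset - self.first_byte_offset:])


if __name__ == "__main__":
    backlog = ReplicationBacklog(size=1048576)  # repl_backlog_size from INFO
    backlog.feed(b"x" * 1750)                   # assumed master offset of 1750
    print(len(backlog.read_from(1315)))
```

With a master offset of 1750, which is an assumption derived from the two numbers in the log, a request starting at offset 1315 yields the same 436 bytes. If the replica stays away long enough for the backlog to overwrite the bytes it needs, `read_from` returns `None` and a full resync is the only option.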

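Scenario 3: incremental replication

Redis replicates a stream of commands. The master records every command that changes its own state in an in-memory buffer and sends the buffered commands to the replicas asynchronously. Each replica executes these commands to reach the same state as the master, and reports the offset of the commands it has applied back to the master so that the master keeps sending from the buffer.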
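
A toy model of this command stream is sketched below. It assumes a simplified text format (`SET key value` per line) in place of the real wire protocol, and the names `MasterNode`, `ReplicaNode` and `replicate` are hypothetical.

```python
class MasterNode:
    def __init__(self):
        self.data = {}
        self.stream = bytearray()  # replication stream; its length is the master offset
        self.acked = {}            # replica name -> last acknowledged offset

    def set(self, key: str, value: str) -> None:
        """Apply the write locally and append it to the replication stream."""
        self.data[key] = value
        self.stream += f"SET {key} {value}\n".encode()

    def pending_for(self, replica_offset: int) -> bytes:
        """Bytes the replica has not applied yet."""
        return bytes(self.stream[replica_offset:])


class ReplicaNode:
    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.offset = 0  # number of stream bytes applied so far

    def apply(self, payload: bytes) -> None:
        for line in payload.decode().splitlines():
            _, key, value = line.split(" ", 2)
            self.data[key] = value
        self.offset += len(payload)


def replicate(master: MasterNode, replica: ReplicaNode) -> None:
    """One asynchronous round: ship the pending bytes, then record the ack."""
    replica.apply(master.pending_for(replica.offset))
    master.acked[replica.name] = replica.offset


if __name__ == "__main__":
    master, replica = MasterNode(), ReplicaNode("6380")
    master.set("a", "1")
    master.set("b", "2")
    print(replica.data)  # still empty: the writes have not been shipped yet
    replicate(master, replica)
    print(replica.data == master.data, master.acked)
```

Between the write and the call to `replicate`, the replica is behind the master. That gap is what the offsets reported by `INFO replication` measure.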

Checking the offsets:

role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=3724,lag=1
slave1:ip=127.0.0.1,port=6382,state=online,offset=3724,lag=1
master_replid:b9e0f41a523e078a6a88ae274f204777775ab4dc
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:3724
second_repl_offset:-1
repl_backlog_active:1

role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_repl_offset:3752
slave_priority:100
slave_read_only:1

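Data Consistency Verification

Redis provides two synchronization modes to keep the master and its replicas consistent.

Ø Full synchronization

Ø Partial synchronization

Full synchronization is the data initialization process described above: all the data stored on the master is sent to the replica.

Partial synchronization corresponds to the incremental replication mentioned above: only part of the data is sent to the replica.

Test and verification

Add a replica instance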

[redis@albert src]$ cp redis.conf6380 redis.conf6382

Adjust the configuration file (the new instance listens on port 6382)

Start the new replica

[redis@albert src]$ ./redis-server redis.conf6382
33744:C 26 Aug 2019 12:03:17.731 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
33744:C 26 Aug 2019 12:03:17.731 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=33744, just started
33744:C 26 Aug 2019 12:03:17.731 # Configuration loaded
                _._                                                
           _.-``__ ''-._                                           
      _.-``    `.  `_.  ''-._           Redis 5.0.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6382
 |    `-._   `._    /     _.-'    |     PID: 33744
  `-._    `-._  `-./  _.-'    _.-'                                  
 |`-._`-._    `-.__.-'    _.-'_.-'|                                
 |    `-._`-._        _.-'_.-'    |           http://redis.io      
  `-._    `-._`-.__.-'_.-'    _.-'                                 
 |`-._`-._    `-.__.-'    _.-'_.-'|                                 
 |    `-._`-._        _.-'_.-'    |                                
  `-._    `-._`-.__.-'_.-'    _.-'                                 
      `-._    `-.__.-'    _.-'                                     
          `-._        _.-'                                         
              `-.__.-'                                             
 
33744:S 26 Aug 2019 12:03:17.734 # Server initialized
33744:S 26 Aug 2019 12:03:17.734 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
33744:S 26 Aug 2019 12:03:17.734 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
33744:S 26 Aug 2019 12:03:17.734 * DB loaded from disk: 0.000 seconds
33744:S 26 Aug 2019 12:03:17.734 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
33744:S 26 Aug 2019 12:03:17.734 * Ready to accept connections
33744:S 26 Aug 2019 12:03:17.735 * Connecting to MASTER 127.0.0.1:6379
33744:S 26 Aug 2019 12:03:17.735 * MASTER <-> REPLICA sync started
33744:S 26 Aug 2019 12:03:17.735 * Non blocking connect for SYNC fired the event.
33744:S 26 Aug 2019 12:03:17.735 * Master replied to PING, replication can continue...
33744:S 26 Aug 2019 12:03:17.736 * Trying a partial resynchronization (request b9e0f41a523e078a6a88ae274f204777775ab4dc:1).
33744:S 26 Aug 2019 12:03:17.736 * Successful partial resynchronization with master.
33744:S 26 Aug 2019 12:03:17.736 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

Scenario 1: master shutdown

Replica status as seen from the master:

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=6986,lag=1
slave1:ip=127.0.0.1,port=6382,state=online,offset=6986,lag=0

Write on the master

127.0.0.1:6379> mset a 1 b 2 c 3 d 4
OK

Replica 1:

127.0.0.1:6380> mget a b c d
1) "1"
2) "2"
3) "3"
4) "4"

Replica 2:

127.0.0.1:6382> mget a b c d
1) "1"
2) "2"
3) "3"
4) "4"

Shut down the master:

127.0.0.1:6379> shutdown

Replica status:

33570:S 26 Aug 2019 13:23:41.429 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:41.429 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:41.429 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:42.441 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:42.442 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:42.442 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:43.455 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:43.456 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:43.456 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:44.466 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:44.466 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:44.466 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:45.470 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:45.471 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:45.471 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:46.474 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:46.475 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:46.475 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:47.478 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:47.478 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:47.478 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:48.481 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:48.481 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:48.481 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:49.485 * Connecting to MASTER 127.0.0.1:6379
33570:S 26 Aug 2019 13:23:49.485 * MASTER <-> REPLICA sync started
33570:S 26 Aug 2019 13:23:49.485 # Error condition on socket for SYNC: Connection refused
33570:S 26 Aug 2019 13:23:50.488 * Connecting to MASTER 127.0.0.1:6379

The replica keeps retrying the connection to the master and requesting synchronization

Restart the master:

34781:M 26 Aug 2019 13:24:05.899 * DB loaded from disk: 0.000 seconds
34781:M 26 Aug 2019 13:24:05.899 * Ready to accept connections
34781:M 26 Aug 2019 13:24:06.544 * Replica 127.0.0.1:6380 asks for synchronization
34781:M 26 Aug 2019 13:24:06.544 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'b9e0f41a523e078a6a88ae274f204777775ab4dc', my replication IDs are 'aac4a31754592820422c3ba7c8244f31c39f067f' and '0000000000000000000000000000000000000000')
34781:M 26 Aug 2019 13:24:06.544 * Starting BGSAVE for SYNC with target: disk
34781:M 26 Aug 2019 13:24:06.545 * Background saving started by pid 34785
34785:C 26 Aug 2019 13:24:06.555 * DB saved on disk
34785:C 26 Aug 2019 13:24:06.555 * RDB: 4 MB of memory used by copy-on-write
34781:M 26 Aug 2019 13:24:06.601 * Background saving terminated with success
34781:M 26 Aug 2019 13:24:06.602 * Synchronization with replica 127.0.0.1:6380 succeeded
34781:M 26 Aug 2019 13:24:06.642 * Replica 127.0.0.1:6382 asks for synchronization
34781:M 26 Aug 2019 13:24:06.642 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'b9e0f41a523e078a6a88ae274f204777775ab4dc', my replication IDs are 'd41f950be7593a93620b1fd872b7552668f0b355' and '0000000000000000000000000000000000000000')
34781:M 26 Aug 2019 13:24:06.642 * Starting BGSAVE for SYNC with target: disk
34781:M 26 Aug 2019 13:24:06.643 * Background saving started by pid 34786
34786:C 26 Aug 2019 13:24:06.644 * DB saved on disk
34786:C 26 Aug 2019 13:24:06.644 * RDB: 4 MB of memory used by copy-on-write
34781:M 26 Aug 2019 13:24:06.701 * Background saving terminated with success
34781:M 26 Aug 2019 13:24:06.701 * Synchronization with replica 127.0.0.1:6382 succeeded

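After the master restarts, both replicas reconnect. As the log shows, the restarted master has a new replication ID, so it rejects their partial resynchronization requests and sends each of them the full dataset.

Replica log: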

33570:S 26 Aug 2019 13:24:06.544 * Master replied to PING, replication can continue...
33570:S 26 Aug 2019 13:24:06.544 * Trying a partial resynchronization (request b9e0f41a523e078a6a88ae274f204777775ab4dc:7603).
33570:S 26 Aug 2019 13:24:06.545 * Full resync from master: d41f950be7593a93620b1fd872b7552668f0b355:0
33570:S 26 Aug 2019 13:24:06.545 * Discarding previously cached master state.
33570:S 26 Aug 2019 13:24:06.601 * MASTER <-> REPLICA sync: receiving 240 bytes from master
33570:S 26 Aug 2019 13:24:06.601 * MASTER <-> REPLICA sync: Flushing old data
33570:S 26 Aug 2019 13:24:06.601 * MASTER <-> REPLICA sync: Loading DB in memory
33570:S 26 Aug 2019 13:24:06.601 * MASTER <-> REPLICA sync: Finished with success

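The replica receives the master's dataset, flushes its old data, and loads the new data into memory.

Scenario 2: replica shutdown

Shut down the replica: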

127.0.0.1:6382> shutdown
not connected>

Master log and status:

34781:M 26 Aug 2019 13:24:06.701 * Synchronization with replica 127.0.0.1:6382 succeeded
34781:M 26 Aug 2019 13:29:45.037 # Connection with replica 127.0.0.1:6382 lost.

# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=546,lag=1

The master shows that only one replica is currently alive

Write to the master:

127.0.0.1:6379> set slavetest 0851
OK

Replica 1 receives the write:

127.0.0.1:6380> keys *
1) "test02"
2) "c"
3) "d"
4) "a"
5) "test"
6) "kebo"
7) "slavetest"
8) "b"
9) "redisfast"
127.0.0.1:6380> get slavetest
"0851"

Restart replica 2:

34855:S 26 Aug 2019 13:34:39.561 * DB loaded from disk: 0.000 seconds
34855:S 26 Aug 2019 13:34:39.561 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
34855:S 26 Aug 2019 13:34:39.561 * Ready to accept connections
34855:S 26 Aug 2019 13:34:39.561 * Connecting to MASTER 127.0.0.1:6379
34855:S 26 Aug 2019 13:34:39.561 * MASTER <-> REPLICA sync started
34855:S 26 Aug 2019 13:34:39.561 * Non blocking connect for SYNC fired the event.
34855:S 26 Aug 2019 13:34:39.561 * Master replied to PING, replication can continue...
34855:S 26 Aug 2019 13:34:39.562 * Trying a partial resynchronization (request d41f950be7593a93620b1fd872b7552668f0b355:888).
34855:S 26 Aug 2019 13:34:39.562 * Successful partial resynchronization with master.
34855:S 26 Aug 2019 13:34:39.562 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

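After the replica starts, it loads its local data and then requests synchronization from the master. Synchronization resumes from the replica's own offset.

Master log: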

34781:M 26 Aug 2019 13:24:06.701 * Synchronization with replica 127.0.0.1:6382 succeeded
34781:M 26 Aug 2019 13:29:45.037 # Connection with replica 127.0.0.1:6382 lost.
34781:M 26 Aug 2019 13:34:39.562 * Replica 127.0.0.1:6382 asks for synchronization
34781:M 26 Aug 2019 13:34:39.562 * Partial resynchronization request from 127.0.0.1:6382 accepted. Sending 56 bytes of backlog starting from offset 888.

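When a replica reconnects to the master, it uses the PSYNC command to send its replication ID and replication offset to the master. The master uses them to decide which kind of synchronization to perform, so that the offsets of all instances converge.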
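
The decision can be sketched as a small function, using the values from the logs in this section. This is a simplified model under stated assumptions: it ignores the secondary replication ID (`master_replid2`), the function name `psync_decision` is hypothetical, and the master offset of 943 is inferred from the log line "Sending 56 bytes of backlog starting from offset 888".

```python
def psync_decision(master_replid: str, master_offset: int,
                   backlog_first_offset: int,
                   req_replid: str, req_offset: int):
    """Return ("CONTINUE", bytes_to_send) or ("FULLRESYNC", None)."""
    # A different replication ID means the replica followed another history.
    if req_replid != master_replid:
        return ("FULLRESYNC", None)
    # The requested offset must still be covered by the backlog.
    if req_offset < backlog_first_offset or req_offset > master_offset + 1:
        return ("FULLRESYNC", None)
    return ("CONTINUE", master_offset - req_offset + 1)


OLD_ID = "b9e0f41a523e078a6a88ae274f204777775ab4dc"
NEW_ID = "d41f950be7593a93620b1fd872b7552668f0b355"

if __name__ == "__main__":
    # Replica 6380 after the master restart: the replication ID no longer matches.
    print(psync_decision(NEW_ID, 0, 1, OLD_ID, 7603))
    # Replica 6382 after its own restart: same ID, offset still in the backlog.
    print(psync_decision(NEW_ID, 943, 1, NEW_ID, 888))
```

The first call models the master restart in Scenario 1, where the replication ID changed and a full resync followed. The second models the replica restart in Scenario 2, where the ID matched and only the missing bytes were sent.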

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1139,lag=1
slave1:ip=127.0.0.1,port=6382,state=online,offset=1139,lag=1
master_replid:d41f950be7593a93620b1fd872b7552668f0b355
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1139
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1139

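Summary

Replication between a Redis master and its replicas is asynchronous. After executing a command, the master returns the result to the requesting client immediately and propagates the command to the replicas, without waiting for the replicas to receive it. This leaves a window in which the data can be inconsistent: Redis does not guarantee strong consistency. Instead, the replicas continuously catch up with the master so that master and replicas converge on the same state. Redis can also be configured so that the master refuses writes unless enough replicas are connected and keeping up (the `min-replicas-to-write` and `min-replicas-max-lag` settings in Redis 5). This ties the master's ability to accept writes to the health of its replicas and can therefore affect the master's performance.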
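
The write restriction mentioned in the summary can be sketched as follows, assuming it refers to the `min-replicas-to-write` and `min-replicas-max-lag` settings of Redis 5. The `ReplicaState` type and the `writes_allowed` function are hypothetical names for this illustration and do not reproduce Redis's internal code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaState:
    name: str
    lag_seconds: int  # seconds since the replica's last acknowledgement


def writes_allowed(replicas: List[ReplicaState],
                   min_replicas_to_write: int,
                   min_replicas_max_lag: int) -> bool:
    """Accept writes only if enough replicas are within the allowed lag."""
    # Assumption: setting either option to 0 disables the check.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for r in replicas if r.lag_seconds <= min_replicas_max_lag)
    return good >= min_replicas_to_write


if __name__ == "__main__":
    replicas = [ReplicaState("127.0.0.1:6380", 1), ReplicaState("127.0.0.1:6382", 30)]
    print(writes_allowed(replicas, min_replicas_to_write=1, min_replicas_max_lag=10))
    print(writes_allowed(replicas, min_replicas_to_write=2, min_replicas_max_lag=10))
```

This check narrows the window in which acknowledged writes can be lost, but it does not make replication synchronous: the master still replies to the client before the replicas confirm the individual write.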

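During replication, when a replica has fallen behind, Redis uses the replica's offset to choose between a full and a partial (incremental) recovery.

With one master and many replicas, the master comes under performance pressure. Redis offers Sentinel and Cluster to relieve it, but the two have different emphases. Sentinel continuously monitors node state and, when a node fails, performs a fast failover by promoting a replica to master. Redis Cluster is the horizontal scaling solution for Redis: it distributes data across the cluster nodes, taking multiple data copies and access routing into account.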

Source: 墨天轮 (https://www.modb.pro/db/6415)

This article is part of the Tencent Cloud self-media syndication program and was shared from a WeChat official account.

Originally published: 2019-09-10
