Why Is Redis So Fast? (9)

兜兜毛毛

Published 2020-03-19 10:39:15

Included in the column: 兜兜毛毛

redis-benchmark -t set,lpush -n 100000 -q

SET: 38550.50 requests per second    // more than 38,000 SET requests handled per second
LPUSH: 37821.48 requests per second  // more than 37,000 LPUSH requests handled per second

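### Script executions per second
redis-benchmark -n 100000 -q script load "redis.call('set','foo','bar')"

script load redis.call('set','foo','bar'): 37050.76 requests per second

(Figure: x-axis, number of connections; y-axis, QPS.)

According to the official figures, Redis can reach roughly 100,000 QPS (queries per second).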

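Why is Redis so fast?

Pure in-memory storage
Single-threaded
I/O multiplexing

Memory

Redis is an in-memory key-value database; looking up a key in its hash table takes O(1) time on average.

Single thread

What are the benefits of a single thread?

No cost of creating and destroying threads
No CPU cost from context switches between threads
No contention between threads, so no locking, unlocking, or deadlocks

Asynchronous, non-blocking

Asynchronous non-blocking I/O, with multiplexing to handle concurrent connections.

Why is Redis single-threaded?

Doesn't using a single thread waste CPU resources?

https://redis.io/topics/faq

Because a single thread is enough: the CPU is usually not the bottleneck for Redis. The bottleneck is most likely the machine's memory or the network bandwidth. Since a single-threaded design is easier to implement and the CPU does not become the bottleneck, going single-threaded was the natural choice.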

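Why is a single thread so fast?

Because Redis operates on memory, let's start with memory.

Virtual memory

Terminology: main memory is RAM; secondary storage is disk (hard drive).

A computer's main memory can be viewed as an array of M contiguous byte-sized cells. Each byte has a unique address, called its physical address (PA). On early computers, when the CPU needed to access memory it used physical addressing and accessed main memory directly.

This approach has several drawbacks:

1. In a multi-user, multitasking operating system, all processes share main memory. If each process held a block of physical address space exclusively, main memory would be used up quickly. We want different processes to be able to share the same block of physical address space at different times.

2. If all processes accessed physical memory directly, one process could modify another process's memory, corrupting the physical address space and causing programs to misbehave.

To solve these problems, a layer is added between the CPU and main memory. The CPU no longer uses physical addresses; it accesses a virtual address, and this intermediate layer translates the address into a physical address to reach the data. This layer is called virtual memory.

When a process is created it is given a range of virtual addresses, and real data is reached through the mapping from virtual to physical addresses. The process never touches physical addresses directly and does not even know which physical addresses hold its data.

Today most operating systems use virtual memory and can back part of it with disk, for example the Windows page file and Linux swap space. The Windows page file (pagefile.sys) is part of the disk space.

On a 32-bit system the virtual address space is 2^32 bytes = 4 GB. What is the maximum virtual address space on a 64-bit system? Is it 2^64 bytes = 16 × 1024 × 1024 TB = 16 × 1024 PB = 16 EB? In practice not all 64 bits are used, because nobody needs that much space and it would add considerable overhead. Linux generally uses the low 48 bits for the virtual address space, that is, 2^48 bytes = 256 TB.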

cat /proc/cpuinfo

address sizes : 40 bits physical, 48 bits virtual

The actual physical memory may be far smaller than the virtual address space.
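The address-space sizes quoted above can be checked with a few lines of C. This is only a worked-arithmetic sketch; the function name `address_space_gib` is made up for illustration and is not part of Redis or Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Size of an address space with the given number of address bits,
 * expressed in GiB (1 GiB = 2^30 bytes).
 * Hypothetical helper; valid for 30 <= bits <= 64. */
uint64_t address_space_gib(unsigned bits) {
    assert(bits >= 30 && bits <= 64);
    return (uint64_t)1 << (bits - 30);
}
```

With this helper, 32 bits gives 4 GiB, 40 bits (the physical size in the cpuinfo line) gives 1024 GiB, and 48 bits gives 262144 GiB, which is 256 TiB.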

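Summary: virtual memory gives each process a larger, contiguous address space, which makes writing and linking programs simpler. It also isolates physical memory so that processes do not interfere with one another, and it allows memory to be shared by mapping the same physical memory into different virtual address spaces.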

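User space and kernel space

To keep user processes from operating on the kernel directly and to keep the kernel safe, the operating system divides virtual memory into two parts: kernel space and user space.

The kernel is the core of the operating system. It is independent of ordinary applications, can access protected memory, and has permission to access the underlying hardware.

Kernel space holds the kernel's code and data, while a process's user space holds the user program's code and data. Both kernel space and user space live in virtual memory and are mappings onto physical addresses.

A process running in kernel space is in kernel mode; a process running in user space is in user mode.

In kernel space a process can execute any instruction and use all of the system's resources. In user space it can only perform simple computation and cannot use system resources directly; it has to go through the system interface (system calls) to issue requests to the kernel.

The top command:

user is the percentage of CPU time spent in user space; sys is the percentage of CPU time spent in kernel space.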

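Process switching (context switching)

How does a multitasking operating system run far more tasks than it has CPUs? The tasks are not really running at the same time. The system uses time slicing to hand the CPU to each of them in turn for a very short period, creating the illusion that they run simultaneously.

To control process execution, the kernel must be able to suspend the process running on the CPU and resume a previously suspended one. This is called a process switch.

What is a context?

Before a task runs, the CPU needs to know where to load the task from and where to start running it. In other words, the system has to set up the CPU registers and the program counter in advance. This is called the CPU context.

When a task is switched out, its context is saved in the kernel and loaded again when the task is next scheduled. This keeps the task's state intact, so the task appears to run continuously.

A context switch involves a whole series of steps, which makes it an expensive operation.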

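Process blocking

When a running process makes a system service request (such as an I/O operation) that the operating system cannot satisfy immediately, the process puts itself into the blocked state and is woken up only when the corresponding event occurs. A blocked process does not consume CPU time.

File descriptors (FD)

Linux treats every device as a file, and uses a file descriptor to identify each open file object.

A file descriptor is an index the kernel creates to manage open files efficiently. It refers to an open file, and every system call that performs I/O goes through a file descriptor. A file descriptor is a simple non-negative integer that identifies a file opened by the process.

Linux has three standard file descriptors.

0: standard input (keyboard); 1: standard output (display); 2: standard error (display).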

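Data copying in traditional I/O

When an application calls read on a file descriptor, data that is already in the user process's page memory is read straight from memory. Otherwise the data is first loaded from disk into a kernel buffer and then copied from the kernel buffer into the user process's page memory (two copies, and two switches between user mode and kernel mode).

Blocking I/O (where exactly does I/O block?)

When read or write is called on a file descriptor that is not ready, the calling process blocks and cannot respond to anything else. Copying data from the device into the kernel buffer blocks, and copying from the kernel buffer into user space also blocks. Only when the copy completes and the kernel returns the result does the user process leave the blocked state.

There are a few ways to deal with blocking.

1. Create multiple threads or use a thread pool on the server. Under high concurrency, however, this needs more threads than the system can sustain, and creating and destroying threads costs resources.

2. Have the requester poll periodically and copy the data from the kernel buffer to user space once it is ready (non-blocking I/O). This approach introduces some latency.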
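The second approach, polling a non-blocking descriptor, can be sketched with the standard POSIX calls fcntl and read. The helper names `set_nonblocking` and `try_read` and their return-code convention are invented for this illustration; only the system calls themselves are real.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read once without blocking (hypothetical helper).
 * Returns the number of bytes read (0 means end of file),
 * -1 if no data is ready yet, so the caller should poll again later,
 * -2 on any other error. */
ssize_t try_read(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n >= 0) return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return -1;
    return -2;
}
```

A caller would loop on `try_read`, doing other work whenever it returns -1. The time between the data arriving and the next poll is the latency mentioned above.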

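I/O multiplexing

I/O here means network I/O.

Multi: multiple TCP connections (sockets or channels).

Plexing: reusing one or a few threads.

The basic idea is that the application no longer watches the connections itself; the kernel watches the file descriptors on its behalf.

Client operations generate socket events of different types. On the server, the I/O multiplexing module puts these events into a queue, and the file event dispatcher forwards each one to the matching event handler.

Multiplexing has many implementations. Take select as an example: when the user process calls the multiplexer, the process blocks. The kernel watches all the sockets the multiplexer is responsible for, and as soon as data is ready on any of them the multiplexer returns. The user process then calls read to copy the data from the kernel buffer to user space.

So the defining feature of I/O multiplexing is that a single process can wait on many file descriptors at once, and select() returns as soon as any one of these descriptors (socket descriptors) becomes readable.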
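The select flow described above might look like the following minimal sketch. It is not Redis's ae_select.c; `wait_any_readable` is a hypothetical helper that only shows one thread waiting on several descriptors with the POSIX select call.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until at least one of the nfds descriptors in fds[] is readable.
 * Returns the index of the first readable descriptor, or -1 on error.
 * Assumes every descriptor is smaller than FD_SETSIZE. */
int wait_any_readable(const int *fds, int nfds) {
    fd_set readable;
    int maxfd = -1;

    assert(nfds > 0);
    FD_ZERO(&readable);
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd) maxfd = fds[i];
    }

    /* NULL timeout: sleep in the kernel until something is ready. */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0) return -1;

    /* select rewrote the set; only ready descriptors remain in it. */
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readable)) return i;
    return -1;
}
```

After the function returns an index, the caller issues read on that descriptor, which is the step that copies the data from the kernel buffer into user space.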

Redis's multiplexing layer can use select, epoll, evport, or kqueue; one of them is chosen at compile time. Source: ae.c

/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif

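evport is provided by the Solaris kernel;
epoll is provided by the Linux kernel;
kqueue is provided by macOS and the BSD family;
select is defined by POSIX and is available on virtually every operating system (the fallback);

Source: ae_epoll.c, ae_select.c, ae_kqueue.c, ae_evport.c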

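Memory reclamation

All Redis data is stored in memory, and in some situations the memory it occupies has to be reclaimed. Reclamation falls into two categories: key expiration, and eviction triggered when memory use reaches the limit (maxmemory).

Expiration strategies

There are several ways to implement key expiration.

Timed expiration (active)

Every key with an expiration time gets its own timer, and the key is removed the moment the timer fires. This removes expired data immediately and is very memory-friendly, but handling the timers takes a lot of CPU, which hurts the cache's response time and throughput.

Lazy expiration (passive)

A key is checked for expiration only when it is accessed, and is removed if it has expired. This saves the most CPU but is very unfriendly to memory. In the extreme case, a large number of expired keys are never accessed again, so they are never removed and keep occupying memory.

For example, for strings, getCommand ends up calling expireIfNeeded.

db.c expireIfNeeded(redisDb *db, robj *key)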

/* This function is called when we are going to perform some operation
 * in a given key, but such key may be already logically expired even if
 * it still exists in the database. The main way this function is called
 * is via lookupKey*() family of functions.
 *
 * The behavior of the function depends on the replication role of the
 * instance, because slave instances do not expire keys, they wait
 * for DELs from the master for consistency matters. However even
 * slaves will try to have a coherent return value for the function,
 * so that read commands executed in the slave side will be able to
 * behave like if the key is expired even if still present (because the
 * master has yet to propagate the DEL).
 *
 * In masters as a side effect of finding a key which is expired, such
 * key will be evicted from the database. Also this may trigger the
 * propagation of a DEL/UNLINK command in AOF / replication stream.
 *
 * The return value of the function is 0 if the key is still valid,
 * otherwise the function returns 1 if the key is expired. */
int expireIfNeeded(redisDb *db, robj *key) {
    if (!keyIsExpired(db,key)) return 0;

    /* If we are running in the context of a slave, instead of
     * evicting the expired key from the database, we return ASAP:
     * the slave key expiration is controlled by the master that will
     * send us synthesized DEL operations for expired keys.
     *
     * Still we try to return the right information to the caller,
     * that is, 0 if we think the key should be still valid, 1 if
     * we think the key is expired at this time. */
    if (server.masterhost != NULL) return 1;

    /* Delete the key */
    server.stat_expiredkeys++;
    propagateExpire(db,key,server.lazyfree_lazy_expire);
    notifyKeyspaceEvent(NOTIFY_EXPIRED,
        "expired",key,db->id);
    return server.lazyfree_lazy_expire ? dbAsyncDelete(db,key) :
                                         dbSyncDelete(db,key);
}

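The second path is active: Redis calls activeExpireCycle periodically, from the server cron and (as the comment below notes) from beforeSleep() in the event loop, to sample keys that have an expiration time and free the ones that have already expired.

expire.c activeExpireCycle(int type)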

/* Try to expire a few timed out keys. The algorithm used is adaptive and
 * will use few CPU cycles if there are few expiring keys, otherwise
 * it will get more aggressive to avoid that too much memory is used by
 * keys that can be removed from the keyspace.
 *
 * No more than CRON_DBS_PER_CALL databases are tested at every
 * iteration.
 *
 * This kind of call is used when Redis detects that timelimit_exit is
 * true, so there is more work to do, and we do it more incrementally from
 * the beforeSleep() function of the event loop.
 *
 * Expire cycle type:
 *
 * If type is ACTIVE_EXPIRE_CYCLE_FAST the function will try to run a
 * "fast" expire cycle that takes no longer than EXPIRE_FAST_CYCLE_DURATION
 * microseconds, and is not repeated again before the same amount of time.
 *
 * If type is ACTIVE_EXPIRE_CYCLE_SLOW, that normal expire cycle is
 * executed, where the time limit is a percentage of the REDIS_HZ period
 * as specified by the ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC define. */

void activeExpireCycle(int type) {
    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit. Also don't repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit) return;
        if (start < last_fast_cycle + ACTIVE_EXPIRE_CYCLE_FAST_DURATION*2) return;
        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC percentage of CPU time
     * per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = ACTIVE_EXPIRE_CYCLE_FAST_DURATION; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;

    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        int expired;
        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle more than 25%
         * of the keys were expired. */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots getting random
             * keys is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP)
                num = ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP;

            while (num--) {
                dictEntry *de;
                long long ttl;

                if ((de = dictGetRandomKey(db->expires)) == NULL) break;
                ttl = dictGetSignedIntegerVal(de)-now;
                if (activeExpireCycleTryExpire(db,de,now)) expired++;
                if (ttl > 0) {
                    /* We want the average TTL of keys yet not expired. */
                    ttl_sum += ttl;
                    ttl_samples++;
                }
                total_sampled++;
            }
            total_expired += expired;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle if there are less than 25% of keys
             * found expired in the current DB. */
        } while (expired > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4);
    }

    elapsed = ustime()-start;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}

Periodic expiration. Source: server.h

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* All key-value pairs */
    dict *expires;              /* Keys with an expiration time set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

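At fixed intervals, Redis scans a limited number of keys in the expires dictionary of a limited number of databases and removes the ones that have expired. This strategy is a compromise between the previous two. By tuning the scan interval and the time budget of each scan, CPU and memory use can be balanced for different workloads.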
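To make the time budget concrete, here is the arithmetic from the timelimit line of activeExpireCycle, pulled out into a standalone sketch. The constant values 25 and 20 are assumptions (the excerpt shows only the constant names, not their values), and the two helper functions are hypothetical.

```c
#include <assert.h>

/* Assumed values; the excerpt names these constants but does not show them. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25   /* percent of CPU time */
#define ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP 20 /* keys sampled per loop */

/* Microsecond budget of one slow expire cycle when the cron runs hz
 * times per second. Same arithmetic as the timelimit computation. */
long long slow_cycle_budget_us(int hz) {
    assert(hz > 0);
    long long limit = 1000000LL * ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC / hz / 100;
    return limit <= 0 ? 1 : limit;
}

/* The sampling loop repeats on the same database only while more than a
 * quarter of the sampled keys turned out to be expired. */
int should_repeat(int expired_in_sample) {
    return expired_in_sample > ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP / 4;
}
```

Under these assumptions, hz = 10 gives a budget of 25,000 microseconds per call, that is, 25 ms out of every 100 ms interval. Raising hz shortens each scan but runs it more often.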

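Redis uses both lazy expiration and periodic expiration.

What happens if nothing expires and Redis runs out of memory?

With the default policy, writes fail with an OOM error; reads are not affected.

Eviction policies

The Redis eviction policy determines which data is removed, once memory use reaches the configured maximum, to make room for new data.

redis.conf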

# maxmemory-policy noeviction

# volatile-lru -> Evict using approximated LRU among the keys with an expire set.
# allkeys-lru -> Evict any key using approximated LRU.
# volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
# allkeys-lfu -> Evict any key using approximated LFU.
# volatile-random -> Remove a random key among the ones with an expire set.
# allkeys-random -> Remove a random key, any key.
# volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
# noeviction -> Don't evict anything, just return an error on write operations.

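LRU, Least Recently Used: looks at when each key was last used and evicts first the keys whose last use is furthest in the past.

LFU, Least Frequently Used: added in version 4.0.

Policy	Meaning
volatile-lru	Uses LRU to remove keys that have an expiration set until enough memory is freed. If there is no key it can remove, it falls back to noeviction.
allkeys-lru	Uses LRU to remove keys, whether or not they have an expiration set, until enough memory is freed.
volatile-lfu	Removes the least frequently used keys among those with an expiration set.
allkeys-lfu	Removes the least frequently used keys among all keys, whether or not they have an expiration set.
volatile-random	Removes random keys among those with an expiration set.
allkeys-random	Removes random keys among all keys until enough memory is freed.
volatile-ttl	Removes the keys that are closest to expiring, based on their TTL. If there are none, it falls back to noeviction.
noeviction	The default. Removes nothing; rejects all writes and returns the client error (error) OOM command not allowed when used memory. Redis then serves reads only.

If no key meets the precondition for eviction, volatile-lru, volatile-lfu, volatile-random, and volatile-ttl behave like noeviction (no memory is reclaimed).

volatile-lru is the recommended choice: it keeps the service running normally while removing the least recently used keys first.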

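What would go wrong if Redis implemented LRU the traditional way?

It would need an extra data structure, which costs memory.

Redis LRU improves on the traditional algorithm by using random sampling to trade precision for cost.

When the eviction policy is LRU, Redis randomly picks m keys from the database, where m is the configured sample size maxmemory-samples (5 by default), and evicts the coldest of them. The larger m is, the more accurately Redis finds the key that should be evicted, but the more CPU it spends and the slower eviction becomes.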
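The sampling idea can be illustrated with a small sketch. This is not the Redis implementation, which is more elaborate. `pick_eviction_victim` is a hypothetical function, and it assumes the caller has already drawn the random sample of key indices.

```c
#include <assert.h>
#include <stddef.h>

/* Among the sampled keys, return the index (into idle[]) of the key that
 * has been idle the longest. idle[i] is the idle time of key i, and
 * sample[] holds m key indices chosen at random by the caller. */
size_t pick_eviction_victim(const unsigned long long *idle,
                            const size_t *sample, size_t m) {
    assert(m >= 1);
    size_t victim = sample[0];
    for (size_t i = 1; i < m; i++)
        if (idle[sample[i]] > idle[victim]) victim = sample[i];
    return victim;
}
```

If the idle times of five keys are {10, 500, 30, 900, 5}, a sample of keys 0, 2, and 4 evicts key 2 even though key 3 is the coldest overall. A sample that covers all five keys finds key 3. That is the precision-versus-cost trade-off controlled by the sample size.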

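How is the coldest key found?

Every Redis object has an lru field, a 24-bit unsigned bitfield, that records how recently the object was used. The value is set when the object is created and updated whenever the object is accessed. It is not taken from the system clock on each access, though; it is set to the value of the global variable server.lruclock.

Source: server.h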

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

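Where does server.lruclock come from?

Redis has a timer function, serverCron, that by default runs every 100 milliseconds; it calls updateCachedTime and refreshes the global server.lruclock, which records the current Unix time.

Source: server.c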

/* We take a cached value of the unix time in the global state because with
 * virtual memory and aging there is to store the current time in objects at
 * every object access, and accuracy is not needed. To access a global var is
 * a lot faster than calling time(NULL) */
void updateCachedTime(void) {
    time_t unixtime = time(NULL);
    atomicSet(server.unixtime,unixtime);
    server.mstime = mstime();

    /* To get information about daylight saving time, we need to call localtime_r
     * and cache the result. However calling localtime_r in this context is safe
     * since we will never fork() while here, in the main thread. The logging
     * function will call a thread safe version of localtime that has no locks. */
    struct tm tm;
    localtime_r(&server.unixtime,&tm);
    server.daylight_active = tm.tm_isdst;
}

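Why cache the time in a global variable instead of reading the precise time? Doesn't that introduce lag?

This way lookupKey does not have to call the system function time every time it updates an object's lru value, which makes it faster; eviction does not need exact timestamps.

Once an object's lru field is set, its hotness can be estimated.

The function estimateObjectIdleTime estimates an object's idle time. The idea is that the larger the gap between the object's lru value and the global server.lruclock (the longer since it was last updated), the colder the object.

Source: evict.c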

/* Given an object returns the min number of milliseconds the object was never
 * requested, using an approximated LRU algorithm. */
unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = LRU_CLOCK();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
    } else {
        return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
                    LRU_CLOCK_RESOLUTION;
    }
}

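server.lruclock has only 24 bits, which at one-second resolution covers about 194 days. When the clock exceeds the largest value 24 bits can hold, it starts again from zero.

server.h

#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */

After a wraparound, an object's lru can be greater than server.lruclock. In that case the idle time is computed by adding the current clock to the distance the object's lru still had to go before LRU_CLOCK_MAX, rather than by subtracting.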
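The wraparound case and the 194-day figure can both be checked with a standalone version of the same arithmetic. In this sketch the clock is passed in as a parameter instead of being read from the server, the value 1000 for LRU_CLOCK_RESOLUTION is an assumption (one tick per second, expressed in milliseconds), and the function names are illustrative.

```c
#include <assert.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1 << LRU_BITS) - 1) /* max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000           /* assumed: ms per clock tick */

/* Idle time in milliseconds, given the current clock and the clock value
 * stored in the object, both in ticks. */
unsigned long long idle_time_ms(unsigned long long clock_now,
                                unsigned long long obj_lru) {
    assert(clock_now <= LRU_CLOCK_MAX && obj_lru <= LRU_CLOCK_MAX);
    if (clock_now >= obj_lru)
        return (clock_now - obj_lru) * LRU_CLOCK_RESOLUTION;
    /* The clock wrapped around after the object was last touched. */
    return (clock_now + (LRU_CLOCK_MAX - obj_lru)) * LRU_CLOCK_RESOLUTION;
}

/* Whole days the 24-bit clock covers before it wraps. */
unsigned long clock_range_days(void) {
    return ((unsigned long)LRU_CLOCK_MAX + 1) / 86400UL;
}
```

With one-second ticks, 2^24 = 16,777,216 seconds divided by 86,400 seconds per day is a little over 194 days. An object stamped 5 ticks before the maximum and examined when the clock reads 5 is reported as idle for 10 seconds.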

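Why not use the usual hash table plus doubly linked list? It needs an extra data structure and costs resources, and with a sample size of 10 the Redis LRU algorithm already comes close to true LRU.

Besides the resource cost, what else is wrong with traditional LRU?

Suppose A was accessed 5 times within 10 seconds and B was accessed 3 times within the same 10 seconds. Because B's last access was later than A's, A is reclaimed first, all else being equal, even though it is accessed more often.

LFU

server.h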

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

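When these 24 bits are used for LFU, they are split into two parts:

The high 16 bits record the access time in minutes (ldt, last decrement time)
The low 8 bits record the access frequency, the counter (logc, logarithmic counter)

The counter is a probabilistic logarithmic counter, so its 8 bits are enough to represent access counts on the order of a million.

The LFU value is updated whenever the object is read or written. See lookupKey in db.c.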

/* Update LFU when an object is accessed.
 * Firstly, decrement the counter if the decrement time is reached.
 * Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    counter = LFULogIncr(counter);
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

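The growth rate is controlled by lfu-log-factor: the larger the factor, the more slowly the counter grows.

redis.conf

# lfu-log-factor 10

A counter that only went up would not reflect how hot an object currently is. How does the counter come down when the object is not accessed?

The decrease is controlled by the decay setting lfu-decay-time (in minutes). With a value of 1, a key that has not been accessed for N minutes has its counter reduced by N.

# lfu-decay-time 1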
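The pieces of the LFU scheme (bit packing, probabilistic increment, time-based decay) can be put together in a self-contained sketch. It follows the behavior described above but is not the Redis source. All function names are hypothetical, the random number is passed in by the caller so the logic stays deterministic, and the initial counter value of 5 is an assumption.

```c
#include <assert.h>

#define LFU_INIT_VAL 5 /* assumed starting counter for new objects */

/* Pack minutes (high 16 bits) and counter (low 8 bits) into 24 bits. */
unsigned lfu_pack(unsigned minutes, unsigned counter) {
    return ((minutes & 0xFFFF) << 8) | (counter & 0xFF);
}
unsigned lfu_minutes(unsigned field) { return (field >> 8) & 0xFFFF; }
unsigned lfu_counter(unsigned field) { return field & 0xFF; }

/* Probabilistic increment: the higher the counter and the log factor,
 * the smaller the chance of a bump. r is a random number in [0, 1). */
unsigned lfu_log_incr(unsigned counter, double r, int log_factor) {
    assert(counter <= 255 && log_factor >= 0);
    if (counter == 255) return 255;             /* saturated */
    double base = (double)counter - LFU_INIT_VAL;
    if (base < 0) base = 0;
    double p = 1.0 / (base * log_factor + 1);   /* chance of a bump */
    return r < p ? counter + 1 : counter;
}

/* Decay: subtract one per elapsed decay period, never below zero.
 * A decay_time of 0 disables decay. */
unsigned lfu_decay(unsigned counter, unsigned idle_minutes,
                   unsigned decay_time) {
    unsigned periods = decay_time ? idle_minutes / decay_time : 0;
    return periods > counter ? 0 : counter - periods;
}
```

With a log factor of 10, a counter at 15 is bumped with probability 1/101, which is why the counter grows roughly logarithmically with the number of hits. With a decay time of 1, a counter of 10 drops to 7 after three idle minutes.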

This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.
