Understanding Redis Data Eviction Policies in Depth

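Redis lets you cap its memory usage with the maxmemory setting (server.maxmemory in the source), which is very useful when memory is limited. For example, on a machine with 8 GB of RAM running 4 Redis instances, giving each instance 1.5 GB leaves about 2 GB of headroom, which eases memory pressure and makes the service more stable.
When memory usage exceeds the limit, Redis evicts key-value pairs according to the configured policy so that there is still room for new data. Once Redis has decided to evict a key, it deletes the data and propagates the change locally (to the AOF, when AOF persistence is on) and to the slaves (over the replication link).
The redis.conf file explains the mechanism well: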

# Don't use more memory than the specified amount of bytes.
# When the memory limit is reached Redis will try to remove keys
# accordingly to the eviction policy selected (see maxmemmory-policy).
#
# If Redis can't remove keys according to the policy, or if the policy is
# set to 'noeviction', Redis will start to reply with errors to commands
# that would use more memory, like SET, LPUSH, and so on, and will continue
# to reply to read-only commands like GET.
#
# This option is usually useful when using Redis as an LRU cache, or to set
# an hard memory limit for an instance (using the 'noeviction' policy).
#
# WARNING: If you have slaves attached to an instance with maxmemory on,
# the size of the output buffers needed to feed the slaves are subtracted
# from the used memory count, so that network problems / resyncs will
# not trigger a loop where keys are evicted, and in turn the output
# buffer of slaves is full with DELs of keys evicted triggering the deletion
# of more keys, and so forth until the database is completely emptied.
#
# In short... if you have slaves attached it is suggested that you set a lower
# limit for maxmemory so that there is some free RAM on the system for slave
# output buffers (but this is not needed if the policy is 'noeviction').
#
# maxmemory <bytes>

Note: when Redis runs in a master-slave setup, maxmemory should be set somewhat lower than the physical memory available, leaving enough room for the slave output buffers.

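Redis provides 6 eviction policies (the comment below says "five behaviors", but it lists six once noeviction is counted):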


# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
#
# Note: with any of the above policies, Redis will return an error on write
# operations, when there are not suitable keys for eviction.
#
# At the date of writing this commands are: set setnx setex append
# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
# getset mset msetnx exec sort
#
# The default is:
#
# maxmemory-policy noeviction

# LRU and minimal TTL algorithms are not precise algorithms but approximated
# algorithms (in order to save memory), so you can tune it for speed or
# accuracy. For default Redis will check five keys and pick the one that was
# used less recently, you can change the sample size using the following
# configuration directive.
#
# The default of 5 produces good enough results. 10 Approximates very closely
# true LRU but costs a bit more CPU. 3 is very fast but not very accurate.
#
# maxmemory-samples 5

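volatile-lru: among the keys that have an expire set, evict the least recently used one;
allkeys-lru: among all keys (with or without an expire), evict the least recently used one;
volatile-random: among the keys that have an expire set, evict a random one;
allkeys-random: among all keys (with or without an expire), evict a random one;
volatile-ttl: among the keys that have an expire set, evict the one closest to expiring;
noeviction: evict nothing (Redis still frees objects whose reference count drops to zero); when memory runs out, write commands simply return an error.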
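
As a rough sketch of how the six policies differ, the C fragment below classifies each one by the key set it draws candidates from and by how it picks the victim. The enum and function names are made up for illustration; they are not the constants Redis itself uses.

```c
#include <assert.h>

/* Hypothetical names for illustration; Redis defines its own constants. */
enum policy {
    VOLATILE_LRU, ALLKEYS_LRU, VOLATILE_RANDOM,
    ALLKEYS_RANDOM, VOLATILE_TTL, NO_EVICTION
};

/* Returns 1 if the policy only considers keys that have an expire set. */
int only_keys_with_expire(enum policy p) {
    assert(p >= VOLATILE_LRU && p <= NO_EVICTION);
    return p == VOLATILE_LRU || p == VOLATILE_RANDOM || p == VOLATILE_TTL;
}

/* Returns 1 if the policy ranks sampled candidates (LRU or TTL),
 * 0 if it takes a random key or evicts nothing. */
int ranks_candidates(enum policy p) {
    assert(p >= VOLATILE_LRU && p <= NO_EVICTION);
    return p == VOLATILE_LRU || p == ALLKEYS_LRU || p == VOLATILE_TTL;
}
```

Note that the volatile-* policies have nothing to evict when no key carries an expire, in which case writes fail just as they do under noeviction.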

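The default policy is noeviction. LRU in Redis is an approximation: by default Redis samples 5 random keys and evicts the one that has gone unused the longest. The maxmemory-samples directive controls how many keys are sampled. Sampling more keys takes more time, but the result is more accurate (the evicted key is more likely to be one that has been idle the longest), so the value is a trade-off.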
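
The sampling idea can be sketched in a few lines of C. This is a simplified illustration, not Redis code: the sampled indices are passed in so that the example is deterministic (Redis draws them at random), and the array of last-access times stands in for the per-object lru field. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Among the sampled indices, return the index of the key that was
 * accessed longest ago (smallest last-access time). */
size_t pick_lru_victim(const uint32_t *last_access,
                       const size_t *sampled, size_t nsampled) {
    assert(nsampled > 0);
    size_t best = sampled[0];
    for (size_t i = 1; i < nsampled; i++) {
        if (last_access[sampled[i]] < last_access[best])
            best = sampled[i];
    }
    return best;
}

/* Demo: key 3 is the true LRU key, but it is not in the sample,
 * so key 1 (the oldest of the sampled keys) is chosen instead. */
size_t demo_pick(void) {
    const uint32_t last_access[] = {100, 50, 90, 10};
    const size_t sampled[] = {0, 2, 1};
    return pick_lru_victim(last_access, sampled, 3);
}
```

The demo shows why the result is only approximate: the globally oldest key survives whenever it is not sampled, and a larger sample makes that less likely.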

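Eviction is implemented by the freeMemoryIfNeeded function in redis.c. If maxmemory is set, the function is called before every command is executed to check whether enough memory is available; it either frees memory or returns an error. If not enough memory can be freed, the main command path refuses to run commands flagged REDIS_CMD_DENYOOM and replies with the error "command not allowed when used memory > 'maxmemory'".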

int freeMemoryIfNeeded(void) {
    size_t mem_used, mem_tofree, mem_freed;
    int slaves = listLength(server.slaves);

    /* Remove the size of slaves output buffers and AOF buffer from the
     * count of used memory. Because these two buffers are not counted,
     * maxmemory should be set lower than the physical memory to leave
     * room for them. */
    mem_used = zmalloc_used_memory();
if (slaves) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
while((ln = listNext(&li))) {
            redisClient *slave = listNodeValue(ln);
            unsigned long obuf_bytes = getClientOutputBufferMemoryUsage(slave);
if (obuf_bytes > mem_used)
                mem_used = 0;
else
                mem_used -= obuf_bytes;
        }
    }
if (server.appendonly) {
        mem_used -= sdslen(server.aofbuf);
        mem_used -= sdslen(server.bgrewritebuf);
    }

/* Check if we are over the memory limit. */
if (mem_used <= server.maxmemory) return REDIS_OK;

if (server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)
return REDIS_ERR; /* We need to free memory, but policy forbids. */

/* Compute how much memory we need to free. */
    mem_tofree = mem_used - server.maxmemory;
    mem_freed = 0;
while (mem_freed < mem_tofree) {
int j, k, keys_freed = 0;

for (j = 0; j < server.dbnum; j++) {
long bestval = 0; /* just to prevent warning */
            sds bestkey = NULL;
            struct dictEntry *de;
            redisDb *db = server.db+j;
            dict *dict;

if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)
            {
                dict = server.db[j].dict;
            } else {
                dict = server.db[j].expires;
            }
if (dictSize(dict) == 0) continue;

/* volatile-random and allkeys-random policy */
if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)
            {
                de = dictGetRandomKey(dict);
                bestkey = dictGetEntryKey(de);
            } /* random policies: pick one random key from dict */

/* volatile-lru and allkeys-lru policy */
else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
            {
for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
long thisval;
                    robj *o;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetEntryKey(de);
                    /* When policy is volatile-lru we need an additional lookup
                     * to locate the real key, as dict is set to db->expires,
                     * and db->expires does not record the last access time. */
                    if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
                        de = dictFind(db->dict, thiskey);
                    o = dictGetEntryVal(de);
                    thisval = estimateObjectIdleTime(o);

/* Higher idle time is better candidate for deletion */
if (bestkey == NULL || thisval > bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
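                } /* To save work, LRU here (like expire-based eviction) is
                   * approximate rather than optimal: sample
                   * maxmemory_samples keys from the dict (default 3 in this
                   * older source; the redis.conf quoted above uses 5) and
                   * evict the least recently used among them. */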
            }

/* volatile-ttl */
else if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {
for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
long thisval;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetEntryKey(de);
                    thisval = (long) dictGetEntryVal(de);

/* Expire sooner (minor expire unix timestamp) is better
 * candidate for deletion */
if (bestkey == NULL || thisval < bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                } /* the TTL policy samples maxmemory_samples keys the same way */
            }

/* Finally remove the selected key. */
if (bestkey) {
long long delta;

                robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
                propagateExpire(db,keyobj); /* propagate the DEL to the AOF and the slaves */
/* We compute the amount of memory freed by dbDelete() alone.
 * It is possible that actually the memory needed to propagate
 * the DEL in AOF and replication link is greater than the one
 * we are freeing removing the key, but we can't account for
 * that otherwise we would never exit the loop.
 *
 * AOF and Output buffer memory will be freed eventually so
 * we only care about memory used by the key space. */
                delta = (long long) zmalloc_used_memory();
                dbDelete(db,keyobj);
                delta -= (long long) zmalloc_used_memory();
                mem_freed += delta;
server.stat_evictedkeys++;
                decrRefCount(keyobj);
                keys_freed++;

/* When the memory to free starts to be big enough, we may
 * start spending so much time here that is impossible to
 * deliver data to the slaves fast enough, so we force the
 * transmission here inside the loop. */
if (slaves) flushSlavesOutputBuffers();
            }
        } /* after one pass over every db, the outer loop checks whether enough memory was freed */
if (!keys_freed) return REDIS_ERR; /* nothing to free... */
    }
return REDIS_OK;
}

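Note: this function is called before a command is executed, and it returns OK as soon as memory usage is at or below the limit. Executing the command afterwards may therefore push Redis's memory usage above maxmemory again. In other words, maxmemory is the ceiling Redis enforces before running a command, not a hard bound on the memory Redis actually uses (and that is before even considering the slave output buffers and the AOF buffer).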
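
The accounting at the top of freeMemoryIfNeeded can be condensed into a small helper. This is a minimal sketch under the assumption that only the slave output buffers and the AOF buffers are excluded, as in the listing above; bytes_to_free and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that must be freed before a command may run, or 0 if usage is
 * already within the limit. Replication and AOF buffers are excluded
 * from the count. */
size_t bytes_to_free(size_t used, size_t slave_obuf, size_t aof_buf,
                     size_t maxmemory) {
    assert(maxmemory > 0); /* only meaningful when a limit is set */
    size_t excluded = slave_obuf + aof_buf;
    /* Clamp at zero instead of letting the unsigned value wrap. */
    size_t counted = (excluded > used) ? 0 : used - excluded;
    return (counted <= maxmemory) ? 0 : counted - maxmemory;
}
```

For example, with 1000 bytes in use, 150 bytes of excluded buffers and a limit of 800, the helper reports that 50 bytes have to be evicted.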

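LRU eviction mechanism
The server keeps an LRU clock, server.lruclock, which is updated periodically by the Redis timer routine serverCron(). Its value is computed from server.unixtime.
Also, struct redisObject shows that every Redis object carries its own lru field. As you would expect, redisObject.lru is updated every time the object is accessed.
The LRU eviction mechanism works like this:

Pick a few key-value pairs at random from the dataset and evict the one with the smallest lru value (the one accessed longest ago). So Redis does not guarantee that it evicts the least recently used (LRU) key in the whole dataset, only the least recently used among the randomly sampled keys.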

redisObject is declared in redis.h as follows:

#define REDIS_LRU_BITS 24
#define REDIS_LRU_CLOCK_MAX ((1<<REDIS_LRU_BITS)-1) /* Max value of obj->lru */
#define REDIS_LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */
typedef struct redisObject {
    /* object type */
    unsigned type:4;
    /* encoding of the content */
    unsigned encoding:4;
    /* LRU timestamp, measured against server.lruclock */
    unsigned lru:REDIS_LRU_BITS; /* lru time (relative to server.lruclock) */
    /* reference counter used for reference counting */
    int refcount;
    /* pointer to the data */
    void *ptr;
} robj;

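As the redisObject definition shows, every object stored in Redis carries not only a reference counter but also an lru field, which records the value of the LRU clock when the object was last touched. The server-wide clock, server.lruclock, is refreshed by the timer, which calls getLRUClock() to turn the current time in milliseconds into an LRU clock value. The clock is 24 bits wide, so its largest value is 24 one-bits, i.e. ((1<<REDIS_LRU_BITS)-1) = 2^24 - 1. Note that one tick is REDIS_LRU_CLOCK_RESOLUTION milliseconds (1000 ms here), not one millisecond, so the clock wraps after about 2^24 seconds, which is roughly 194 days rather than years.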
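
To make the clock arithmetic concrete, here is a small sketch. It assumes the clock value is the millisecond time divided by the resolution and masked to 24 bits, which is how I understand getLRUClock() to work; the function and macro names below are hypothetical stand-ins, not Redis identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1u << LRU_BITS) - 1) /* largest clock value */
#define LRU_CLOCK_RESOLUTION 1000            /* milliseconds per tick */

/* Convert a millisecond timestamp to a 24-bit LRU clock value. */
uint32_t lru_clock_from_ms(uint64_t ms) {
    return (uint32_t)((ms / LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX);
}

/* Whole days until a 24-bit clock with the given resolution wraps. */
uint64_t lru_wrap_days(uint32_t resolution_ms) {
    assert(resolution_ms > 0);
    uint64_t ticks = (uint64_t)LRU_CLOCK_MAX + 1;
    uint64_t seconds = ticks * resolution_ms / 1000;
    return seconds / 86400;
}
```

With a resolution of 1000 ms, lru_wrap_days(1000) gives 194, so an object would have to stay untouched for more than six months before its lru value becomes ambiguous.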
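server.lruclock is updated in the timer that runs in redis.c, shown below (serverCron runs server.hz times per second, which is once every 100 ms with the default hz of 10).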

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    .....
    run_with_period(100) trackOperationsPerSecond();

    /* We have just REDIS_LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define. */
    server.lruclock = getLRUClock();
    ....
 return 1000/server.hz;
}

With that in mind, look at how Redis sets the lru field of a redisObject when an object is created. The code is in object.c:

robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = REDIS_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;
    /* The key step: every object Redis creates records the LRU clock
     * at creation time. */
    /* Set the LRU to the current lruclock (minutes resolution). */
    o->lru = LRU_CLOCK();
    return o;
}

The key line is o->lru = LRU_CLOCK(). LRU_CLOCK is a macro, defined as follows:

#define LRU_CLOCK() ((1000/server.hz <= REDIS_LRU_CLOCK_RESOLUTION) ? server.lruclock : getLRUClock())

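REDIS_LRU_CLOCK_RESOLUTION is 1000. It is a compile-time #define rather than a configuration-file option (as the serverCron comment above says, you change the resolution by altering the define), and it sets the precision of the LRU clock in milliseconds. This is where server.lruclock earns its keep: when the timer period is no longer than the LRU resolution, the cached server.lruclock can be assigned directly at object creation, which avoids the cost of calling getLRUClock() every time.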
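
The condition inside the LRU_CLOCK() macro can be read as a simple freshness test. The helper below is a hypothetical restatement of that test, with the resolution passed as a parameter so that both outcomes can be shown.

```c
#include <assert.h>

/* Returns 1 when the cached clock is fresh enough to use, i.e. when the
 * timer period (1000/hz milliseconds) does not exceed the LRU clock
 * resolution; returns 0 when the clock has to be recomputed. */
int can_use_cached_clock(int hz, int resolution_ms) {
    assert(hz > 0);
    return 1000 / hz <= resolution_ms;
}
```

With the resolution of 1000 ms, any hz of 1 or more satisfies the test, so the cached value is always used; the fallback only matters if the define is lowered below the timer period (for instance a resolution of 50 ms with hz at 10).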
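With that background, we come to the heart of the matter: the LRU algorithm in Redis. Take volatile-lru as the example (it evicts from the set of keys that have an expire). When Redis handles a command it calls processCommand, and if maxmemory is set in the configuration, processCommand calls freeMemoryIfNeeded to free memory.
Below is the LRU-related part of freeMemoryIfNeeded; the other branches are similar. Note that this excerpt uses an eviction pool, so it comes from a later version of the source than the full listing above.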

/* Different policies operate on different key sets. */
if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
    server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)
{
    dict = server.db[j].dict;
} else { /* only the keys that have an expire set */
    dict = server.db[j].expires;
}
if (dictSize(dict) == 0) continue;

/* volatile-random and allkeys-random policy */
/* Pick a random key to evict. */
if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
    server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)
{
    de = dictGetRandomKey(dict);
    bestkey = dictGetKey(de);
}

/* volatile-lru and allkeys-lru policy */
/* The actual LRU algorithm. */
else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
    server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
{
    struct evictionPoolEntry *pool = db->eviction_pool;

    while(bestkey == NULL) {
        /* Take a random sample and rank it into the pool of
         * eviction candidates by idle time. */
        evictionPoolPopulate(dict, db->dict, db->eviction_pool);
        /* Go backward from best to worst element to evict. */
        for (k = REDIS_EVICTION_POOL_SIZE-1; k >= 0; k--) {
            if (pool[k].key == NULL) continue;
            de = dictFind(dict,pool[k].key);
            sdsfree(pool[k].key);
            /* Shift the elements after pool[k] one slot to the left. */
            memmove(pool+k,pool+k+1,
                sizeof(pool[0])*(REDIS_EVICTION_POOL_SIZE-k-1));
            /* Clear the element on the right which is empty
             * since we shifted one position to the left. */
            pool[REDIS_EVICTION_POOL_SIZE-1].key = NULL;
            pool[REDIS_EVICTION_POOL_SIZE-1].idle = 0;
            /* If the key still exists, it is the one to evict. */
            if (de) {
                bestkey = dictGetKey(de);
                break;
            } else {
                /* Ghost... */
                continue;
            }
        }
    }
}

Having read the code above, you may be wondering where the LRU algorithm itself went. Look at the implementation of evictionPoolPopulate:

#define EVICTION_SAMPLES_ARRAY_SIZE 16
void evictionPoolPopulate(dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    /* EVICTION_SAMPLES_ARRAY_SIZE is the size of the on-stack sample
     * array (16 entries). */
    dictEntry *_samples[EVICTION_SAMPLES_ARRAY_SIZE];
    dictEntry **samples;
    /* If the configured maxmemory-samples fits in 16 entries, use the
     * on-stack array; otherwise allocate one on the heap. */
    if (server.maxmemory_samples <= EVICTION_SAMPLES_ARRAY_SIZE) {
        samples = _samples;
    } else {
        samples = zmalloc(sizeof(samples[0])*server.maxmemory_samples);
    }

#if 1 /* Use bulk get by default. */
    /* Fetch server.maxmemory_samples random entries from the sample
     * dict and store them in samples. */
    count = dictGetRandomKeys(sampledict,samples,server.maxmemory_samples);
#else
    count = server.maxmemory_samples;
    for (j = 0; j < count; j++) samples[j] = dictGetRandomKey(sampledict);
#endif

    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;
        de = samples[j];
        key = dictGetKey(de);
        if (sampledict != keydict) de = dictFind(keydict, key);
        o = dictGetVal(de);
        /* Compute the idle time. */
        idle = estimateObjectIdleTime(o);
        k = 0;
        /* Find the right slot for de in the pool, which is kept sorted
         * by idle time in ascending order. */
        while (k < REDIS_EVICTION_POOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[REDIS_EVICTION_POOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < REDIS_EVICTION_POOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* There is still free space: shift elements to the right. */
            if (pool[REDIS_EVICTION_POOL_SIZE-1].key == NULL) {
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(REDIS_EVICTION_POOL_SIZE-k-1));
            } else { /* no free space: drop the first element */
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                /* The shift below frees up position k. */
                sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
            }
        }
        /* Insert at position k. */
        pool[k].key = sdsdup(key);
        pool[k].idle = idle;
    }
    /* At this point the pool is sorted by idle time in ascending order. */
    if (samples != _samples) zfree(samples);
}
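
The insertion logic is easier to follow on a toy pool. The sketch below mirrors the three cases of evictionPoolPopulate (empty slot, shift right, drop the smallest) using integer keys and a pool of 4 entries; the struct, the used flag and the function names are simplifications of mine, not the Redis definitions.

```c
#include <assert.h>
#include <string.h>

#define POOL_SIZE 4 /* deliberately small for the demo */

struct pool_entry { int used; int key; unsigned long long idle; };

/* Insert (key, idle) so that the pool stays sorted by ascending idle
 * time and keeps only the POOL_SIZE entries with the highest idle time. */
void pool_insert(struct pool_entry *pool, int key, unsigned long long idle) {
    assert(pool != NULL);
    int k = 0;
    while (k < POOL_SIZE && pool[k].used && pool[k].idle < idle) k++;
    /* Pool is full and the candidate is worse than everything in it. */
    if (k == 0 && pool[POOL_SIZE-1].used) return;
    if (!(k < POOL_SIZE && !pool[k].used)) {
        if (!pool[POOL_SIZE-1].used) {
            /* Free space on the right: shift entries k.. one slot right. */
            memmove(pool+k+1, pool+k, sizeof(pool[0])*(POOL_SIZE-k-1));
        } else {
            /* Pool is full: drop entry 0 and shift entries 1..k left. */
            k--;
            memmove(pool, pool+1, sizeof(pool[0])*k);
        }
    }
    pool[k].used = 1;
    pool[k].key = key;
    pool[k].idle = idle;
}

/* Demo: insert six candidates, return the key with the highest idle time. */
int demo_best_key(void) {
    struct pool_entry pool[POOL_SIZE];
    const unsigned long long idles[] = {30, 10, 50, 20, 40, 5};
    memset(pool, 0, sizeof(pool));
    for (int i = 0; i < 6; i++) pool_insert(pool, i + 1, idles[i]);
    return pool[POOL_SIZE-1].key;
}
```

After the six inserts the pool holds idle times 20, 30, 40, 50 in that order: the entry with idle 10 was pushed out, the one with idle 5 was never admitted, and the last slot holds the best eviction candidate (key 3).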

The code above still does not show how the idle time is computed. That lives in the estimateObjectIdleTime function in object.c:

/* Roughly estimate how long the object has been idle, in milliseconds. */
unsigned long long estimateObjectIdleTime(robj *o) {
    unsigned long long lruclock = LRU_CLOCK();
    if (lruclock >= o->lru) {
        return (lruclock - o->lru) * REDIS_LRU_CLOCK_RESOLUTION;
    } else { /* rare: this means the LRU clock has wrapped around since the object was last touched */
        return (lruclock + (REDIS_LRU_CLOCK_MAX - o->lru)) *
                    REDIS_LRU_CLOCK_RESOLUTION;
    }
}
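
A standalone version of the same arithmetic makes the wraparound branch easy to try out. The macros below restate the values from redis.h under shorter, hypothetical names, and idle_ms takes the two clock values as plain arguments instead of reading them from the server and the object.

```c
#include <assert.h>
#include <stdint.h>

#define LRU_CLOCK_MAX ((1u << 24) - 1) /* largest 24-bit clock value */
#define LRU_CLOCK_RESOLUTION 1000      /* milliseconds per tick */

/* Idle time in milliseconds, given the current clock and the clock
 * value stored in the object. */
uint64_t idle_ms(uint32_t lruclock, uint32_t obj_lru) {
    assert(lruclock <= LRU_CLOCK_MAX && obj_lru <= LRU_CLOCK_MAX);
    if (lruclock >= obj_lru)
        return (uint64_t)(lruclock - obj_lru) * LRU_CLOCK_RESOLUTION;
    /* The clock wrapped after the object was last touched. */
    return ((uint64_t)lruclock + (LRU_CLOCK_MAX - obj_lru)) *
           LRU_CLOCK_RESOLUTION;
}
```

In the normal case, a clock of 100 and a stored value of 40 give 60000 ms. In the wrapped case, a clock of 5 and a stored value 10 ticks below the maximum give 15000 ms.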

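TTL eviction mechanism
The Redis database structure keeps a table of key expiry times, redisDb.expires. TTL eviction works much like LRU eviction.
The TTL eviction mechanism works like this:

Pick a few key-value pairs at random from the expires table and evict the one with the smallest TTL, i.e. the one that will expire soonest. Again, Redis does not guarantee that it evicts the soonest-to-expire key in the whole expires table, only the soonest among the randomly sampled keys.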
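
The same sampling idea applies to TTL eviction, with the comparison flipped to the expiry timestamp. The sketch below is an illustration with hypothetical names; the sampled indices are passed in to keep it deterministic, whereas Redis picks them at random from the expires table.

```c
#include <assert.h>
#include <stddef.h>

/* Among the sampled indices, return the index of the key whose expiry
 * timestamp is smallest, i.e. the key that will expire soonest. */
size_t pick_ttl_victim(const long long *expire_at,
                       const size_t *sampled, size_t nsampled) {
    assert(nsampled > 0);
    size_t best = sampled[0];
    for (size_t i = 1; i < nsampled; i++) {
        if (expire_at[sampled[i]] < expire_at[best])
            best = sampled[i];
    }
    return best;
}

/* Demo: key 3 expires first overall, but only keys 0, 1 and 2 are
 * sampled, so key 1 is chosen. */
size_t demo_pick_ttl(void) {
    const long long expire_at[] = {5000, 1200, 3000, 900};
    const size_t sampled[] = {0, 1, 2};
    return pick_ttl_victim(expire_at, sampled, 3);
}
```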
