在 redis 中,允许用户设置最大使用内存大小 server.maxmemory,redis 内存数据集大小达到设置的最大使用内存大小后,就会施行数据淘汰策略。redis 提供 6种数据淘汰策:
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached? You can select among five behavior:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys->random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
如果使用volatile-lru, volatile-random 和 volatile-ttl这三种策略时没有找到符合条件的key,那么默认动作与noeviction相同
redis 根据以上策略确定驱逐某个键值对后,会删除这个数据并,并将这个数据变更消息发布到本地(AOF 持久化)和从机(主从连接)
LRU数据淘汰机制
与LRU机制相关的结构和变量:
1: // redisServer 保存了 lru 计数器2: struct redisServer {
3: ...4: unsigned lruclock:22; /* Clock incrementing every minute, for LRU */5: ...6: };7:8: // 每一个 redis 对象都保存了 lru
9: #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) /* Max value of obj->lru */
10: #define REDIS_LRU_CLOCK_RESOLUTION 10 /* LRU clock resolution in seconds */
11: typedef struct redisObject {12: // 刚刚好 32 bits
13:14: // 对象的类型,字符串/列表/集合/哈希表
15: unsigned type:4;
16: // 未使用的两个位
17: unsigned notused:2; /* Not used */18: // 编码的方式,redis 为了节省空间,提供多种方式来保存一个数据
19: // 譬如:“123456789” 会被存储为整数 123456789
20: unsigned encoding:4;
21: unsigned lru:22; /* lru time (relative to server.lruclock) */22:23: // 引用数
24: int refcount;
25:26: // 数据指针
27: void *ptr;
28: } robj;
在服务器配置中保存了 lru 计数器 server.lrulock,会定时(redis 定时程序 serverCorn())将其更新
1: int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {2: /* We take a cached value of the unix time in the global state because
3: * with virtual memory and aging there is to store the current time4: * in objects at every object access, and accuracy is not needed.5: * To access a global var is faster than calling time(NULL) */6: // 将 UNIX 时间保存在服务器状态中,减少对 time(NULL) 的调用,加速。
7: server.unixtime = time(NULL);
8:9: // 对执行命令的时间进行采样分析
10: run_with_period(100) trackOperationsPerSecond();11:12: /* We have just 22 bits per object for LRU information.
13: * So we use an (eventually wrapping) LRU clock with 10 seconds resolution.14: * 2^22 bits with 10 seconds resoluton is more or less 1.5 years.15: *16: * Note that even if this will wrap after 1.5 years it's not a problem,17: * everything will still work but just some object will appear younger18: * to Redis. But for this to happen a given object should never be touched19: * for 1.5 years.20: *21: * Note that you can change the resolution altering the22: * REDIS_LRU_CLOCK_RESOLUTION define.23: */24: // 更新服务器的 LRU 时间
25: updateLRUClock();26: //................................
27: }28:29:30:31: /*
32: * 更新服务器的 LRU 时间33: */34: void updateLRUClock(void) {35: server.lruclock = (server.unixtime/REDIS_LRU_CLOCK_RESOLUTION) &36: REDIS_LRU_CLOCK_MAX;37: }
其中用到的两个宏:
以下是对lru的说明:1: /* The actual Redis Object */2: #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) /* Max value of obj->lru */
3: #define REDIS_LRU_CLOCK_RESOLUTION 10 /* LRU clock resolution in seconds */
We have just 22 bits per object for LRU information.
* So we use an (eventually wrapping) LRU clock with 10 seconds resolution.
* 2^22 bits with 10 seconds resolution is more or less 1.5 years.
数据淘汰具体实现
在每次执行客户端的命令时都会检查内存使用是否已经达到最大限制大小,即:server.maxmemory,一旦达到就选择适当的策略进行数据淘汰
首先在processCommand中判断是否设置了server.maxmemory:
1: // 执行客户端 C 的命令2: int processCommand(redisClient *c) {
3: /* Handle the maxmemory directive.
4: *5: * First we try to free some memory if possible (if there are volatile6: * keys in the dataset). If there are not the only thing we can do7: * is returning an error. */8: // 如果内存不足,尝试释放无用内存
9: // 如果命令可能占用大量内存,而且释放内存失败的话,
10: // 那么抛出错误命令
11: if (server.maxmemory) {
12: int retval = freeMemoryIfNeeded();
13: if ((c->cmd->flags & REDIS_CMD_DENYOOM) && retval == REDIS_ERR) {
14: flagTransaction(c);15: addReply(c, shared.oomerr);16: return REDIS_OK;
17: }18: }19: // ...............................
20: }
如果设置了此选项,则调用函数freeMemoryIfNeeded()进行数据淘汰:
1:2: /* ============================ Maxmemory directive ======================== */
3:4: /* This function gets called when 'maxmemory' is set on the config file to limit
5: * the max memory used by the server, before processing a command.6: *7: * The goal of the function is to free enough memory to keep Redis under the8: * configured memory limit.9: *10: * The function starts calculating how many bytes should be freed to keep11: * Redis under the limit, and enters a loop selecting the best keys to12: * evict accordingly to the configured policy.13: *14: * If all the bytes needed to return back under the limit were freed the15: * function returns REDIS_OK, otherwise REDIS_ERR is returned, and the caller16: * should block the execution of commands that will result in more memory17: * used by the server.18: */19: // 根据算法,查找并释放数据库中的过期键,从而释放更多内存
20: int freeMemoryIfNeeded(void) {21: size_t mem_used, mem_tofree, mem_freed;22: int slaves = listLength(server.slaves);
23:24: /* 计算占用内存大小时,并不计算slave output buffer和aof buffer,
25: * 因此maxmemory应该比实际内存小,为这两个buffer留足空间 */26: mem_used = zmalloc_used_memory();27: /*从机回复空间大小*/
28: if (slaves) {
29: listIter li;30: listNode *ln;31:32: listRewind(server.slaves,&li);33: while((ln = listNext(&li))) {
34: redisClient *slave = listNodeValue(ln);35: unsigned long obuf_bytes = getClientOutputBufferMemoryUsage(slave);36: if (obuf_bytes > mem_used)
37: mem_used = 0;38: else
39: mem_used -= obuf_bytes;40: }41: }42:43: // 缩小 AOF 重写缓存
44: if (server.aof_state != REDIS_AOF_OFF) {
45: mem_used -= sdslen(server.aof_buf);46: mem_used -= aofRewriteBufferSize();47: }48:49: /* 判断是否超过设置内存大小 */
50: if (mem_used <= server.maxmemory) return REDIS_OK;51:52: if (server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)
53: return REDIS_ERR; /* We need to free memory, but policy forbids. */54:55: /* 需要释放的数据大小. */
56: mem_tofree = mem_used - server.maxmemory;57: mem_freed = 0;58: while (mem_freed < mem_tofree) {
59: int j, k, keys_freed = 0;
60:61: // 根据所使用的不同算法,释放数据库中过期键占用的空间
62: for (j = 0; j < server.dbnum; j++) {
63: long bestval = 0; /* just to prevent warning */64: sds bestkey = NULL;65: struct dictEntry *de;
66: redisDb *db = server.db+j;67: dict *dict;68:69: if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
70: server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)71: {72: dict = server.db[j].dict;73: } else { /*如果选择在已经设置过期时间的数据中进行淘汰,选择不同的数据集74: dict = server.db[j].expires;75: }76: if (dictSize(dict) == 0) continue;77:78: /* volatile-random and allkeys-random policy */79: // 随机算法
80: if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
81: server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)82: {83: de = dictGetRandomKey(dict);84: bestkey = dictGetKey(de);85: }86:87: /* volatile-lru and allkeys-lru policy */
88: // LRU 算法
89: else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||90: server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)91: {92: //erver.maxmemory_samples 为随机挑选键值对的个数(样本个数)
93: for (k = 0; k < server.maxmemory_samples; k++) {
94: sds thiskey;95: long thisval;
96: robj *o;97: //随机选择键值对
98: de = dictGetRandomKey(dict);99: thiskey = dictGetKey(de);100: /* When policy is volatile-lru we need an additional lookup
101: * to locate the real key, as dict is set to db->expires. */102: if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
103: de = dictFind(db->dict, thiskey);104: o = dictGetVal(de);105: //计算空闲时间
106: thisval = estimateObjectIdleTime(o);107:108: /* 拥有较长空闲时间的键值对将被淘汰 */
109: if (bestkey == NULL || thisval > bestval) {
110: bestkey = thiskey;111: bestval = thisval;112: }113: }114: }115:116: /* volatile-ttl */
117: // TTL 算法
118: else if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {119: for (k = 0; k < server.maxmemory_samples; k++) {
120: sds thiskey;121: long thisval;
122:123: de = dictGetRandomKey(dict);124: thiskey = dictGetKey(de);125: thisval = (long) dictGetVal(de);
126:127: /* Expire sooner (minor expire unix timestamp) is better
128: * candidate for deletion */129: if (bestkey == NULL || thisval < bestval) {
130: bestkey = thiskey;131: bestval = thisval;132: }133: }134: }135:136: /* Finally remove the selected key. */
137: // 移除键
138: if (bestkey) {
139: long long delta;140:141: robj *keyobj = createStringObject(bestkey,sdslen(bestkey));142: //向slave从机以及AOF持久化通知要淘汰的键值对
143: propagateExpire(db,keyobj);144: /* We compute the amount of memory freed by dbDelete() alone.
145: * It is possible that actually the memory needed to propagate146: * the DEL in AOF and replication link is greater than the one147: * we are freeing removing the key, but we can't account for148: * that otherwise we would never exit the loop.149: *150: * AOF and Output buffer memory will be freed eventually so151: * we only care about memory used by the key space. */152: delta = (long long) zmalloc_used_memory();153: dbDelete(db,keyobj);154: //计算释放的内存大小
155: delta -= (long long) zmalloc_used_memory();156: mem_freed += delta;157: server.stat_evictedkeys++;158: decrRefCount(keyobj);159: keys_freed++;160:161: //及时将slave从机回复空间里的数据发送给对方
162: if (slaves) flushSlavesOutputBuffers();
163: }164: }165: if (!keys_freed) return REDIS_ERR; /* nothing to free... */166: }167: return REDIS_OK;
168: }169:
这个函数中主要做了以下工作:
·计算实际内存大小(不要计算从机回复空间和AOF占用的内存大小,此处主要为解决之前版本死循环bug,在之后版本中解决此bugThis fixes issue)
·通过上面实际内存大小以及maxmemory计算出需要淘汰的数据大小mem_tofree
·根据所使用的不同算法,释放数据库中过期键占用的空间
3.0 中的LRU淘汰策略
在3.0之前的LRU淘汰策略中,freeMemoryIfNeeded函数是随机选择server.maxmemory_samples个样本,并在这些样本中根据lLRU策略进行数据淘汰。这是一种近似的LRU算法(Approximated LRU algorithm)。但是在3.0的版本中,为了使其准确性更加接近于真正的LRU算法,redis在每个redisDB结构中维持了一个名为eviction_pool的pool
1:2: /* Redis database representation. There are multiple databases identified
3: * by integers from 0 (the default database) up to the max configured4: * database. The database number is the 'id' field in the structure. */5: typedef struct redisDb {6:7: // 数据库键空间,保存着数据库中的所有键值对
8: dict *dict; /* The keyspace for this DB */
9:10: // 键的过期时间,字典的键为键,字典的值为过期事件 UNIX 时间戳
11: dict *expires; /* Timeout of keys with a timeout set */
12:13: // 正处于阻塞状态的键
14: dict *blocking_keys; /* Keys with clients waiting for data (BLPOP) */
15:16: // 可以解除阻塞的键
17: dict *ready_keys; /* Blocked keys that received a PUSH */
18:19: // 正在被 WATCH 命令监视的键
20: dict *watched_keys; /* WATCHED keys for MULTI/EXEC CAS */
21:22: struct evictionPoolEntry *eviction_pool; /* Eviction pool of keys */23:24: // 数据库号码
25: int id; /* Database ID */26:27: // 数据库的键的平均 TTL ,统计信息
28: long long avg_ttl; /* Average TTL, just for stats */29:30: } redisDb;
pool中的成员为evictionPoolEntry类型
1: /* To improve the quality of the LRU approximation we take a set of keys
2: * that are good candidate for eviction across freeMemoryIfNeeded() calls.3: *4: * Entries inside the eviciton pool are taken ordered by idle time, putting5: * greater idle times to the right (ascending order).6: *7: * Empty entries have the key pointer set to NULL. */8: #define REDIS_EVICTION_POOL_SIZE 169: struct evictionPoolEntry {
10: unsigned long long idle; /* 对象空闲时间大小. */11: sds key; /* Key name. */
12: };
在这个pool中,各成员按照其空闲时间大小idle进行从小到大排序
而在函数freeMemoryIfNeeded,redis不再是简单地选取几个样本来进行lru大小的比较,而是会在pool中进行更新,并且选择出最适合的淘汰数据
具体实现如下:
1: else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||2: server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)3: { //进行eviction_pool的更新
4: struct evictionPoolEntry *pool = db->eviction_pool;
5:6: while(bestkey == NULL) {
7: evictionPoolPopulate(dict, db->dict, db->eviction_pool);8: /* 寻找pool中最适合(idle时间最大的key). */
9: for (k = REDIS_EVICTION_POOL_SIZE-1; k >= 0; k--) {
10: if (pool[k].key == NULL) continue;11: de = dictFind(dict,pool[k].key);12:13: /*删除pool中被选中的entry. */
14: sdsfree(pool[k].key);15: /* Shift all elements on its right to left. */
16: memmove(pool+k,pool+k+1,17: sizeof(pool[0])*(REDIS_EVICTION_POOL_SIZE-k-1));
18: /* Clear the element on the right which is empty
19: * since we shifted one position to the left. */20: pool[REDIS_EVICTION_POOL_SIZE-1].key = NULL;21: pool[REDIS_EVICTION_POOL_SIZE-1].idle = 0;22:23: /* If the key exists, is our pick. Otherwise it is
24: * a ghost and we need to try the next element. */25: if (de) {
26: bestkey = dictGetKey(de);27: break;
28: } else {
29: /* Ghost... */
30: continue;
31: }32: }33: }34: }
更新pool的工作在函数evictionPoolPopulate中进行
1: #define EVICTION_SAMPLES_ARRAY_SIZE 162: void evictionPoolPopulate(dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {3: int j, k, count;
4: dictEntry *_samples[EVICTION_SAMPLES_ARRAY_SIZE];5: dictEntry **samples;6:7: /* Try to use a static buffer: this function is a big hit...
8: * Note: it was actually measured that this helps. */9: if (server.maxmemory_samples <= EVICTION_SAMPLES_ARRAY_SIZE) {
10: samples = _samples;11: } else {
12: samples = zmalloc(sizeof(samples[0])*server.maxmemory_samples);
13: }14:15: #if 1 /* Use bulk get by default. */16: count = dictGetRandomKeys(sampledict,samples,server.maxmemory_samples);17: #else //随机获取maxmemory_samples个样本18: count = server.maxmemory_samples;19: for (j = 0; j < count; j++) samples[j] = dictGetRandomKey(sampledict);
20: #endif21: //将每个随机获取的样本与原pool中的成员进行比较
22: for (j = 0; j < count; j++) {
23: unsigned long long idle;24: sds key;25: robj *o;26: dictEntry *de;27:28: de = samples[j];29: key = dictGetKey(de);30: /* If the dictionary we are sampling from is not the main
31: * dictionary (but the expires one) we need to lookup the key32: * again in the key dictionary to obtain the value object. */33: if (sampledict != keydict) de = dictFind(keydict, key);
34: o = dictGetVal(de);35: //计算空闲时间
36: idle = estimateObjectIdleTime(o);37:38: //尝试将其插入到pool中,首先寻找到第一个空bucket或者比其idle小的bucket
39: k = 0;40: while (k < REDIS_EVICTION_POOL_SIZE &&
41: pool[k].key &&42: pool[k].idle < idle) k++;43: if (k == 0 && pool[REDIS_EVICTION_POOL_SIZE-1].key != NULL) {
44: /* Can't insert if the element is < the worst element we have
45: * and there are no empty buckets. */46: continue;
47: } else if (k < REDIS_EVICTION_POOL_SIZE && pool[k].key == NULL) {48: /* Inserting into empty position. No setup needed before insert. */
49: } else {
50: /* Inserting in the middle. Now k points to the first element
51: * greater than the element to insert. */52: if (pool[REDIS_EVICTION_POOL_SIZE-1].key == NULL) {
53: /* Free space on the right? Insert at k shifting
54: * all the elements from k to end to the right. */55: memmove(pool+k+1,pool+k,56: sizeof(pool[0])*(REDIS_EVICTION_POOL_SIZE-k-1));
57: } else {
58: /* No free space on right? Insert at k-1 */
59: k--;60: /* Shift all elements on the left of k (included) to the
61: * left, so we discard the element with smaller idle time. */62: sdsfree(pool[0].key);63: memmove(pool,pool+1,sizeof(pool[0])*k);
64: }65: }66: pool[k].key = sdsdup(key);67: pool[k].idle = idle;68: }69: if (samples != _samples) zfree(samples);
70: }
以上便是3.0版本中LRU淘汰策略的大致过程