在Redis主从复制架构中,如果master出现了故障,则需要人工将slave提升为master,同时,通知应用侧更新master的地址。这样方式比较低效,对应用侧影响较大。
为了解决这个问题,Redis 2.8中推出了自己的高可用方案Redis Sentinel。
Redis Sentinel架构图如下:
默认情况下,每个Sentinel节点会以每秒一次的频率对Redis节点和其它的Sentinel节点发送PING命令,并通过节点的回复来判断节点是否在线。
如果在down-after-millisecondes毫秒内,没有收到有效的回复,则会判定该节点为主观下线。
如果该节点为master,则该Sentinel节点会通过sentinel is-master-down-by-addr命令向其它sentinel节点询问对该节点的判断,如果超过<quorum>个数的节点判定master不可达,则该sentinel节点会将master判断为客观下线。
这个时候,各个Sentinel会进行协商,选举出一个领头Sentinel,由该领头Sentinel对master节点进行故障转移操作。
故障转移包含如下三个操作:
1. 在所有的slave服务器中,挑选出一个slave,并将其转换为master。
2. 让其它slave服务器,改为复制新的master。
3. 将旧master设置为新master的slave,这样,当旧的master重新上线时,它会成为新master的slave。
以上的所有操作对业务都是透明的,当新的master上线后,Sentinel会自动将这个变化实时通知给业务方。
那么,业务侧又该如何配置,才能扑捉到这个变化呢?
其实,这个主要取决于Redis客户端工具是否支持Redis Sentinel,对于支持的客户端工具来说,如Jedis,
只需将连接字符串设置为Sentinel地址即可。
下面,给出了一个测试代码,并模拟了master发生故障,业务侧是如何处理的?
代码如下:
package com.victor_02; import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set; import org.apache.commons.pool2.impl.GenericObjectPoolConfig; import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool; public class JedisSentinelTest { public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub Set<String> sentinels = new HashSet<String>();
sentinels.add("192.168.244.10:26379");
sentinels.add("192.168.244.10:26380");
sentinels.add("192.168.244.10:26381"); JedisSentinelPool jedisSentinelPool = new JedisSentinelPool("mymaster", sentinels);
Jedis jedis = null;
while (true) {
Thread.sleep(1000); try {
jedis = jedisSentinelPool.getResource(); Date now = new Date();
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
String format_now = dateFormat.format(now); jedis.set("hello", "world");
String value = jedis.get("hello");
System.out.println(format_now + ' ' + value);
} catch (Exception e) {
System.out.println(e);
} finally {
if (jedis != null)
try {
jedis.close();
} catch (Exception e) {
System.out.println(e);
}
}
} }
}
模拟故障:
# ./redis-cli -p
127.0.0.1:> shutdown
上述代码的输出如下:
四月 16, 2017 10:39:44 下午 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Trying to find master from available Sentinels...
四月 16, 2017 10:39:44 下午 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Redis master running at 192.168.244.10:6380, starting Sentinel listeners...
四月 16, 2017 10:39:44 下午 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 192.168.244.10:6380
2017/04/16 22:39:45 world
2017/04/16 22:39:46 world
2017/04/16 22:39:47 world
2017/04/16 22:39:48 world
2017/04/16 22:39:49 world
2017/04/16 22:39:50 world
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Software caused connection abort: recv failed
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
四月 16, 2017 10:40:21 下午 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 192.168.244.10:6381
四月 16, 2017 10:40:21 下午 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 192.168.244.10:6381
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisException: Could not return the resource to the pool
2017/04/16 22:40:22 world
2017/04/16 22:40:23 world
2017/04/16 22:40:24 world
2017/04/16 22:40:25 world
2017/04/16 22:40:26 world
从上述输出可以看出,在master发生故障前,业务侧最后一次正常处理(22:39:50),到再次正常处理是(22:40:22),中间经过了32s。
而其中30s被用来判断master节点是否主观下线(由down-after-milliseconds来指定),整个切换的过程还是比较高效的。