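Setting up a Redis cluster on CentOS 6.7 Linux
1. Download the latest Redis release
I downloaded Redis 3.2.1. After downloading, extract the archive and compile it with make. For the detailed steps, see my earlier post, "Redis Study Notes 1: Installing Redis on CentOS 6.7".
The compiled Redis directory is /usr/local/redis-3.2.1.
2. Create six directories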
[root@itcast01 local]# mkdir 7000 7001 7002 7003 7004 7005
Copy the compiled Redis files into each of the directories 7000 through 7005.
[root@itcast01 local]# cp -rf redis-3.2.1/* 7000
[root@itcast01 local]# cp -rf redis-3.2.1/* 7001
[root@itcast01 local]# cp -rf redis-3.2.1/* 7002
[root@itcast01 local]# cp -rf redis-3.2.1/* 7003
[root@itcast01 local]# cp -rf redis-3.2.1/* 7004
[root@itcast01 local]# cp -rf redis-3.2.1/* 7005
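The mkdir and cp commands above can also be written as one loop. This is a minimal sketch, assuming a POSIX shell. prepare_nodes is a hypothetical helper name, not something shipped with Redis, and it copies with `src/.` instead of `src/*` so that hidden files come along too.

```shell
# Create one directory per port under $base and copy the compiled
# Redis tree into it. prepare_nodes is a hypothetical helper name.
prepare_nodes() (
  src=$1
  base=$2
  shift 2
  for port in "$@"; do
    mkdir -p "$base/$port" || exit 1
    # "src/." copies the contents of src, including hidden files
    cp -rf "$src"/. "$base/$port"/ || exit 1
  done
)
```

For the layout used in this post it would be called as `prepare_nodes /usr/local/redis-3.2.1 /usr/local 7000 7001 7002 7003 7004 7005`.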
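3. Edit the redis.conf configuration file
In each directory's redis.conf, set the following options, changing port to match the directory (7000 through 7005):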
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
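Since the five settings differ only in the port, they can be generated instead of typed six times. The sketch below is one possible way to do that. cluster_conf is a hypothetical helper name, and it only prints the settings to stdout so they can be reviewed or redirected into a config file.

```shell
# Print the cluster settings from this step for the given port.
# cluster_conf is a hypothetical helper name; it writes nothing to disk.
cluster_conf() {
  cat <<EOF
port $1
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
EOF
}
```

For example, `cluster_conf 7001` prints the same block as above with `port 7001` on the first line.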
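4. Start each Redis instance in turn
Run the following from the src directory of each of the six copies (7000 through 7005):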
[root@itcast01 src]# ./redis-server ../redis.conf
5. Install the Ruby tools required by Redis Cluster
[root@itcast01 src]# yum install ruby
[root@itcast01 src]# yum install rubygems
[root@itcast01 src]# gem install redis
6. Create the Redis cluster
[root@itcast01 src]# ./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005
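In the command above, --replicas 1 asks for one slave per master, so six nodes end up as three masters and three slaves, which matches the check output later in this post. The sketch below builds the same command line from a list of ports and works out the master count. create_cmd and master_count are hypothetical helper names, create_cmd only prints the command and does not run it, and the division rule is my reading of how the node count is split rather than something taken from the redis-trib.rb source.

```shell
# Print the redis-trib.rb create command for a list of local ports.
# create_cmd is a hypothetical helper; it prints, it does not execute.
create_cmd() (
  replicas=$1
  shift
  nodes=""
  for port in "$@"; do
    nodes="$nodes 127.0.0.1:$port"
  done
  echo "./redis-trib.rb create --replicas $replicas$nodes"
)

# Number of masters for a given node count and replicas-per-master
# (assumed rule: nodes / (replicas + 1), integer division).
master_count() {
  echo $(( $1 / ($2 + 1) ))
}
```

`create_cmd 1 7000 7001 7002 7003 7004 7005` prints the command used in this step, and `master_count 6 1` prints 3.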
Once the create command finishes, verify the state of the cluster with check:
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots: (0 slots) slave
replicates 88d693578dd0bdaca9e32422565c624790961bc9
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@itcast01 src]# ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7002 (fbcce8fb...) -> 10 keys | 5461 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 6 keys | 5462 slots | 1 slaves.
127.0.0.1:7003 (88d69357...) -> 6 keys | 5461 slots | 1 slaves.
[OK] 22 keys in 3 masters.
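The info command prints every master along with the number of keys it holds and the number of slaves it has, followed by the total number of keys across all masters and the average number of keys per slot.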
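The totals in that output are easy to cross-check by summing the per-master lines. The sketch below does this with awk. info_totals is a hypothetical helper name, and the field positions are an assumption based on the line format shown in this post.

```shell
# Sum the per-master key and slot counts from `redis-trib.rb info`
# output read on stdin. Only lines whose third field is "->" are counted.
info_totals() {
  awk '$3 == "->" { keys += $4; slots += $7 }
       END {
         avg = (slots ? keys / slots : 0)
         printf "%d keys, %d slots, %.2f keys per slot\n", keys, slots, avg
       }'
}
```

It could be used as `./redis-trib.rb info 127.0.0.1:7000 | info_totals`. With only a few dozen keys spread over 16384 slots, the average rounds to 0.00.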
[root@itcast01 src]# ./redis-trib.rb help
The output is as follows:
[root@itcast01 src]# ./redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>
call host:port command arg arg .. arg
del-node host:port node_id
set-timeout host:port milliseconds
rebalance host:port
--threshold <arg>
--use-empty-masters
--simulate
--auto-weights
--weight <arg>
--pipeline <arg>
--timeout <arg>
help (show this help)
reshard host:port
--slots <arg>
--from <arg>
--to <arg>
--pipeline <arg>
--timeout <arg>
--yes
create host1:port1 ... hostN:portN
--replicas <arg>
info host:port
import host:port
--from <arg>
--copy
--replace
fix host:port
--timeout <arg>
add-node new_host:new_port existing_host:existing_port
--master-id <arg>
--slave
check host:port
For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.
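The word "slot" has come up several times above, so here is a brief explanation: as the check output shows, the cluster has 16384 slots in total, and each master serves one contiguous range of them (for example, 7003 serves slots 0-5460).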
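To make the slot ranges concrete, the sketch below maps a slot number to the master that serves it. The ranges are hard-coded from the check output shown earlier in this post, so it only describes the cluster before any failover or reshard. slot_owner is a hypothetical helper name.

```shell
# Map a slot number to the master serving it, using the ranges from the
# check output in this post: 0-5460, 5461-10922, 10923-16383.
slot_owner() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 16383 ]; then
    echo "slot out of range" >&2
    return 1
  elif [ "$1" -le 5460 ]; then
    echo 127.0.0.1:7003
  elif [ "$1" -le 10922 ]; then
    echo 127.0.0.1:7001
  else
    echo 127.0.0.1:7002
  fi
}
```

For instance, `slot_owner 700` prints 127.0.0.1:7003.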
7. Working with the redis-cli client
[root@itcast01 src]# ./redis-cli -c -h localhost -p 7000
Note the -c option, which puts the client into cluster mode. Try adding a key:
[root@itcast01 src]# ./redis-cli -c -h localhost -p 7000
localhost:7000> set cc 123
-> Redirected to slot [700] located at 127.0.0.1:7003
OK
127.0.0.1:7003>
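Note the second line of the output: the key cc was hashed to slot 700, which is served by the node on port 7003. (7000 is a slave and 7003 is a master, and only a master accepts writes, so the client is redirected.) If you repeat the same command while connected to 7003, no redirect appears.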
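The "calculation" mentioned above is, per the Redis Cluster specification, CRC16 of the key modulo 16384. The sketch below reproduces it in plain POSIX shell, assuming the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0). keyslot is a hypothetical helper name, and hash tags (the `{...}` syntax) are deliberately not handled, so treat it as an illustration only.

```shell
# Compute the hash slot of a key: CRC16 (XMODEM: poly 0x1021, init 0)
# of the key bytes, modulo 16384. Hash tags ({...}) are NOT handled.
keyslot() (
  crc=0
  # od prints each byte of the key as an unsigned decimal number
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$(( i + 1 ))
    done
  done
  echo $(( crc % 16384 ))
)
```

`keyslot cc` prints 700, the same slot that redis-cli reported in the redirect above.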
8. Failover test
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots: (0 slots) slave
replicates 88d693578dd0bdaca9e32422565c624790961bc9
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The output shows that 7000 is the slave of 7003 (88d693578dd0bdaca9e32422565c624790961bc9). Now kill the Redis process for 7003 by hand and watch the terminal output of 7000:
3342:S 21 Jul 09:43:39.831 * Connecting to MASTER 127.0.0.1:7003
3342:S 21 Jul 09:43:39.831 * MASTER <-> SLAVE sync started
3342:S 21 Jul 09:43:39.831 # Error condition on socket for SYNC: Connection refused
3342:S 21 Jul 09:43:40.135 * Marking node 88d693578dd0bdaca9e32422565c624790961bc9 as failing (quorum reached).
3342:S 21 Jul 09:43:40.135 # Start of election delayed for 720 milliseconds (rank #0, offset 2241).
3342:S 21 Jul 09:43:40.135 # Cluster state changed: fail
3342:S 21 Jul 09:43:40.841 * Connecting to MASTER 127.0.0.1:7003
3342:S 21 Jul 09:43:40.841 * MASTER <-> SLAVE sync started
3342:S 21 Jul 09:43:40.841 # Error condition on socket for SYNC: Connection refused
3342:S 21 Jul 09:43:40.942 # Starting a failover election for epoch 10.
3342:S 21 Jul 09:43:40.965 # Failover election won: I'm the new master.
3342:S 21 Jul 09:43:40.965 # configEpoch set to 10 after successful failover
3342:M 21 Jul 09:43:40.965 * Discarding previously cached master state.
3342:M 21 Jul 09:43:40.966 # Cluster state changed: ok
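Line 6 shows that the cluster switched to the fail state because 7003 went down, line 5 shows that an election is being started, and line 11 shows that the node on port 7000 was elected as the new master. Run redis-trib.rb check to confirm: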
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
0 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
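After the failover, 7000 is a master with "0 additional replica(s)", so it has no slave to take over if it fails in turn. The sketch below picks such masters out of the check output. masters_without_replicas is a hypothetical helper name, and the parsing relies only on the line format shown in this post.

```shell
# Read `redis-trib.rb check` output on stdin and print the address of
# every master that reports "0 additional replica(s)".
masters_without_replicas() {
  awk '$1 == "M:" { node = $3 }
       $1 == "S:" { node = "" }
       $2 == "additional" && $1 == 0 && node != "" { print node; node = "" }'
}
```

It could be used as `./redis-trib.rb check 127.0.0.1:7000 | masters_without_replicas`, which for the output above would list 127.0.0.1:7000.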
9. Scaling out the cluster
[root@itcast01 local]# mkdir 7006 7007
[root@itcast01 local]# cp -rf redis-3.2.1/* 7006
[root@itcast01 local]# cp -rf redis-3.2.1/* 7007
As in steps 3 and 4, set port to 7006 and 7007 in the respective redis.conf files and start both instances before adding them to the cluster.
[root@itcast01 src]# ./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
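Note: the first argument is the "ip:port" of the new node, and the second is any valid node already in the cluster. If all goes well, the output looks like this: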
[root@itcast01 src]# ./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
>>> Adding node 127.0.0.1:7006 to cluster 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:7006 to make it join the cluster.
[OK] New node added correctly.
Use redis-trib.rb check to confirm the state:
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
0 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The output above shows that 7006 is now a master node. Next, add 7007 as a slave of it with the command below.
[root@itcast01 src]# ./redis-trib.rb add-node --slave --master-id 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7007 127.0.0.1:7000
Checking the cluster again afterwards gives the following output:
[root@itcast01 src]# ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 9d81b0624b5080e5304165b07c2ef69a011ec28e 127.0.0.1:7000
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: ebbb23e3206c46eb64035af1a4381e2bb20a0a20 127.0.0.1:7004
slots: (0 slots) slave
replicates 20fbccf06841f7aa699b97bff72ece2f96599236
S: b62090a2fd65e1aa4d7053e78c1ff192bd152eb9 127.0.0.1:7005
slots: (0 slots) slave
replicates fbcce8fbcf22bb2d6b6f6f56e27b864210087213
M: fbcce8fbcf22bb2d6b6f6f56e27b864210087213 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c 127.0.0.1:7007
slots: (0 slots) slave
replicates 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
1 additional replica(s)
M: 20fbccf06841f7aa699b97bff72ece2f96599236 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 88d693578dd0bdaca9e32422565c624790961bc9 127.0.0.1:7003
slots: (0 slots) slave
replicates 9d81b0624b5080e5304165b07c2ef69a011ec28e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
This shows that 7007 is now a slave of 7006.
10. Reshard: redistributing slots
[root@itcast01 src]# ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (9d81b062...) -> 8 keys | 5461 slots | 1 slaves.
127.0.0.1:7002 (fbcce8fb...) -> 11 keys | 5461 slots | 1 slaves.
127.0.0.1:7006 (8e35ebeb...) -> 0 keys | 0 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 7 keys | 5462 slots | 1 slaves.
[OK] 26 keys in 4 masters.
0.00 keys per slot on average.
Use the following command to reassign slots:
[root@itcast01 src]# ./redis-trib.rb reshard 127.0.0.1:7000
The IP:port after reshard can be any valid node in the cluster.
[root@itcast01 src]# ./redis-trib.rb reshard 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
....
M: 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc 127.0.0.1:7006
slots: (0 slots) master
1 additional replica(s)
....
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1000 # enter the number of slots to move
What is the receiving node ID? 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc # enter the ID of the target node
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:all # use all nodes as source nodes
Moving slot 6455 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6456 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6457 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6458 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6459 from 20fbccf06841f7aa699b97bff72ece2f96599236
Moving slot 6460 from 20fbccf06841f7aa699b97bff72ece2f96599236
Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm to proceed
....
Moving slot 12191 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12192 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12193 from 127.0.0.1:7002 to 127.0.0.1:7006:
Moving slot 12194 from 127.0.0.1:7002 to 127.0.0.1:7006:
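Note: think carefully about the first prompt, which asks how many slots to move. Entering 16384 moves every slot onto a single node, which makes the distribution even more unbalanced. Moving 500 to 1000 slots at a time is recommended, because it has less impact on a live system.
reshard can be run repeatedly until the distribution is what you want. (Personally I find reshard a bit cumbersome here: the number of slots to move has to be worked out by hand, and it would be nice to have an option that spreads the 16384 slots evenly automatically.) Once the adjustment is done, look at the distribution again: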
[root@itcast01 src]# ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (9d81b062...) -> 12 keys | 7005 slots | 1 slaves.
127.0.0.1:7002 (fbcce8fb...) -> 8 keys | 4189 slots | 1 slaves.
127.0.0.1:7006 (8e35ebeb...) -> 3 keys | 1000 slots | 1 slaves.
127.0.0.1:7001 (20fbccf0...) -> 3 keys | 4190 slots | 1 slaves.
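The manual calculation mentioned above is simple integer arithmetic: an even share is 16384 divided by the number of masters, minus what the node already holds. The sketch below shows it. slots_to_move is a hypothetical helper name. The help output earlier in this post also lists a rebalance subcommand with a --use-empty-masters option, which may cover the automatic case, though I have not tried it here.

```shell
# How many more slots a master needs to reach an even share of the
# 16384 slots, given the number of masters and its current slot count.
slots_to_move() (
  masters=$1
  current=$2
  target=$(( 16384 / masters ))
  if [ "$current" -ge "$target" ]; then
    echo 0
  else
    echo $(( target - current ))
  fi
)
```

With four masters and 7006 holding 1000 slots as in the output above, `slots_to_move 4 1000` prints 3096.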
11. Removing a node
[root@itcast01 src]# ./redis-trib.rb del-node 127.0.0.1:7006 8e35ebeb7325c79b81e1beee03cc5e56e8334fdc
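The ip:port after del-node can be any valid node in the cluster, and the last argument is the ID of the node to remove. The run below removes bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c, which the earlier check output lists as the 7007 slave: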
[root@itcast01 src]# ./redis-trib.rb del-node 127.0.0.1:7006 bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c
>>> Removing node bf84939e7e6b066d3d9caf7aae1e1b8e7ca2522c from cluster 127.0.0.1:7006
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
After a node is removed, its Redis service is shut down as well.