机房意外断电断网不得不预防,下面模拟测试某机房断电断网,B机房断电断网后A机房可正常提供服务,A机房断电断网后可能需要强制重启继续提供服务了,目前查看数据都还在,暂时没验证是否有数据丢失,小试了一把。。。
大概架构
A机房
192.168.70.214:27017 (为primary,其他节点都是SECONDARY)
192.168.70.215:27017
192.168.70.216:27017
B机房
192.168.71.214:27017
192.168.71.215:27017
下面重点记录下A机房断电断网,B机房怎么重启继续提供服务,其实主要就3步,见粗体
repset:SECONDARY> use admin
switched to db admin
repset:SECONDARY> cfg=rs.conf()
{
"_id" : "repset",
"version" : 79,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
},
{
"_id" : 7,
"host" : "192.168.70.214:27017",
"priority" : 11
},
{
"_id" : 8,
"host" : "192.168.70.215:27017",
"priority" : 6
},
{
"_id" : 13,
"host" : "192.168.70.216:27017",
"priority" : 5
}
]
}
repset:SECONDARY> cfg.members = [cfg.members[0], cfg.members[1]]
[
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
}
]
repset:SECONDARY> rs.reconfig(cfg, {force :true })
{ "ok" : 1 }
repset:SECONDARY> rs.status()
{
"set" : "repset",
"date" : ISODate("2018-11-09T03:45:29Z"),
"myState" : 1,
"members" : [
{
"_id" : 4,
"name" : "192.168.71.214:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 69133,
"optime" : Timestamp(1541663971000, 1),
"optimeDate" : ISODate("2018-11-08T07:59:31Z"),
"self" : true
},
{
"_id" : 5,
"name" : "192.168.71.215:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 8,
"optime" : Timestamp(1541663971000, 1),
"optimeDate" : ISODate("2018-11-08T07:59:31Z"),
"lastHeartbeat" : ISODate("2018-11-09T03:45:29Z"),
"pingMs" : 0
}
],
"ok" : 1
}
repset:PRIMARY>
repset:PRIMARY> use test
switched to db test
repset:PRIMARY> show collections
system.indexes
test
testdb
repset:PRIMARY> rs.conf()
{
"_id" : "repset",
"version" : 38342,
"members" : [
{
"_id" : 4,
"host" : "192.168.71.214:27017",
"priority" : 10
},
{
"_id" : 5,
"host" : "192.168.71.215:27017",
"priority" : 9
}
]
}
从上面观察验证:新选的PRIMARY可以持续提供服务了,之前的数据都还在,解决了单机房断电断网的故障。
欢迎各位提供指导,共同成长。