1. Verifying OSDs
1.1 OSD status
The running states of an OSD are: up, in, out, down.
A healthy OSD is both up and in.
When an OSD fails and its daemon goes offline, the OSD is marked down, but for 5 minutes the cluster still keeps it in. This delay protects against transient network flaps triggering unnecessary data movement.
If the OSD has not recovered within those 5 minutes, it is also marked out, and the PGs on it begin to migrate. This interval is controlled by the mon_osd_down_out_interval option.
When the failed OSD comes back online, a new round of data rebalancing is triggered.
If the cluster has the noout flag set, an OSD going down will not trigger rebalancing (see the example below).
OSDs check each other's status every 6 seconds, and report their status to the monitors every 120 seconds.
Capacity states: nearfull, full
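As a quick illustration (a minimal sketch, not part of the session recorded below; the flag and commands are standard Ceph CLI), the noout flag can be toggled and checked like this:

ceph osd set noout               # down OSDs will no longer be marked out, so no rebalancing starts
ceph osd dump | grep flags       # confirm that "noout" now appears in the osdmap flags
ceph osd unset noout             # restore the normal down -> out behaviour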
1.2 Common commands
[root@ceph2 ~]# ceph osd stat
9 osds: 9 up, 9 in; 32 remapped pgs          # show OSD status
[root@ceph2 ~]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE VAR  PGS     # per-OSD usage report
 0   hdd 0.01500  1.00000 15348M  112M 15236M 0.74 0.58  48
 2   hdd 0.01500  1.00000 15348M  112M 15236M 0.73 0.57  40
 1   hdd 0.01500  1.00000 15348M  114M 15234M 0.75 0.58  39
 3   hdd 0.01500  1.00000 15348M  269M 15079M 1.76 1.37 256
 6   hdd 0.01500  1.00000 15348M  208M 15140M 1.36 1.07 248
 5   hdd 0.01500  1.00000 15348M  228M 15120M 1.49 1.17 238
 8   hdd 0.01500  1.00000 15348M  245M 15103M 1.60 1.25 266
 4   hdd 0.01500  1.00000 15348M  253M 15095M 1.65 1.29 245
 7   hdd 0.01500  1.00000 15348M  218M 15130M 1.42 1.11 228
                    TOTAL   134G 1765M   133G 1.28
MIN/MAX VAR: 0.57/1.37  STDDEV: 0.36
[root@ceph2 ~]# ceph osd find osd.0          # locate a specific OSD
{
    "osd": 0,
    "ip": "172.25.250.11:6800/185671",
    "crush_location": {
        "host": "ceph2-ssd",
        "root": "ssd-root"
    }
}
1.3 OSD heartbeat parameters
osd_heartbeat_interval      # interval at which OSDs exchange heartbeats with each other
osd_heartbeat_grace         # how long an OSD may go without a heartbeat before the cluster considers it down
mon_osd_min_down_reporters  # minimum number of distinct OSDs that must report an OSD as down before it is marked down
mon_osd_min_down_reports    # number of times an OSD must be reported down before it is marked down
mon_osd_down_out_interval   # how long an OSD may stay unresponsive before it is marked down and out
mon_osd_report_timeout      # how long the monitors wait before declaring a failed OSD down
osd_mon_report_interval_min # how long a newly started OSD waits before it begins reporting to the monitors
osd_mon_report_interval_max # maximum interval between reports that the monitors accept from an OSD; beyond this the OSD is considered down
osd_mon_heartbeat_interval  # interval at which an OSD pings the monitors
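A minimal sketch of how these values can be inspected and temporarily overridden at runtime, reusing the admin-socket and injectargs patterns shown elsewhere in this article (the value 25 is only illustrative):

ceph daemon osd.0 config show | grep heartbeat          # current heartbeat settings of osd.0
ceph tell osd.* injectargs '--osd_heartbeat_grace 25'   # raise the grace period on all OSDs until restart

Values injected this way are not persistent; add them to ceph.conf to keep them across restarts.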
2. Managing OSD capacity
When cluster utilization reaches mon_osd_nearfull_ratio, the cluster enters the HEALTH_WARN state. This is a reminder to add OSDs before full_ratio is reached. The default is 0.85, i.e. 85%.
When cluster utilization reaches mon_osd_full_ratio, the cluster stops accepting writes but still allows reads, and enters the HEALTH_ERR state. The default is 0.95, i.e. 95%; the remaining headroom is kept so that data can still be rebalanced when one or more OSDs fail.
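To see how close the cluster is to these thresholds, the usual checks are (a hedged sketch; only some of these appear in the session below):

ceph df              # overall and per-pool utilization
ceph osd df          # per-OSD utilization; compare the %USE column against the ratios
ceph health detail   # lists any OSDs that are currently nearfull or full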
2.1 Setting the ratios
[root@ceph2 ~]# ceph osd set-nearfull-ratio 0.75
osd set-nearfull-ratio 0.75
[root@ceph2 ~]# ceph osd set-full-ratio 0.85
osd set-full-ratio 0.85
[root@ceph2 ~]# ceph osd dump
crush_version 43
full_ratio 0.85
backfillfull_ratio 0.9
nearfull_ratio 0.75
[root@ceph2 ~]# ceph daemon osd.0 config show|grep full_ratio
"mon_osd_backfillfull_ratio": "0.900000", "mon_osd_full_ratio": "0.950000", "mon_osd_nearfull_ratio": "0.850000", "osd_failsafe_full_ratio": "0.970000", "osd_pool_default_cache_target_full_ratio": "0.800000",
[root@ceph2 ~]# ceph tell osd.* injectargs --mon_osd_full_ratio 0.85
[root@ceph2 ~]# ceph daemon osd.0 config show|grep full_ratio
"mon_osd_backfillfull_ratio": "0.900000", "mon_osd_full_ratio": "0.850000", "mon_osd_nearfull_ratio": "0.850000", "osd_failsafe_full_ratio": "0.970000", "osd_pool_default_cache_target_full_ratio": "0.800000",
3. Problems with the cluster full state
3.1 Setting the cluster full flag
[root@ceph2 ~]# ceph osd set full
full is set
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_WARN
            full flag(s) set

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in; 32 remapped pgs
                flags full
    rbd-mirror: 1 daemon active

  data:
    pools:   14 pools, 536 pgs
    objects: 220 objects, 240 MB
    usage:   1768 MB used, 133 GB / 134 GB avail
    pgs:     508 active+clean
             28  active+clean+remapped          # the remapped PGs indicate a problem

  io:
    client:  2558 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
3.2 Unsetting the full flag
[root@ceph2 ~]# ceph osd unset full
full is unset
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order
            Reduced data availability: 32 pgs inactive, 32 pgs peering, 32 pgs stale
            Degraded data redundancy: 32 pgs unclean          # the PGs also have a problem

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   14 pools, 536 pgs
    objects: 221 objects, 240 MB
    usage:   1780 MB used, 133 GB / 134 GB avail
    pgs:     5.970% pgs not active
             504 active+clean
             32  stale+peering

  io:
    client:  4911 B/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
On inspection, the problem turns out to be limited to one storage pool, ssdpool.
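One way to narrow a problem down to a pool like this (a hedged sketch of the reasoning, not commands taken from the session): the number before the dot in a PG id is the pool id, so the stale PGs can be matched back to a pool name:

ceph health detail          # lists the stale/peering PGs and their ids
ceph pg dump_stuck stale    # the same information, restricted to stuck PGs
ceph osd pool ls detail     # map the pool id prefix of those PG ids back to a pool name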
3.3 Deleting ssdpool
[root@ceph2 ~]# ceph osd pool delete ssdpool ssdpool --yes-i-really-really-mean-it
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1772 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  341 B/s rd, 0 op/s rd, 0 op/s wr
[root@ceph2 ~]# ceph osd unset full
[root@ceph2 ceph]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order          # still no effect

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1773 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  2046 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
[root@ceph2 ceph]# ceph health detail
HEALTH_ERR full ratio(s) out of order
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
    full_ratio (0.85) < backfillfull_ratio (0.9), increased
# The full_ratio configured earlier ended up lower than backfillfull_ratio, which is what this error is about.
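The required ordering is nearfull_ratio < backfillfull_ratio < full_ratio. Besides raising full_ratio again (as done in 3.4 below), the ordering could also have been restored by lowering backfillfull_ratio; the corresponding command exists alongside set-full-ratio and set-nearfull-ratio (illustrative value, not run in this session):

ceph osd set-backfillfull-ratio 0.80   # would satisfy 0.75 (nearfull) < 0.80 (backfillfull) < 0.85 (full)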
3.4 Resetting full_ratio
[root@ceph2 ceph]# ceph osd set-full-ratio 0.95
osd set-full-ratio 0.95
[root@ceph2 ceph]# ceph osd set-nearfull-ratio 0.9
osd set-nearfull-ratio 0.9
[root@ceph2 ceph]# ceph osd dump
epoch 325
fsid 35a91e48-8244-4e96-a7ee-980ab989d20d
created 2019-03-16 12:39:22.552356
modified 2019-03-28 10:54:42.035882
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 46
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.9
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'testpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 190 flags hashpspool stripe_width 0 application rbd
        snap 1 'testpool-snap-20190316' 2019-03-16 22:27:34.150433
        snap 2 'testpool-snap-2' 2019-03-16 22:31:15.430823
pool 5 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 191 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~13]
pool 6 'rbdmirror' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 192 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~7]
pool 7 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 176 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 178 flags hashpspool stripe_width 0 application rgw
pool 9 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 180 flags hashpspool stripe_width 0 application rgw
pool 10 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 182 flags hashpspool stripe_width 0 application rgw
pool 11 'xiantao.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 194 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 12 'xiantao.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 196 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 13 'xiantao.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 198 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 14 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 214 flags hashpspool stripe_width 0 application cephfs
pool 15 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 214 flags hashpspool stripe_width 0 application cephfs
pool 16 'test' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 last_change 280 flags hashpspool stripe_width 0 application rbd
max_osd 9
osd.0 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.11:6808/1141125 172.25.250.11:6809/1141125 172.25.250.11:6810/1141125 172.25.250.11:6811/1141125 exists,up 745dce53-1c63-4c50-b434-d441038dafe4
osd.1 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.13:6805/592704 172.25.250.13:6806/592704 172.25.250.13:6807/592704 172.25.250.13:6808/592704 exists,up a7562276-6dfd-4803-b248-a7cbdb64ebec
osd.2 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.12:6800/94300 172.25.250.12:6801/94300 172.25.250.12:6802/94300 172.25.250.12:6803/94300 exists,up bbef1a00-3a31-48a0-a065-3a16b9edc3b1
osd.3 up in weight 1 up_from 315 up_thru 315 down_at 314 last_clean_interval [308,312) 172.25.250.11:6800/1140952 172.25.250.11:6801/1140952 172.25.250.11:6802/1140952 172.25.250.11:6803/1140952 exists,up e934a4fb-7125-4e85-895c-f66cc5534ceb
osd.4 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.13:6809/592702 172.25.250.13:6810/592702 172.25.250.13:6811/592702 172.25.250.13:6812/592702 exists,up e2c33bb3-02d2-4cce-85e8-25c419351673
osd.5 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.12:6804/94301 172.25.250.12:6805/94301 172.25.250.12:6806/94301 172.25.250.12:6807/94301 exists,up d299e33c-0c24-4cd9-a37a-a6fcd420a529
osd.6 up in weight 1 up_from 315 up_thru 315 down_at 314 last_clean_interval [308,312) 172.25.250.11:6804/1140955 172.25.250.11:6805/1140955 172.25.250.11:6806/1140955 172.25.250.11:6807/1140955 exists,up debe7f4e-656b-48e2-a0b2-bdd8613afcc4
osd.7 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [309,312) 172.25.250.13:6801/592699 172.25.250.13:6802/592699 172.25.250.13:6803/592699 172.25.250.13:6804/592699 exists,up 8c403679-7530-48d0-812b-72050ad43aae
osd.8 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.12:6808/94302 172.25.250.12:6810/94302 172.25.250.12:6811/94302 172.25.250.12:6812/94302 exists,up bb73edf8-ca97-40c3-a727-d5fde1a9d1d9
3.5 Trying again
[root@ceph2 ceph]# ceph osd unset full
full is unset
[root@ceph2 ceph]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_OK          # success

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1773 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  0 B/s wr, 0 op/s rd, 0 op/s wr
4. Manually controlling the primary OSD of a PG
You can manually adjust an OSD's primary affinity to change the probability that it is chosen as the primary OSD of a PG, for example to avoid using slow disks as primaries.
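Primary affinity is a per-OSD value between 0 and 1 (1 by default). A quick way to review the current values (a hedged sketch; not run in this session):

ceph osd tree    # the PRI-AFF column shows each OSD's primary affinity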
4.1 Viewing the PGs whose primary is osd.4
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"
dumped all          # PGs whose primary is osd.4
1.7e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.280490 0'0 311:517 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 13:28:30.900982 0'0 2019-03-24 06:16:20.594466
1.7b 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256673 0'0 311:523 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-28 02:46:27.659275 0'0 2019-03-23 09:10:34.438462
15.77 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282033 0'0 311:162 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-28 04:25:28.324399 0'0 2019-03-26 17:10:19.390530
1.77 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:34:03.733420 0'0 312:528 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-28 08:34:03.733386 0'0 2019-03-27 08:26:21.579623
15.7a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257051 0'0 311:158 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-28 03:27:22.186467 0'0 2019-03-26 17:10:19.390530
15.7c 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.273391 0'0 311:144 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 17:59:38.124535 0'0 2019-03-26 17:10:19.390530
1.72 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.276870 0'0 311:528 [4,8,0] 4 [4,8,0] 4 0'0 2019-03-28 06:36:06.125767 0'0 2019-03-24 13:59:12.569691
15.7f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258669 0'0 311:149 [4,8,0] 4 [4,8,0] 4 0'0 2019-03-27 21:48:22.082918 0'0 2019-03-27 21:48:22.082918
15.69 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258736 0'0 311:150 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-28 00:07:06.805003 0'0 2019-03-28 00:07:06.805003
1.67 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.275098 0'0 311:517 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 21:08:41.166673 0'0 2019-03-24 06:16:29.598240
14.22 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257257 0'0 311:149 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 20:32:16.816439 0'0 2019-03-26 17:09:56.246887
14.29 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252788 0'0 311:151 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 21:55:42.189434 0'0 2019-03-26 17:09:56.246887
5.21 2 0 0 0 0 4210688 139 139 active+clean 2019-03-28 08:02:25.257694 189'139 311:730 [4,6,2] 4 [4,6,2] 4 189'139 2019-03-27 19:02:33.483252 189'139 2019-03-25 08:42:13.970938
14.2a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256911 0'0 311:150 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-27 18:09:45.512728 0'0 2019-03-26 17:09:56.246887
14.2b 0 0 0 0 0 0 1 1 active+clean 2019-03-28 08:02:25.258316 214'1 311:162 [4,6,2] 4 [4,6,2] 4 214'1 2019-03-27 23:48:05.092971 0'0 2019-03-26 17:09:56.246887
14.2d 1 0 0 0 0 46 1 1 active+clean 2019-03-28 08:02:25.282383 214'1 311:171 [4,3,2] 4 [4,3,2] 4 214'1 2019-03-28 03:14:08.690676 0'0 2019-03-26 17:09:56.246887
15.2c 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258195 0'0 311:157 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 02:03:17.819746 0'0 2019-03-28 02:03:17.819746
6.1a 1 0 0 0 0 19 2 2 active+clean 2019-03-28 08:02:25.281807 161'2 311:267 [4,2,6] 4 [4,2,6] 4 161'2 2019-03-27 22:42:45.639905 161'2 2019-03-26 12:51:51.614941
5.18 4 0 0 0 0 49168 98 98 active+clean 2019-03-28 08:02:25.258482 172'98 311:621 [4,8,3] 4 [4,8,3] 4 172'98 2019-03-27 21:27:03.723920 172'98 2019-03-27 21:27:03.723920
15.14 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252656 0'0 311:148 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-27 19:56:18.466744 0'0 2019-03-26 17:10:19.390530
15.17 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256549 0'0 311:164 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 23:58:46.490357 0'0 2019-03-26 17:10:19.390530
1.18 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.277674 0'0 311:507 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-28 01:14:47.944309 0'0 2019-03-26 18:31:14.774358
5.1c 2 0 0 0 0 16 250 250 active+clean 2019-03-28 08:02:25.257857 183'250 311:19066 [4,2,6] 4 [4,2,6] 4 183'250 2019-03-28 05:42:09.856046 183'250 2019-03-25 23:36:49.652800
15.19 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257506 0'0 311:164 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-28 00:39:31.020637 0'0 2019-03-26 17:10:19.390530
16.7 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282212 0'0 311:40 [4,3,2] 4 [4,3,2] 4 0'0 2019-03-28 01:11:12.974900 0'0 2019-03-26 21:40:00.073686
6.e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258109 0'0 311:251 [4,6,2] 4 [4,6,2] 4 0'0 2019-03-27 06:36:11.963158 0'0 2019-03-27 06:36:11.963158
13.5 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257437 0'0 311:168 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-27 19:52:21.320611 0'0 2019-03-26 13:31:34.012304
16.19 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257560 0'0 311:42 [4,2,6] 4 [4,2,6] 4 0'0 2019-03-28 04:21:53.015903 0'0 2019-03-26 21:40:00.073686
7.1 3 0 0 0 0 1813 14 14 active+clean 2019-03-28 08:02:25.257994 192'14 311:303 [4,2,3] 4 [4,2,3] 4 192'14 2019-03-27 12:08:04.858102 192'14 2019-03-27 12:08:04.858102
14.9 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252723 0'0 311:163 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 04:45:30.060857 0'0 2019-03-28 04:45:30.060857
5.1 3 0 0 0 0 8404992 119 119 active+clean 2019-03-28 08:02:25.258586 189'119 311:635 [4,3,8] 4 [4,3,8] 4 189'119 2019-03-28 01:01:39.725401 189'119 2019-03-25 09:40:24.623173
13.6 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257198 0'0 311:157 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 15:49:19.196870 0'0 2019-03-26 13:31:34.012304
5.f 5 0 0 0 0 86016 128 128 active+clean 2019-03-28 08:02:25.258053 183'128 311:1179 [4,2,3] 4 [4,2,3] 4 183'128 2019-03-27 12:15:30.134353 183'128 2019-03-22 12:21:02.832942
16.1d 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257306 0'0 311:42 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-28 01:15:37.043172 0'0 2019-03-26 21:40:00.073686
12.0 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258535 0'0 311:140 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-27 15:42:11.927266 0'0 2019-03-26 13:31:31.916623
16.1f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258248 0'0 311:41 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-28 08:01:48.349363 0'0 2019-03-28 08:01:48.349363
9.6 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257612 0'0 311:211 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-27 23:02:31.386965 0'0 2019-03-27 23:02:31.386965
1.f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.279868 0'0 311:503 [4,3,8] 4 [4,3,8] 4 0'0 2019-03-28 07:41:02.022670 0'0 2019-03-24 07:50:30.260358
1.10 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257936 0'0 311:538 [4,2,0] 4 [4,2,0] 4 0'0 2019-03-28 01:43:31.429879 0'0 2019-03-23 06:36:38.178339
1.12 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256725 0'0 311:527 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 04:49:49.213043 0'0 2019-03-25 17:35:25.833155
16.2 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.278599 0'0 311:31 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-28 07:32:10.065419 0'0 2019-03-26 21:40:00.073686
15.1d 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252838 0'0 311:155 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-28 00:50:04.416619 0'0 2019-03-26 17:10:19.390530
5.2a 0 0 0 0 0 0 107 107 active+clean 2019-03-28 08:02:25.281096 172'107 311:621 [4,6,8] 4 [4,6,8] 4 172'107 2019-03-27 23:39:40.781443 172'107 2019-03-25 17:35:38.835798
5.2b 7 0 0 0 0 16826368 225 225 active+clean 2019-03-28 08:02:25.257363 189'225 311:2419 [4,0,5] 4 [4,0,5] 4 189'225 2019-03-27 10:24:42.972494 189'225 2019-03-25 04:13:33.567532
1.31 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256401 0'0 311:514 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 20:39:23.076113 0'0 2019-03-25 10:06:22.224727
5.31 1 0 0 0 0 4194304 113 113 active+clean 2019-03-28 08:02:25.282326 189'113 311:661 [4,2,3] 4 [4,2,3] 4 189'113 2019-03-27 23:35:50.633871 189'113 2019-03-25 10:27:03.837772
14.37 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282270 0'0 311:153 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 20:36:25.969312 0'0 2019-03-26 17:09:56.246887
15.34 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258369 0'0 311:132 [4,8,3] 4 [4,8,3] 4 0'0 2019-03-27 23:30:49.442053 0'0 2019-03-26 17:10:19.390530
1.43 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.279242 0'0 311:501 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-27 21:59:51.254952 0'0 2019-03-26 13:16:37.312462
1.48 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.281910 0'0 311:534 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-27 23:47:00.053793 0'0 2019-03-24 04:51:10.218424
15.45 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258421 0'0 311:155 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-28 01:39:15.366349 0'0 2019-03-26 17:10:19.390530
1.4e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252906 0'0 311:519 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-27 20:50:17.495390 0'0 2019-03-21 01:02:41.709506
1.51 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.281974 0'0 311:530 [4,6,2] 4 [4,6,2] 4 0'0 2019-03-28 07:23:04.730515 0'0 2019-03-26 00:23:54.419333
15.5a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257140 0'0 311:158 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-28 00:12:17.000955 0'0 2019-03-26 17:10:19.390530
1.56 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256961 0'0 311:521 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-27 16:24:10.512235 0'0 2019-03-27 16:24:10.512235
15.50 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252599 0'0 311:154 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-28 00:25:01.475477 0'0 2019-03-26 17:10:19.390530
Count them:
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
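For a single PG, the mapping (and thus the primary, which is the first OSD in the acting set) can also be checked directly; a hedged example using one of the PG ids listed above:

ceph pg map 1.7e    # prints the up and acting sets, e.g. "up [4,0,8] acting [4,0,8]"; osd.4 is the primary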
4.2 Setting the primary affinity to 0
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0
Error EPERM: you must enable 'mon osd allow primary affinity = true' on the mons before you can adjust primary-affinity. note that older clients will no longer be able to communicate with the cluster.
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
"mon_osd_allow_primary_affinity": "false", "mon_osd_allow_primary_temp": "false",
4.3 Modifying the configuration file
[root@ceph1 ~]# vim /etc/ceph/ceph.conf
[global]
fsid = 35a91e48-8244-4e96-a7ee-980ab989d20d
mon initial members = ceph2,ceph3,ceph4
mon host = 172.25.250.11,172.25.250.12,172.25.250.13
public network = 172.25.250.0/24
cluster network = 172.25.250.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 5120

[mon]
mon_allow_pool_delete = true
mon_osd_allow_primary_affinity = true
[root@ceph1 ~]# ansible all -m copy -a 'src=/etc/ceph/ceph.conf dest=/etc/ceph/ceph.conf owner=ceph group=ceph mode=0644'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-mon.target'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-osd.target'
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
The change did not take effect.
4.5 Changing it on the command line
[root@ceph2 ceph]# ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity=true'
mon.ceph2: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
mon.ceph3: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
mon.ceph4: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
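A more targeted check than grepping config show is to query the single option through the admin socket (a hedged sketch using the same socket path as above):

ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config get mon_osd_allow_primary_affinity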
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
4.6 Adjusting the primary affinity
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0
set osd.4 primary-affinity to 0 (802)
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 0
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0.5
set osd.4 primary-affinity to 0.5 (8327682)
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
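To see how the primaries are distributed across all OSDs rather than grepping for one of them, a small one-liner over the brief PG dump can be used (a hedged sketch; the column layout of pgs_brief may differ slightly between versions, so check the header first):

ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /active/ {print $NF}' | sort -n | uniq -c   # number of PGs per acting-primary OSD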
Author's note: the content of this article comes mainly from teacher Yan Wei of Yutian Education; I have verified all the operations myself. Readers who wish to repost it should contact Yutian Education (http://www.yutianedu.com/) and obtain permission from them or from teacher Yan (https://www.cnblogs.com/breezey/) before reposting. Thank you!