1. Verifying OSDs
1.1 OSD status
The running states of an OSD are: up, in, out, down.
A healthy OSD is both up and in.
When an OSD fails and its daemon goes offline, the OSD is marked down, but for 5 minutes the cluster still keeps it in. This delay protects against transient network flaps triggering unnecessary data movement.
If the OSD has not recovered within those 5 minutes, it is also marked out, and the PGs on it begin to migrate. This interval is controlled by the mon_osd_down_out_interval option.
When the failed OSD comes back online, a new round of data rebalancing is triggered.
If the cluster has the noout flag set, an OSD going down will not trigger rebalancing (see the example below).
OSDs check each other's status every 6 seconds, and report their status to the monitors every 120 seconds.
Capacity states: nearfull, full
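As a quick illustration (a minimal sketch, not part of the session recorded below; the flag and commands are standard Ceph CLI), the noout flag can be toggled and checked like this:

ceph osd set noout               # down OSDs will no longer be marked out, so no rebalancing starts
ceph osd dump | grep flags       # confirm that "noout" now appears in the osdmap flags
ceph osd unset noout             # restore the normal down -> out behaviour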
1.2 Common commands
[root@ceph2 ~]# ceph osd stat
9 osds: 9 up, 9 in; 32 remapped pgs          # show OSD status
[root@ceph2 ~]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE VAR  PGS     # per-OSD usage report
 0   hdd 0.01500  1.00000 15348M  112M 15236M 0.74 0.58  48
 2   hdd 0.01500  1.00000 15348M  112M 15236M 0.73 0.57  40
 1   hdd 0.01500  1.00000 15348M  114M 15234M 0.75 0.58  39
 3   hdd 0.01500  1.00000 15348M  269M 15079M 1.76 1.37 256
 6   hdd 0.01500  1.00000 15348M  208M 15140M 1.36 1.07 248
 5   hdd 0.01500  1.00000 15348M  228M 15120M 1.49 1.17 238
 8   hdd 0.01500  1.00000 15348M  245M 15103M 1.60 1.25 266
 4   hdd 0.01500  1.00000 15348M  253M 15095M 1.65 1.29 245
 7   hdd 0.01500  1.00000 15348M  218M 15130M 1.42 1.11 228
                    TOTAL   134G 1765M   133G 1.28
MIN/MAX VAR: 0.57/1.37  STDDEV: 0.36
[root@ceph2 ~]# ceph osd find osd.0          # locate a specific OSD
{
    "osd": 0,
    "ip": "172.25.250.11:6800/185671",
    "crush_location": {
        "host": "ceph2-ssd",
        "root": "ssd-root"
    }
}
1.3 OSD heartbeat parameters
osd_heartbeat_interval      # interval at which OSDs exchange heartbeats with each other
osd_heartbeat_grace         # how long an OSD may go without a heartbeat before the cluster considers it down
mon_osd_min_down_reporters  # minimum number of distinct OSDs that must report an OSD as down before it is marked down
mon_osd_min_down_reports    # number of times an OSD must be reported down before it is marked down
mon_osd_down_out_interval   # how long an OSD may stay unresponsive before it is marked down and out
mon_osd_report_timeout      # how long the monitors wait before declaring a failed OSD down
osd_mon_report_interval_min # how long a newly started OSD waits before it begins reporting to the monitors
osd_mon_report_interval_max # maximum interval between reports that the monitors accept from an OSD; beyond this the OSD is considered down
osd_mon_heartbeat_interval  # interval at which an OSD pings the monitors
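A minimal sketch of how these values can be inspected and temporarily overridden at runtime, reusing the admin-socket and injectargs patterns shown elsewhere in this article (the value 25 is only illustrative):

ceph daemon osd.0 config show | grep heartbeat          # current heartbeat settings of osd.0
ceph tell osd.* injectargs '--osd_heartbeat_grace 25'   # raise the grace period on all OSDs until restart

Values injected this way are not persistent; add them to ceph.conf to keep them across restarts.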
2. Managing OSD capacity
When cluster utilization reaches mon_osd_nearfull_ratio, the cluster enters the HEALTH_WARN state. This is a reminder to add OSDs before full_ratio is reached. The default is 0.85, i.e. 85%.
When cluster utilization reaches mon_osd_full_ratio, the cluster stops accepting writes but still allows reads, and enters the HEALTH_ERR state. The default is 0.95, i.e. 95%; the remaining headroom is kept so that data can still be rebalanced when one or more OSDs fail.
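To see how close the cluster is to these thresholds, the usual checks are (a hedged sketch; only some of these appear in the session below):

ceph df              # overall and per-pool utilization
ceph osd df          # per-OSD utilization; compare the %USE column against the ratios
ceph health detail   # lists any OSDs that are currently nearfull or full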
2.1 Setting the ratios
[root@ceph2 ~]# ceph osd set-nearfull-ratio 0.75
osd set-nearfull-ratio 0.75
[root@ceph2 ~]# ceph osd set-full-ratio 0.85
osd set-full-ratio 0.85
[root@ceph2 ~]# ceph osd dump
crush_version 43
full_ratio 0.85
backfillfull_ratio 0.9
nearfull_ratio 0.75
[root@ceph2 ~]# ceph daemon osd.0 config show|grep full_ratio
"mon_osd_backfillfull_ratio": "0.900000", "mon_osd_full_ratio": "0.950000", "mon_osd_nearfull_ratio": "0.850000", "osd_failsafe_full_ratio": "0.970000", "osd_pool_default_cache_target_full_ratio": "0.800000",
[root@ceph2 ~]# ceph tell osd.* injectargs --mon_osd_full_ratio 0.85
[root@ceph2 ~]# ceph daemon osd.0 config show|grep full_ratio
"mon_osd_backfillfull_ratio": "0.900000", "mon_osd_full_ratio": "0.850000", "mon_osd_nearfull_ratio": "0.850000", "osd_failsafe_full_ratio": "0.970000", "osd_pool_default_cache_target_full_ratio": "0.800000",
3. Problems with the cluster full state
3.1 Setting the cluster full flag
[root@ceph2 ~]# ceph osd set full
full is set
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_WARN
            full flag(s) set

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in; 32 remapped pgs
                flags full
    rbd-mirror: 1 daemon active

  data:
    pools:   14 pools, 536 pgs
    objects: 220 objects, 240 MB
    usage:   1768 MB used, 133 GB / 134 GB avail
    pgs:     508 active+clean
             28  active+clean+remapped          # the remapped PGs indicate a problem

  io:
    client:  2558 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
3.2 Unsetting the full flag
[root@ceph2 ~]# ceph osd unset full
full is unset
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order
            Reduced data availability: 32 pgs inactive, 32 pgs peering, 32 pgs stale
            Degraded data redundancy: 32 pgs unclean          # the PGs also have a problem

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   14 pools, 536 pgs
    objects: 221 objects, 240 MB
    usage:   1780 MB used, 133 GB / 134 GB avail
    pgs:     5.970% pgs not active
             504 active+clean
             32  stale+peering

  io:
    client:  4911 B/s rd, 0 B/s wr, 5 op/s rd, 0 op/s wr
On inspection, the problem turns out to be limited to one storage pool, ssdpool.
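One way to narrow a problem down to a pool like this (a hedged sketch of the reasoning, not commands taken from the session): the number before the dot in a PG id is the pool id, so the stale PGs can be matched back to a pool name:

ceph health detail          # lists the stale/peering PGs and their ids
ceph pg dump_stuck stale    # the same information, restricted to stuck PGs
ceph osd pool ls detail     # map the pool id prefix of those PG ids back to a pool name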
3.3 Deleting ssdpool
[root@ceph2 ~]# ceph osd pool delete ssdpool ssdpool --yes-i-really-really-mean-it
[root@ceph2 ~]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1772 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  341 B/s rd, 0 op/s rd, 0 op/s wr
[root@ceph2 ~]# ceph osd unset full
[root@ceph2 ceph]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_ERR
            full ratio(s) out of order          # still no effect

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1773 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  2046 B/s rd, 0 B/s wr, 2 op/s rd, 0 op/s wr
[root@ceph2 ceph]# ceph health detail
HEALTH_ERR full ratio(s) out of order
OSD_OUT_OF_ORDER_FULL full ratio(s) out of order
    full_ratio (0.85) < backfillfull_ratio (0.9), increased
# The full_ratio configured earlier ended up lower than backfillfull_ratio, which is what this error is about.
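The required ordering is nearfull_ratio < backfillfull_ratio < full_ratio. Besides raising full_ratio again (as done in 3.4 below), the ordering could also have been restored by lowering backfillfull_ratio; the corresponding command exists alongside set-full-ratio and set-nearfull-ratio (illustrative value, not run in this session):

ceph osd set-backfillfull-ratio 0.80   # would satisfy 0.75 (nearfull) < 0.80 (backfillfull) < 0.85 (full)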
3.4 Resetting full_ratio
[root@ceph2 ceph]# ceph osd set-full-ratio 0.95
osd set-full-ratio 0.95
[root@ceph2 ceph]# ceph osd set-nearfull-ratio 0.9
osd set-nearfull-ratio 0.9
[root@ceph2 ceph]# ceph osd dump
epoch 325
fsid 35a91e48-8244-4e96-a7ee-980ab989d20d
created 2019-03-16 12:39:22.552356
modified 2019-03-28 10:54:42.035882
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 46
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.9
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 1 'testpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 190 flags hashpspool stripe_width 0 application rbd
        snap 1 'testpool-snap-20190316' 2019-03-16 22:27:34.150433
        snap 2 'testpool-snap-2' 2019-03-16 22:31:15.430823
pool 5 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 191 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~13]
pool 6 'rbdmirror' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 192 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~7]
pool 7 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 176 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 178 flags hashpspool stripe_width 0 application rgw
pool 9 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 180 flags hashpspool stripe_width 0 application rgw
pool 10 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 182 flags hashpspool stripe_width 0 application rgw
pool 11 'xiantao.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 194 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 12 'xiantao.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 196 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 13 'xiantao.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 198 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 14 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 214 flags hashpspool stripe_width 0 application cephfs
pool 15 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 214 flags hashpspool stripe_width 0 application cephfs
pool 16 'test' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 last_change 280 flags hashpspool stripe_width 0 application rbd
max_osd 9
osd.0 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.11:6808/1141125 172.25.250.11:6809/1141125 172.25.250.11:6810/1141125 172.25.250.11:6811/1141125 exists,up 745dce53-1c63-4c50-b434-d441038dafe4
osd.1 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.13:6805/592704 172.25.250.13:6806/592704 172.25.250.13:6807/592704 172.25.250.13:6808/592704 exists,up a7562276-6dfd-4803-b248-a7cbdb64ebec
osd.2 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.12:6800/94300 172.25.250.12:6801/94300 172.25.250.12:6802/94300 172.25.250.12:6803/94300 exists,up bbef1a00-3a31-48a0-a065-3a16b9edc3b1
osd.3 up in weight 1 up_from 315 up_thru 315 down_at 314 last_clean_interval [308,312) 172.25.250.11:6800/1140952 172.25.250.11:6801/1140952 172.25.250.11:6802/1140952 172.25.250.11:6803/1140952 exists,up e934a4fb-7125-4e85-895c-f66cc5534ceb
osd.4 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.13:6809/592702 172.25.250.13:6810/592702 172.25.250.13:6811/592702 172.25.250.13:6812/592702 exists,up e2c33bb3-02d2-4cce-85e8-25c419351673
osd.5 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [308,312) 172.25.250.12:6804/94301 172.25.250.12:6805/94301 172.25.250.12:6806/94301 172.25.250.12:6807/94301 exists,up d299e33c-0c24-4cd9-a37a-a6fcd420a529
osd.6 up in weight 1 up_from 315 up_thru 315 down_at 314 last_clean_interval [308,312) 172.25.250.11:6804/1140955 172.25.250.11:6805/1140955 172.25.250.11:6806/1140955 172.25.250.11:6807/1140955 exists,up debe7f4e-656b-48e2-a0b2-bdd8613afcc4
osd.7 up in weight 1 up_from 314 up_thru 315 down_at 313 last_clean_interval [309,312) 172.25.250.13:6801/592699 172.25.250.13:6802/592699 172.25.250.13:6803/592699 172.25.250.13:6804/592699 exists,up 8c403679-7530-48d0-812b-72050ad43aae
osd.8 up in weight 1 up_from 315 up_thru 315 down_at 313 last_clean_interval [310,312) 172.25.250.12:6808/94302 172.25.250.12:6810/94302 172.25.250.12:6811/94302 172.25.250.12:6812/94302 exists,up bb73edf8-ca97-40c3-a727-d5fde1a9d1d9
3.5 Trying again
[root@ceph2 ceph]# ceph osd unset full
full is unset
[root@ceph2 ceph]# ceph -s
  cluster:
    id:     35a91e48-8244-4e96-a7ee-980ab989d20d
    health: HEALTH_OK          # success

  services:
    mon:        3 daemons, quorum ceph2,ceph3,ceph4
    mgr:        ceph4(active), standbys: ceph2, ceph3
    mds:        cephfs-1/1/1 up {0=ceph2=up:active}, 1 up:standby
    osd:        9 osds: 9 up, 9 in
    rbd-mirror: 1 daemon active

  data:
    pools:   13 pools, 504 pgs
    objects: 221 objects, 241 MB
    usage:   1773 MB used, 133 GB / 134 GB avail
    pgs:     504 active+clean

  io:
    client:  0 B/s wr, 0 op/s rd, 0 op/s wr
4. Manually controlling the primary OSD of a PG
You can manually adjust an OSD's primary affinity to change the probability that it is chosen as the primary OSD of a PG, for example to avoid using slow disks as primaries.
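Primary affinity is a per-OSD value between 0 and 1 (1 by default). A quick way to review the current values (a hedged sketch; not run in this session):

ceph osd tree    # the PRI-AFF column shows each OSD's primary affinity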
4.1 Viewing the PGs whose primary is osd.4
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"
dumped all          # PGs whose primary is osd.4
1.7e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.280490 0'0 311:517 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 13:28:30.900982 0'0 2019-03-24 06:16:20.594466
1.7b 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256673 0'0 311:523 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-28 02:46:27.659275 0'0 2019-03-23 09:10:34.438462
15.77 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282033 0'0 311:162 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-28 04:25:28.324399 0'0 2019-03-26 17:10:19.390530
1.77 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:34:03.733420 0'0 312:528 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-28 08:34:03.733386 0'0 2019-03-27 08:26:21.579623
15.7a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257051 0'0 311:158 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-28 03:27:22.186467 0'0 2019-03-26 17:10:19.390530
15.7c 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.273391 0'0 311:144 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 17:59:38.124535 0'0 2019-03-26 17:10:19.390530
1.72 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.276870 0'0 311:528 [4,8,0] 4 [4,8,0] 4 0'0 2019-03-28 06:36:06.125767 0'0 2019-03-24 13:59:12.569691
15.7f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258669 0'0 311:149 [4,8,0] 4 [4,8,0] 4 0'0 2019-03-27 21:48:22.082918 0'0 2019-03-27 21:48:22.082918
15.69 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258736 0'0 311:150 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-28 00:07:06.805003 0'0 2019-03-28 00:07:06.805003
1.67 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.275098 0'0 311:517 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-27 21:08:41.166673 0'0 2019-03-24 06:16:29.598240
14.22 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257257 0'0 311:149 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 20:32:16.816439 0'0 2019-03-26 17:09:56.246887
14.29 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252788 0'0 311:151 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 21:55:42.189434 0'0 2019-03-26 17:09:56.246887
5.21 2 0 0 0 0 4210688 139 139 active+clean 2019-03-28 08:02:25.257694 189'139 311:730 [4,6,2] 4 [4,6,2] 4 189'139 2019-03-27 19:02:33.483252 189'139 2019-03-25 08:42:13.970938
14.2a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256911 0'0 311:150 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-27 18:09:45.512728 0'0 2019-03-26 17:09:56.246887
14.2b 0 0 0 0 0 0 1 1 active+clean 2019-03-28 08:02:25.258316 214'1 311:162 [4,6,2] 4 [4,6,2] 4 214'1 2019-03-27 23:48:05.092971 0'0 2019-03-26 17:09:56.246887
14.2d 1 0 0 0 0 46 1 1 active+clean 2019-03-28 08:02:25.282383 214'1 311:171 [4,3,2] 4 [4,3,2] 4 214'1 2019-03-28 03:14:08.690676 0'0 2019-03-26 17:09:56.246887
15.2c 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258195 0'0 311:157 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 02:03:17.819746 0'0 2019-03-28 02:03:17.819746
6.1a 1 0 0 0 0 19 2 2 active+clean 2019-03-28 08:02:25.281807 161'2 311:267 [4,2,6] 4 [4,2,6] 4 161'2 2019-03-27 22:42:45.639905 161'2 2019-03-26 12:51:51.614941
5.18 4 0 0 0 0 49168 98 98 active+clean 2019-03-28 08:02:25.258482 172'98 311:621 [4,8,3] 4 [4,8,3] 4 172'98 2019-03-27 21:27:03.723920 172'98 2019-03-27 21:27:03.723920
15.14 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252656 0'0 311:148 [4,6,5] 4 [4,6,5] 4 0'0 2019-03-27 19:56:18.466744 0'0 2019-03-26 17:10:19.390530
15.17 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256549 0'0 311:164 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 23:58:46.490357 0'0 2019-03-26 17:10:19.390530
1.18 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.277674 0'0 311:507 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-28 01:14:47.944309 0'0 2019-03-26 18:31:14.774358
5.1c 2 0 0 0 0 16 250 250 active+clean 2019-03-28 08:02:25.257857 183'250 311:19066 [4,2,6] 4 [4,2,6] 4 183'250 2019-03-28 05:42:09.856046 183'250 2019-03-25 23:36:49.652800
15.19 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257506 0'0 311:164 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-28 00:39:31.020637 0'0 2019-03-26 17:10:19.390530
16.7 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282212 0'0 311:40 [4,3,2] 4 [4,3,2] 4 0'0 2019-03-28 01:11:12.974900 0'0 2019-03-26 21:40:00.073686
6.e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258109 0'0 311:251 [4,6,2] 4 [4,6,2] 4 0'0 2019-03-27 06:36:11.963158 0'0 2019-03-27 06:36:11.963158
13.5 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257437 0'0 311:168 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-27 19:52:21.320611 0'0 2019-03-26 13:31:34.012304
16.19 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257560 0'0 311:42 [4,2,6] 4 [4,2,6] 4 0'0 2019-03-28 04:21:53.015903 0'0 2019-03-26 21:40:00.073686
7.1 3 0 0 0 0 1813 14 14 active+clean 2019-03-28 08:02:25.257994 192'14 311:303 [4,2,3] 4 [4,2,3] 4 192'14 2019-03-27 12:08:04.858102 192'14 2019-03-27 12:08:04.858102
14.9 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252723 0'0 311:163 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 04:45:30.060857 0'0 2019-03-28 04:45:30.060857
5.1 3 0 0 0 0 8404992 119 119 active+clean 2019-03-28 08:02:25.258586 189'119 311:635 [4,3,8] 4 [4,3,8] 4 189'119 2019-03-28 01:01:39.725401 189'119 2019-03-25 09:40:24.623173
13.6 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257198 0'0 311:157 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 15:49:19.196870 0'0 2019-03-26 13:31:34.012304
5.f 5 0 0 0 0 86016 128 128 active+clean 2019-03-28 08:02:25.258053 183'128 311:1179 [4,2,3] 4 [4,2,3] 4 183'128 2019-03-27 12:15:30.134353 183'128 2019-03-22 12:21:02.832942
16.1d 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257306 0'0 311:42 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-28 01:15:37.043172 0'0 2019-03-26 21:40:00.073686
12.0 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258535 0'0 311:140 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-27 15:42:11.927266 0'0 2019-03-26 13:31:31.916623
16.1f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258248 0'0 311:41 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-28 08:01:48.349363 0'0 2019-03-28 08:01:48.349363
9.6 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257612 0'0 311:211 [4,2,3] 4 [4,2,3] 4 0'0 2019-03-27 23:02:31.386965 0'0 2019-03-27 23:02:31.386965
1.f 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.279868 0'0 311:503 [4,3,8] 4 [4,3,8] 4 0'0 2019-03-28 07:41:02.022670 0'0 2019-03-24 07:50:30.260358
1.10 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257936 0'0 311:538 [4,2,0] 4 [4,2,0] 4 0'0 2019-03-28 01:43:31.429879 0'0 2019-03-23 06:36:38.178339
1.12 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256725 0'0 311:527 [4,3,5] 4 [4,3,5] 4 0'0 2019-03-28 04:49:49.213043 0'0 2019-03-25 17:35:25.833155
16.2 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.278599 0'0 311:31 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-28 07:32:10.065419 0'0 2019-03-26 21:40:00.073686
15.1d 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252838 0'0 311:155 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-28 00:50:04.416619 0'0 2019-03-26 17:10:19.390530
5.2a 0 0 0 0 0 0 107 107 active+clean 2019-03-28 08:02:25.281096 172'107 311:621 [4,6,8] 4 [4,6,8] 4 172'107 2019-03-27 23:39:40.781443 172'107 2019-03-25 17:35:38.835798
5.2b 7 0 0 0 0 16826368 225 225 active+clean 2019-03-28 08:02:25.257363 189'225 311:2419 [4,0,5] 4 [4,0,5] 4 189'225 2019-03-27 10:24:42.972494 189'225 2019-03-25 04:13:33.567532
1.31 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256401 0'0 311:514 [4,5,6] 4 [4,5,6] 4 0'0 2019-03-27 20:39:23.076113 0'0 2019-03-25 10:06:22.224727
5.31 1 0 0 0 0 4194304 113 113 active+clean 2019-03-28 08:02:25.282326 189'113 311:661 [4,2,3] 4 [4,2,3] 4 189'113 2019-03-27 23:35:50.633871 189'113 2019-03-25 10:27:03.837772
14.37 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.282270 0'0 311:153 [4,5,0] 4 [4,5,0] 4 0'0 2019-03-27 20:36:25.969312 0'0 2019-03-26 17:09:56.246887
15.34 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258369 0'0 311:132 [4,8,3] 4 [4,8,3] 4 0'0 2019-03-27 23:30:49.442053 0'0 2019-03-26 17:10:19.390530
1.43 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.279242 0'0 311:501 [4,6,8] 4 [4,6,8] 4 0'0 2019-03-27 21:59:51.254952 0'0 2019-03-26 13:16:37.312462
1.48 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.281910 0'0 311:534 [4,0,5] 4 [4,0,5] 4 0'0 2019-03-27 23:47:00.053793 0'0 2019-03-24 04:51:10.218424
15.45 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.258421 0'0 311:155 [4,0,8] 4 [4,0,8] 4 0'0 2019-03-28 01:39:15.366349 0'0 2019-03-26 17:10:19.390530
1.4e 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252906 0'0 311:519 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-27 20:50:17.495390 0'0 2019-03-21 01:02:41.709506
1.51 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.281974 0'0 311:530 [4,6,2] 4 [4,6,2] 4 0'0 2019-03-28 07:23:04.730515 0'0 2019-03-26 00:23:54.419333
15.5a 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.257140 0'0 311:158 [4,0,2] 4 [4,0,2] 4 0'0 2019-03-28 00:12:17.000955 0'0 2019-03-26 17:10:19.390530
1.56 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.256961 0'0 311:521 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-27 16:24:10.512235 0'0 2019-03-27 16:24:10.512235
15.50 0 0 0 0 0 0 0 0 active+clean 2019-03-28 08:02:25.252599 0'0 311:154 [4,5,3] 4 [4,5,3] 4 0'0 2019-03-28 00:25:01.475477 0'0 2019-03-26 17:10:19.390530
Count them:
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
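For a single PG, the mapping (and thus the primary, which is the first OSD in the acting set) can also be checked directly; a hedged example using one of the PG ids listed above:

ceph pg map 1.7e    # prints the up and acting sets, e.g. "up [4,0,8] acting [4,0,8]"; osd.4 is the primary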
4.2 Setting the primary affinity to 0
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0
Error EPERM: you must enable 'mon osd allow primary affinity = true' on the mons before you can adjust primary-affinity. note that older clients will no longer be able to communicate with the cluster.
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
"mon_osd_allow_primary_affinity": "false", "mon_osd_allow_primary_temp": "false",
4.3 Modifying the configuration file
[root@ceph1 ~]# vim /etc/ceph/ceph.conf
[global]
fsid = 35a91e48-8244-4e96-a7ee-980ab989d20d
mon initial members = ceph2,ceph3,ceph4
mon host = 172.25.250.11,172.25.250.12,172.25.250.13
public network = 172.25.250.0/24
cluster network = 172.25.250.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = noatime,largeio,inode64,swalloc
osd journal size = 5120

[mon]
mon_allow_pool_delete = true
mon_osd_allow_primary_affinity = true
[root@ceph1 ~]# ansible all -m copy -a 'src=/etc/ceph/ceph.conf dest=/etc/ceph/ceph.conf owner=ceph group=ceph mode=0644'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-mon.target'
[root@ceph1 ~]# ansible mons -m shell -a ' systemctl restart ceph-osd.target'
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
The change did not take effect.
4.5 Changing it on the command line
[root@ceph2 ceph]# ceph tell mon.\* injectargs '--mon_osd_allow_primary_affinity=true'
mon.ceph2: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
mon.ceph3: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
mon.ceph4: injectargs:mon_osd_allow_primary_affinity = 'true' (not observed, change may require restart)
[root@ceph2 ceph]# ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show|grep primary
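A more targeted check than grepping config show is to query the single option through the admin socket (a hedged sketch using the same socket path as above):

ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config get mon_osd_allow_primary_affinity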
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
4.6 Adjusting the primary affinity
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0
set osd.4 primary-affinity to 0 (802)
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 56
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 0
[root@ceph2 ceph]# ceph osd primary-affinity osd.4 0.5
set osd.4 primary-affinity to 0.5 (8327682)
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
[root@ceph2 ceph]# ceph pg dump|grep 'active+clean'|egrep "\[4,"|wc -l
dumped all 26
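To see how the primaries are distributed across all OSDs rather than grepping for one of them, a small one-liner over the brief PG dump can be used (a hedged sketch; the column layout of pgs_brief may differ slightly between versions, so check the header first):

ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /active/ {print $NF}' | sort -n | uniq -c   # number of PGs per acting-primary OSD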
Author's note: the content of this article comes mainly from teacher Yan Wei of Yutian Education; I have verified all the operations myself. Readers who wish to repost it should contact Yutian Education (http://www.yutianedu.com/) and obtain permission from them or from teacher Yan (https://www.cnblogs.com/breezey/) before reposting. Thank you!