
Fixing an Abnormal PG State Caused by OSD Class Configuration

Author: 用户1260683 | Published: 2018-10-25


Problem Description

Ceph version 12.2.8: one PG was stuck in the remapped state even though the overall cluster health was OK. The steps below were taken to clear this remapped state.

[root@demohost cephuser]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum demohost-6,demohost-8,demohost-37
    mgr: demohost-8(active), standbys: demohost-37, demohost-6
    osd: 90 osds: 90 up, 90 in; 1 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 1.30k objects, 1.68GiB
    usage:   103GiB used, 415TiB / 415TiB avail
    pgs:     3711 active+clean
             1    active+clean+remapped

  io:
    client:   16.8KiB/s rd, 16op/s rd, 0op/s wr

Checking the details of the PG stuck in remapped shows that it maps to OSDs 88, 48 and 18, with 88 as the primary OSD:
[root@demohost cephuser]# ceph pg dump |grep remapped
dumped all
6.9c          0                  0        0         0       0        0    0        0 active+clean+remapped 2018-09-20 11:27:59.251616        0'0  9784:16679    [88,48]         88 [88,48,18]             88        0'0 2018-09-18 23:17:18.531269             0'0 2018-09-17 23:00:02.496995             0
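
In the dump above the up set ([88,48]) no longer matches the acting set ([88,48,18]), which is exactly what the remapped flag means. A single PG can also be checked without grepping the whole dump; the sketch below uses the stock pg map command and assumes nothing beyond a working client keyring:

ceph pg map 6.9c    # prints the osdmap epoch together with the up and acting OSD sets for this PG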


Checking the OSD information shows that each of these OSDs has gained an extra device class of ssd:

[root@demohost cephuser]# ceph osd tree
ID   CLASS WEIGHT   TYPE NAME                          STATUS REWEIGHT PRI-AFF
  -1       90.00000 root default
  ......
  -2       18.00000     mediagroup site1-ssd
 -21        6.00000         media site1-rack1-ssd
-202        2.00000             host demohost-ssd
  19        1.00000                 osd.19                 up  1.00000 1.00000
  18   ssd  1.00000                 osd.18                 up  1.00000 1.00000 #SSD class
-203        2.00000             host demohost-36-ssd
  28        1.00000                 osd.28                 up  1.00000 1.00000
  29        1.00000                 osd.29                 up  1.00000 1.00000
-201        2.00000             host demohost-6-ssd
   8        1.00000                 osd.8                  up  1.00000 1.00000
   9        1.00000                 osd.9                  up  1.00000 1.00000
 -22        6.00000         media site1-rack2-ssd
-205        2.00000             host demohost-40-ssd
  49        1.00000                 osd.49                 up  1.00000 1.00000
  48   ssd  1.00000                 osd.48                 up  1.00000 1.00000 #SSD class
-206        2.00000             host demohost-42-ssd
  58        1.00000                 osd.58                 up  1.00000 1.00000
  59        1.00000                 osd.59                 up  1.00000 1.00000
-204        2.00000             host demohost-8-ssd
  38        1.00000                 osd.38                 up  1.00000 1.00000
  39        1.00000                 osd.39                 up  1.00000 1.00000
 -23        6.00000         media site1-rack3-ssd
-207        2.00000             host demohost-37-ssd
  68        1.00000                 osd.68                 up  1.00000 1.00000
  69        1.00000                 osd.69                 up  1.00000 1.00000
-208        2.00000             host demohost-38-ssd
  78        1.00000                 osd.78                 up  1.00000 1.00000
  79        1.00000                 osd.79                 up  1.00000 1.00000
-209        2.00000             host demohost-39-ssd
  89        1.00000                 osd.89                 up  1.00000 1.00000
  88   ssd  1.00000                 osd.88                 up  1.00000 1.00000 #SSD class


Checking the CRUSH device classes shows an extra ssd class:
[root@demohost cephuser]# ceph osd crush class ls
[
    "ssd"
]

Fix Procedure

So I manually removed the device class from the affected OSDs and restarted them, but after the restart the ssd class kept being added back:

[root@demohost cephuser]# ceph osd crush rm-device-class 48
done removing class of osd(s): 48
[root@demohost cephuser]# ceph osd crush rm-device-class 88
done removing class of osd(s): 88
[root@demohost cephuser]# ceph osd crush rm-device-class 18
done removing class of osd(s): 18

Looking through the options introduced in the Luminous (L) release, there is a setting that automatically tags each OSD with a device class on startup, so I disabled it in ceph.conf:

#ceph.conf
osd_class_update_on_start = false
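
To double-check that the new setting is actually picked up by a daemon, it can be queried over the admin socket. This is a sketch and assumes the command is run on the host that carries the OSD in question:

ceph daemon osd.18 config get osd_class_update_on_start    # should report false once the OSD has restarted with the new ceph.conf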

I then tried restarting OSD 18. The ssd class was no longer added back automatically, but the PG's remapped state turned into undersized, which was rather bizarre.
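
The restart command itself is not captured in the output below; on a systemd-managed Luminous deployment (an assumption about this cluster) it would typically be the per-OSD unit, run on the node hosting osd.18:

systemctl restart ceph-osd@18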

[root@demohost cephuser]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_WARN
            Degraded data redundancy: 1 pg undersized

  services:
    mon: 3 daemons, quorum demohost-6,demohost-8,demohost-37
    mgr: demohost-8(active), standbys: demohost-37, demohost-6
    osd: 90 osds: 90 up, 90 in
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 1.30k objects, 1.68GiB
    usage:   103GiB used, 415TiB / 415TiB avail
    pgs:     3711 active+clean
             1    active+undersized

  io:
    client:   17.0KiB/s rd, 17op/s rd, 0op/s wr

[root@demohost cephuser]# ceph health detail
HEALTH_WARN Degraded data redundancy: 1 pg undersized
PG_DEGRADED Degraded data redundancy: 1 pg undersized
    pg 6.9c is stuck undersized for 206.423798, current state active+undersized, last acting [88,48]

Since simply restarting an OSD within a short window does not trigger a recalculation of the CRUSH mapping, I stopped the primary OSD (osd.88) first, to make the cluster actively recompute the mapping.
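
The stop command is likewise not shown in the original capture; assuming the same systemd-managed deployment, it would be run on the node that hosts osd.88 (demohost-39 according to the tree above):

systemctl stop ceph-osd@88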

With OSD 88 stopped, the cluster reports:
[root@demohost-40 supdev]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 10/3897 objects degraded (0.257%), 8 pgs degraded

  services:
    mon: 3 daemons, quorum demohost-6,demohost-8,demohost-37
    mgr: demohost-8(active), standbys: demohost-37, demohost-6
    osd: 90 osds: 89 up, 90 in; 1 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 1.30k objects, 1.68GiB
    usage:   103GiB used, 415TiB / 415TiB avail
    pgs:     10/3897 objects degraded (0.257%)
             3670 active+clean
             33   active+undersized
             8    active+undersized+degraded
             1    active+undersized+remapped

  io:
    client:   24.5KiB/s rd, 24op/s rd, 0op/s wr
    recovery: 0B/s, 2objects/s

Mark OSD 88 out to trigger a recalculation of the CRUSH mapping:
[root@demohost-40 supdev]# ceph osd out 88
marked out osd.88.


[root@demohost-40 supdev]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum demohost-6,demohost-8,demohost-37
    mgr: demohost-8(active), standbys: demohost-37, demohost-6
    osd: 90 osds: 89 up, 89 in; 1 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 1.30k objects, 1.68GiB
    usage:   102GiB used, 414TiB / 414TiB avail
    pgs:     3712 active+clean

  io:
    client:   8.92KiB/s rd, 8op/s rd, 0op/s wr
    recovery: 0B/s, 0keys/s, 0objects/s

Afterwards, start OSD 88 again and bring it back into the cluster, which finally clears the abnormal PG state.
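
Only the ceph osd in step was captured below; the daemon itself also has to be started again first, e.g. with the same systemd unit as above (again an assumption, run on the node hosting osd.88):

systemctl start ceph-osd@88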

[root@demohost-40 supdev]# ceph osd in 88
marked in osd.88.


[root@demohost-40 supdev]# ceph -s
  cluster:
    id:     21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum demohost-6,demohost-8,demohost-37
    mgr: demohost-8(active), standbys: demohost-37, demohost-6
    osd: 90 osds: 90 up, 90 in; 1 remapped pgs # a remapped pg is still reported here, yet another bug
    rgw: 1 daemon active

  data:
    pools:   7 pools, 3712 pgs
    objects: 1.30k objects, 1.68GiB
    usage:   103GiB used, 415TiB / 415TiB avail
    pgs:     3712 active+clean

  io:
    client:   13.0KiB/s rd, 12op/s rd, 0op/s wr

Summary

Looking back at the whole troubleshooting process: since the Luminous release, CRUSH can automatically assign a class label based on the disk type and then generate rules per class. This was meant to simplify CRUSH configuration, but it plants a trap in deployments that use a custom CRUSH layout. I therefore strongly recommend that anyone who needs custom CRUSH rules set osd_class_update_on_start = false in ceph.conf to avoid the kind of trouble described here.

The PG state accounting and reporting in the L release also still has a few bugs. They do not affect normal operation, but they can confuse or even mislead people. As a colleague in this field said long ago: always treat storage with respect and think twice before every operation, or you can lose your job in a matter of minutes.
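
For context, the class-based rules mentioned above are what Luminous builds on top of device classes; a hand-written equivalent looks like the sketch below, where the rule name ssd-rule is purely illustrative:

ceph osd crush rule create-replicated ssd-rule default host ssd    # replicated rule that only selects OSDs whose device class is ssd

Because such rules only see OSDs of the matching class, keeping class assignment under manual control (osd_class_update_on_start = false) helps ensure a hand-built CRUSH tree and its rules stay exactly as defined.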

Originally published 2018-09-20 via the WeChat public account "Ceph对象存储方案".
