
Fixing OSD Auto-Start Under Bluestore

用户1260683 · Published 2019-05-09 (originally posted 2019-03-05)

Problem Description

A mis-operation on the cluster stopped all OSD services and also disabled their auto-start. When trying "systemctl start ceph-osd@10", the log showed the following errors:

2019-03-05 15:14:11.675359 7f8a4450dd80  0 set uid:gid to 167:167 (ceph:ceph)
2019-03-05 15:14:11.675379 7f8a4450dd80  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-osd, pid 757348
2019-03-05 15:14:11.675627 7f8a4450dd80 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-10: (2) No such file or directory
2019-03-05 15:14:31.921023 7f463703ad80  0 set uid:gid to 167:167 (ceph:ceph)
2019-03-05 15:14:31.921042 7f463703ad80  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-osd, pid 757360
2019-03-05 15:14:31.921286 7f463703ad80 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-10: (2) No such file or directory
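
The "unable to open OSD superblock" error simply means the OSD data directory is empty because nothing is mounted on it. A quick preliminary check (not part of the original troubleshooting, just a sketch):

[root@host supdev]# ls /var/lib/ceph/osd/ceph-10
# Typically empty (or the directory may not exist at all) when the tmpfs
# that normally backs a BlueStore OSD data directory is not mounted.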

Troubleshooting

Checking the mount list confirmed that none of the corresponding OSD directories were mounted:

[root@host supdev]# mount -l
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=132002648k,nr_inodes=33000662,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda3 on / type ext4 (rw,relatime,stripe=64,data=ordered)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=36,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=48183)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/sda5 on /export type ext4 (rw,relatime,stripe=64,data=ordered)
/dev/sda4 on /var type xfs (rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota)
tmpfs on /run/user/511 type tmpfs (rw,nosuid,nodev,relatime,size=26402464k,mode=700,uid=511,gid=1001)

Checking the systemd units also showed that the mount units were missing:

[root@host supdev]# systemctl list-units |grep ceph
● ceph-disk@sdb.service                                                                                                               loaded failed failed    Ceph disk activation: /sdb
● ceph-osd@10.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.10
● ceph-osd@11.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.11
● ceph-osd@12.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.12
● ceph-osd@13.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.13
● ceph-osd@14.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.14
● ceph-osd@15.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.15
● ceph-osd@16.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.16
● ceph-osd@17.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.17
● ceph-osd@18.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.18
● ceph-osd@19.service                                                                                                                 loaded failed failed    Ceph object storage daemon osd.19
● ceph-volume@-dev-mapper-ceph--0e4c2c68--1f88--4cdd--9f09--7b430cb999f7-osd--block--ef191e4a--f54c--5556--b83a--dadd8a415048.service loaded failed failed    Ceph Volume activation: -dev-mapper-ceph--0e4c2c68--1f88--4cdd--9f09--7b430cb999f7-osd--block--ef191e4a--f54c--5556--b83a--dadd8a415048
...

On a normal boot there would be mount units such as "var-lib-ceph-osd-ceph\x2d10.mount", as shown below, so the next step is to restore these mounts manually:

[root@host supdev]# systemctl list-units |grep ceph
  var-lib-ceph-osd-ceph\x2d10.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-10
  var-lib-ceph-osd-ceph\x2d11.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-11
  var-lib-ceph-osd-ceph\x2d12.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-12
  var-lib-ceph-osd-ceph\x2d13.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-13
  var-lib-ceph-osd-ceph\x2d14.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-14
  var-lib-ceph-osd-ceph\x2d15.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-15
  var-lib-ceph-osd-ceph\x2d16.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-16
  var-lib-ceph-osd-ceph\x2d17.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-17
  var-lib-ceph-osd-ceph\x2d18.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-18
  var-lib-ceph-osd-ceph\x2d19.mount                                                                                                   loaded active mounted   /var/lib/ceph/osd/ceph-19
...                                                                          loaded active active    ceph target allowing to start/stop all ceph*@.service instances at once
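
For reference, the manual equivalent of restoring one of those mounts for a single OSD (what ceph-volume automates in the next section) looks roughly like this for osd.10, reusing the LV path that appears in the activation output below; treat it as a sketch rather than the recommended procedure:

mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-10
ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-2b712814-5cce-4128-ac2a-2fa50c9c29ba/osd-block-b93aa213-4485-58c1-8ed1-881e60bee76f --path /var/lib/ceph/osd/ceph-10
ln -snf /dev/ceph-2b712814-5cce-4128-ac2a-2fa50c9c29ba/osd-block-b93aa213-4485-58c1-8ed1-881e60bee76f /var/lib/ceph/osd/ceph-10/block
chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
systemctl start ceph-osd@10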

Fixing Auto-Start

Bluestore OSDs are initialized with ceph-volume, so the same tool can repair everything with a single command:

[root@host supdev]# ceph-volume lvm activate --bluestore --all
--> Activating OSD ID 19 FSID 68daf6d4-41ab-5d3c-a388-dfec7a9125a3
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-19
Running command: restorecon /var/lib/ceph/osd/ceph-19
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a45209e0-98d8-4e00-9926-fa7abce66437/osd-block-68daf6d4-41ab-5d3c-a388-dfec7a9125a3 --path /var/lib/ceph/osd/ceph-19
Running command: ln -snf /dev/ceph-a45209e0-98d8-4e00-9926-fa7abce66437/osd-block-68daf6d4-41ab-5d3c-a388-dfec7a9125a3 /var/lib/ceph/osd/ceph-19/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-19/block
Running command: chown -R ceph:ceph /dev/dm-9
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: systemctl enable ceph-volume@lvm-19-68daf6d4-41ab-5d3c-a388-dfec7a9125a3
Running command: systemctl enable --runtime ceph-osd@19
Running command: systemctl start ceph-osd@19
--> ceph-volume lvm activate successful for osd ID: 19
...
--> Activating OSD ID 13 FSID 0dc8e233-d553-5da8-91f9-06cd92c72c99
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-13
Running command: restorecon /var/lib/ceph/osd/ceph-13
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-13
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-950dde27-1e41-4b45-a3c8-15e98d1e487d/osd-block-0dc8e233-d553-5da8-91f9-06cd92c72c99 --path /var/lib/ceph/osd/ceph-13
Running command: ln -snf /dev/ceph-950dde27-1e41-4b45-a3c8-15e98d1e487d/osd-block-0dc8e233-d553-5da8-91f9-06cd92c72c99 /var/lib/ceph/osd/ceph-13/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-13/block
Running command: chown -R ceph:ceph /dev/dm-3
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-13
Running command: systemctl enable ceph-volume@lvm-13-0dc8e233-d553-5da8-91f9-06cd92c72c99
Running command: systemctl enable --runtime ceph-osd@13
Running command: systemctl start ceph-osd@13
--> ceph-volume lvm activate successful for osd ID: 13

You can also activate a single OSD by specifying its ID and OSD FSID (if you are not sure of the FSID, see the lookup sketch after this listing):

[root@host supdev]# ceph-volume lvm activate --bluestore 10 b93aa213-4485-58c1-8ed1-881e60bee76f
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-2b712814-5cce-4128-ac2a-2fa50c9c29ba/osd-block-b93aa213-4485-58c1-8ed1-881e60bee76f --path /var/lib/ceph/osd/ceph-10
Running command: ln -snf /dev/ceph-2b712814-5cce-4128-ac2a-2fa50c9c29ba/osd-block-b93aa213-4485-58c1-8ed1-881e60bee76f /var/lib/ceph/osd/ceph-10/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-10/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-10
Running command: systemctl enable ceph-volume@lvm-10-b93aa213-4485-58c1-8ed1-881e60bee76f
Running command: systemctl enable --runtime ceph-osd@10
Running command: systemctl start ceph-osd@10
--> ceph-volume lvm activate successful for osd ID: 10
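
If you do not have the OSD FSID at hand, ceph-volume can read it back from the LVM tags. A minimal lookup sketch (the exact output layout may differ between ceph-volume versions):

[root@host supdev]# ceph-volume lvm list
# For each OSD this prints the osd id, the "osd fsid" and the block LV;
# those two values are exactly what "ceph-volume lvm activate <id> <fsid>" expects.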

Finally, check the mount status again:

[root@host supdev]# mount -l
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=132002648k,nr_inodes=33000662,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
...
tmpfs on /var/lib/ceph/osd/ceph-19 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-15 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-17 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-10 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-18 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-11 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-14 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-16 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-12 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-13 type tmpfs (rw,relatime)
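
A quicker way to run the same check is to filter the mount list (a trivial variation, not in the original post):

[root@host supdev]# mount -l | grep /var/lib/ceph/osd
# Should now show one tmpfs line per activated OSD.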

The OSD daemon processes are running again as well:

[root@host supdev]# ps axu|grep osd
ceph      757718  1.0  0.0 836748 104348 ?       Ssl  15:16   0:11 /usr/bin/ceph-osd -f --cluster ceph --id 19 --setuser ceph --setgroup ceph
ceph      757788  1.0  0.0 836748 94308 ?        Ssl  15:16   0:11 /usr/bin/ceph-osd -f --cluster ceph --id 15 --setuser ceph --setgroup ceph
ceph      757868  1.0  0.0 837772 92236 ?        Ssl  15:16   0:11 /usr/bin/ceph-osd -f --cluster ceph --id 17 --setuser ceph --setgroup ceph
ceph      758134  1.1  0.0 926904 86688 ?        Ssl  15:16   0:12 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
ceph      758214  0.9  0.0 805004 75660 ?        Ssl  15:16   0:10 /usr/bin/ceph-osd -f --cluster ceph --id 18 --setuser ceph --setgroup ceph
ceph      758308  1.0  0.0 925888 84972 ?        Ssl  15:16   0:10 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
ceph      758387  1.1  0.0 836744 88400 ?        Ssl  15:16   0:12 /usr/bin/ceph-osd -f --cluster ceph --id 14 --setuser ceph --setgroup ceph
ceph      758493  1.0  0.0 809104 78316 ?        Ssl  15:16   0:11 /usr/bin/ceph-osd -f --cluster ceph --id 16 --setuser ceph --setgroup ceph
ceph      758655  1.0  0.0 836756 91524 ?        Ssl  15:16   0:11 /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph
ceph      758782  1.0  0.0 801968 75580 ?        Ssl  15:16   0:10 /usr/bin/ceph-osd -f --cluster ceph --id 13 --setuser ceph --setgroup ceph
root      760147  0.0  0.0 112708   972 pts/0    S+   15:34   0:00 grep --color=auto osd
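
Note that ceph-volume enabled the ceph-osd@ units only with --runtime; the persistent auto-start across reboots comes from the ceph-volume@lvm-... units it enabled. A quick verification sketch, reusing osd.10's unit name from the output above ("enabled-runtime" is what systemctl typically reports for runtime-only enablement):

[root@host supdev]# systemctl is-enabled ceph-volume@lvm-10-b93aa213-4485-58c1-8ed1-881e60bee76f   # expected: enabled
[root@host supdev]# systemctl is-enabled ceph-osd@10                                               # expected: enabled-runtime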

Summary

ceph-volume activates OSDs by relying on the LVM tags written when the OSDs were created, so as long as that tag information is not lost, repairing auto-start is straightforward. For details, refer to the earlier article analyzing OSD startup under Bluestore; it will not be repeated here.
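
To check up front that the tag information is intact, the tags ceph-volume writes on each block LV can be listed with plain LVM tooling. A minimal sketch, assuming the usual ceph.osd_id / ceph.osd_fsid / ceph.type tag names used by ceph-volume lvm:

[root@host supdev]# lvs -o lv_name,vg_name,lv_tags --noheadings
# Each osd-block-* LV should carry tags such as
# ceph.osd_id=10,ceph.osd_fsid=b93aa213-4485-58c1-8ed1-881e60bee76f,ceph.type=block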

