The Right Way to Remove and Create OSDs under Luminous

How to Remove an OSD Correctly

Starting with the Luminous (L) release, Ceph greatly reduced the complexity of routine operations and added a number of commands to help guarantee data safety. Newcomers removing an OSD often overlook the cluster's PG state and end up losing data, so the following commands were added upstream:

- ceph osd ok-to-stop: Checks whether it looks like PGs will remain 
available even if the specified OSD(s) are stopped.
- ceph osd safe-to-destroy: Checks whether it is safe to destroy an OSD.  
This does various checks to ensure there is no data on the OSD(s), no 
unfound objects, stuck peering, and so forth.

Run these commands before removing an OSD; their output tells you whether the removal can proceed without putting data at risk.

For removal itself, two kinds of operations are provided: ceph osd destroy, used when replacing a failed disk, and a full removal with ceph osd purge. The official descriptions:

- ceph osd destroy: zap info about an OSD but keep its ID in place (with a 'destroyed' flag) so that it can be recreated with a replacement device.
- ceph osd purge: zap everything about an OSD, including the ID

The following real-world example shows how to remove osd.0. Before removing it, run the ok-to-stop and safe-to-destroy commands described above, and use their output to decide whether the removal can go ahead.

[root@demo cephuser]# ceph osd ok-to-stop osd.0
OSD(s) 0 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 0 PGs are likely to be degraded (but remain available) as a result.
[root@demo cephuser]# ceph osd safe-to-destroy osd.0
OSD(s) 0 are safe to destroy without reducing data durability.
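Both checks report safety through their exit status, so a removal script can gate on them before doing anything destructive. A minimal sketch, assuming a Luminous cluster; the osd_removal_is_safe helper and its injectable run parameter are illustrative, not part of Ceph:

```python
import subprocess

def osd_removal_is_safe(osd_id, run=None):
    """Return True only when both Luminous safety checks pass for the OSD.

    ``run`` is injectable for testing; by default it invokes the ceph CLI
    and treats exit code 0 as success.
    """
    if run is None:
        run = lambda cmd: subprocess.call(cmd) == 0
    osd = 'osd.%d' % osd_id
    # Both commands must succeed before stop/destroy is allowed.
    return (run(['ceph', 'osd', 'ok-to-stop', osd]) and
            run(['ceph', 'osd', 'safe-to-destroy', osd]))
```

A wrapper like this makes it hard to skip the checks by accident, which is exactly the failure mode the new commands were added to prevent.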

Before removing the OSD, it is best to confirm its status and related data:

[root@demo cephuser]# ceph osd tree # confirm the OSD's STATUS and other info
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default
 0   hdd      0 osd.0            up  1.00000 1.00000

[root@demo cephuser]# ceph osd crush ls osd.0 # confirm osd.0's CRUSH info
osd.0


[root@demo cephuser]# ceph auth get osd.0 # fetch osd.0's cephx keyring
exported keyring for osd.0
[osd.0]
        key = AQCgFytbZ3J8HxAAFYL5i36b0D3OIoJpnwZ4Uw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"

If you are replacing osd.0's disk, run the destroy operation. You will see that only osd.0's cephx keyring is deleted, while its ID and CRUSH entry remain.

[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd destroy osd.0 --yes-i-really-mean-it
destroyed osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS    REWEIGHT PRI-AFF
-1            0 root default
 0   hdd      0 osd.0        destroyed  1.00000 1.00000 # status changed to destroyed
[root@demo cephuser]# ceph osd crush ls osd.0
osd.0
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring

To remove osd.0 for good, run the purge operation. You will see that everything associated with osd.0 is deleted.

[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd purge osd.0 --yes-i-really-mean-it
purged osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME    STATUS REWEIGHT PRI-AFF
-1            0 root default
[root@demo cephuser]# ceph osd crush ls osd.0
Error ENOENT: node 'osd.0' does not exist
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring

Finally, if you need to wipe the data on the disk that used to back osd.0, run the following command:

[root@demo cephuser]# ceph-volume lvm zap /dev/sdb --destroy
--> Zapping: /dev/sdb
--> Unmounting /var/lib/ceph/osd/ceph-0
Running command: umount -v /var/lib/ceph/osd/ceph-0
 stderr: umount: /var/lib/ceph/osd/ceph-0 (tmpfs) unmounted
--> Destroying volume group ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa because --destroy was given
Running command: vgremove -v -f ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa
 stderr: Removing ceph--3d2c442b--c663--48d5--bb18--098cb5f307fa-osd--block--7449a599--585a--4caf--8452--9b5facab3df3 (253:2)
 stderr: Archiving volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" metadata (seqno 21).
 stderr: Releasing logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3"
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" (seqno 22).
 stdout: Logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3" successfully removed
 stderr: Removing physical volume "/dev/sdb" from volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa"
 stdout: Volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" successfully removed
--> Destroying physical volume /dev/sdb because --destroy was given
Running command: pvremove -v -f /dev/sdb
 stderr: Wiping internal VG cache
    Wiping cache of LVM-capable devices
 stdout: Labels on physical volume "/dev/sdb" successfully wiped.
Running command: wipefs --all /dev/sdb
Running command: dd if=/dev/zero of=/dev/sdb bs=1M count=10
--> Zapping successful for: /dev/sdb

Notes on Creating a New OSD

When using ceph-volume, if you want to place the db and wal on dedicated SSD partitions, you must create those partitions manually in advance (ceph-volume will eventually offer automatic partitioning, but for now this step is manual). The example below creates the wal and db for osd.1.

Create the partitions with sgdisk, specifying each partition's PARTUUID and a label:

sgdisk --new=0:0:+100M --change-name=1:osd-1-wal --partition-guid=1:4fbd7e29-9d25-41b8-afd0-062c0ceff051 --mbrtogpt -- /dev/sdc

sgdisk --new=0:0:+100M --change-name=2:osd-1-db --partition-guid=2:4fbd7e29-9d25-41b8-afd0-062c0ceff052 --mbrtogpt -- /dev/sdc

Labeling the partitions pays off in day-to-day operations: the purpose of each partition is obvious at a glance. The result:

[root@demo cephuser]# blkid
/dev/sda1: UUID="a1e4eaa2-fd83-44e9-937e-a1b360c1c707" TYPE="xfs"
/dev/sda2: UUID="UYtArS-hwDi-nNkc-N78D-6EZX-cIXW-PSpbgq" TYPE="LVM2_member"
/dev/mapper/centos-root: UUID="7065f959-f667-4253-8a90-d49afcf19a29" TYPE="xfs"
/dev/mapper/centos-swap: UUID="bbdef107-d9f9-4470-860e-9d2f55821c4c" TYPE="swap"
/dev/sdc1: PARTLABEL="osd-1-wal" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff051"
/dev/sdc2: PARTLABEL="osd-1-db" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff052"
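Since the labels and PARTUUIDs are fixed at partition time, a script can recover the label-to-uuid mapping straight from blkid output. A small sketch; the partlabel_to_partuuid helper is a hypothetical name, fed with the two lines from the output above:

```python
import re

def partlabel_to_partuuid(blkid_output):
    """Build a PARTLABEL -> PARTUUID map from `blkid` output lines."""
    mapping = {}
    for line in blkid_output.splitlines():
        label = re.search(r'PARTLABEL="([^"]+)"', line)
        uuid = re.search(r'PARTUUID="([^"]+)"', line)
        if label and uuid:
            mapping[label.group(1)] = uuid.group(1)
    return mapping

sample = (
    '/dev/sdc1: PARTLABEL="osd-1-wal" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff051"\n'
    '/dev/sdc2: PARTLABEL="osd-1-db" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff052"\n'
)
print(partlabel_to_partuuid(sample)['osd-1-db'])
# -> 4fbd7e29-9d25-41b8-afd0-062c0ceff052
```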

When initializing the OSD with ceph-volume, it is best to pass the PARTUUID paths, so that the ceph.wal_device and ceph.db_device LV tags are recorded with your PARTUUIDs:

[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052 --block.wal /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051


[root@demo cephuser]#  /usr/sbin/lvs --noheadings --readonly --separator="   " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
     /dev/centos/root   root   centos   fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
     /dev/centos/swap   swap   centos   r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
  ceph.block_device=/dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407,ceph.block_uuid=WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.db_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.encrypted=0,ceph.osd_fsid=9fe04363-fb0b-498d-9354-274513ef7407,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051,ceph.wal_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff051   /dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407   osd-block-9fe04363-fb0b-498d-9354-274513ef7407   ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d   WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5

If you initialize with names like /dev/sdc1 instead, the devices recorded in ceph.wal_device and ceph.db_device may change after a reboot. The OSD now locates its devices by uuid at mount time (via the get_osd_device_path method, shown in the code below), but to keep the tag data consistent the PARTUUID form is still recommended.

[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1 --block.wal /dev/sdc2
[root@demo cephuser]# /usr/sbin/lvs --noheadings --readonly --separator="   " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
     /dev/centos/root   root   centos   fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
     /dev/centos/swap   swap   centos   r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
  ceph.block_device=/dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.block_uuid=Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/sdc1,ceph.db_uuid=d0fe82cf-34cb-4b95-b70c-52f37a86b333,ceph.encrypted=0,ceph.osd_fsid=471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/sdc2,ceph.wal_uuid=d0577828-e8df-48f0-888a-3a8ee183322c   /dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3   osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3   ceph-2850c361-dc1c-434b-9689-f73798bb514e   Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm
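The lv_tags column is just comma-separated key=value pairs, so the difference between the two runs is easy to inspect programmatically. A sketch using tag values taken from the /dev/sdcN output above; parse_lv_tags is a hypothetical helper:

```python
def parse_lv_tags(tag_string):
    """Parse a comma-separated key=value lv_tags string into a dict."""
    return dict(item.split('=', 1) for item in tag_string.split(',') if '=' in item)

# Tag values from the run that was initialized with /dev/sdc1 and /dev/sdc2.
tags = parse_lv_tags(
    'ceph.db_device=/dev/sdc1,ceph.db_uuid=d0fe82cf-34cb-4b95-b70c-52f37a86b333,'
    'ceph.wal_device=/dev/sdc2,ceph.wal_uuid=d0577828-e8df-48f0-888a-3a8ee183322c'
)
# Here the recorded device path is a bare /dev/sdcN name, not a by-partuuid
# path, and such names can change when devices are re-enumerated at boot.
print(tags['ceph.db_device'])  # -> /dev/sdc1
```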

The source that resolves the db and wal paths from the uuid:

# ceph-12.2.5/src/ceph-volume/ceph_volume/devices/lvm/activate.py
# called when an OSD is activated
def activate_bluestore(lvs, no_systemd=False):
    .....
    db_device_path = get_osd_device_path(osd_lv, lvs, 'db', dmcrypt_secret=dmcrypt_secret)
    wal_device_path = get_osd_device_path(osd_lv, lvs, 'wal', dmcrypt_secret=dmcrypt_secret)

# resolve the final storage device path
def get_osd_device_path(osd_lv, lvs, device_type, dmcrypt_secret=None):
    """
    ``device_type`` can be one of ``db``, ``wal`` or ``block`` so that
    we can query ``lvs`` (a ``Volumes`` object) and fallback to querying the uuid
    if that is not present.

    Return a path if possible, failing to do that a ``None``, since some of these devices
    are optional
    """
    osd_lv = lvs.get(lv_tags={'ceph.type': 'block'})
    is_encrypted = osd_lv.tags.get('ceph.encrypted', '0') == '1'
    logger.debug('Found block device (%s) with encryption: %s', osd_lv.name, is_encrypted)
    uuid_tag = 'ceph.%s_uuid' % device_type
    device_uuid = osd_lv.tags.get(uuid_tag)  # read ceph.db_uuid or ceph.wal_uuid from the LV tags to locate the device
    if not device_uuid:
        return None

    device_lv = lvs.get(lv_uuid=device_uuid)
    if device_lv:
        if is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, device_lv.lv_path, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return device_lv.lv_path
    else:
        # this could be a regular device, so query it with blkid
        physical_device = disk.get_device_from_partuuid(device_uuid)
        if physical_device and is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, physical_device, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return physical_device or None
    return None    
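Stripped of the encryption branches, the lookup above reduces to "try the LV uuid first, then fall back to the partition uuid". A simplified sketch of that logic; the lookup callables are stand-ins for lvs.get and disk.get_device_from_partuuid, injected here so the flow can be exercised without a live system:

```python
def resolve_device_path(device_uuid, lv_lookup, partuuid_lookup):
    """Simplified, unencrypted version of get_osd_device_path's fallback."""
    if not device_uuid:
        return None
    lv_path = lv_lookup(device_uuid)      # stands in for lvs.get(lv_uuid=...)
    if lv_path:
        return lv_path
    # Not an LV: treat the uuid as a partition uuid and query it,
    # standing in for disk.get_device_from_partuuid().
    return partuuid_lookup(device_uuid) or None

path = resolve_device_path(
    '4fbd7e29-9d25-41b8-afd0-062c0ceff052',
    lv_lookup=lambda u: None,  # the db partition is not an LV in this setup
    partuuid_lookup=lambda u: '/dev/disk/by-partuuid/%s' % u,
)
print(path)
# -> /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052
```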

Originally published on the WeChat official account Ceph对象存储方案 (cephbook).

Original publication date: 2018-06-22
