Starting with the Luminous (L) release, Ceph greatly reduced the complexity of day-to-day operations and added a number of commands to help keep data safe. Newcomers often overlook the state of the cluster's PGs when removing an OSD, which can ultimately lead to data loss, so the following commands were added upstream:
- ceph osd ok-to-stop: Checks whether it looks like PGs will remain
available even if the specified OSD(s) are stopped.
- ceph osd safe-to-destroy: Checks whether it is safe to destroy an OSD.
This does various checks to ensure there is no data on the OSD(s), no
unfound objects, stuck peering, and so forth.

Run these commands before removing an OSD; their output tells you whether the removal can proceed without risking data.
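Both checks exit non-zero when the cluster is not in a safe state, so a removal script can gate on them directly. A minimal sketch (osd.0 is just the example OSD from this walkthrough; run this only against your own cluster):

```shell
OSD=osd.0
# Both safety checks exit non-zero when the check fails,
# so the removal can be gated on their exit status
if ceph osd ok-to-stop "$OSD" && ceph osd safe-to-destroy "$OSD"; then
    echo "$OSD can be removed without risking availability or durability"
else
    echo "not safe yet -- wait for recovery/backfill to finish" >&2
fi
```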
Ceph also provides two kinds of removal operations: ceph osd destroy, used when replacing a failed disk, and ceph osd purge, which removes the OSD entirely. They are described as follows:
- ceph osd destroy: zap info about an OSD but keep its ID in place (with
a 'destroyed' flag) so that it can be recreated with a replacement device.
- ceph osd purge: zap everything about an OSD, including the ID.

The walkthrough below uses a real example to show how to remove osd.0. Before removing it, run the ok-to-stop and safe-to-destroy commands mentioned above and let their output decide whether the removal can go ahead.
[root@demo cephuser]# ceph osd ok-to-stop osd.0
OSD(s) 0 are ok to stop without reducing availability, provided there are no other concurrent failures or interventions. 0 PGs are likely to be degraded (but remain available) as a result.
[root@demo cephuser]# ceph osd safe-to-destroy osd.0
OSD(s) 0 are safe to destroy without reducing data durability.

Before removing the OSD, it is best to confirm its status and associated metadata:
[root@demo cephuser]# ceph osd tree # confirm the OSD's STATUS and related info
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
0 hdd 0 osd.0 up 1.00000 1.00000
[root@demo cephuser]# ceph osd crush ls osd.0 # confirm osd.0's CRUSH entry
osd.0
[root@demo cephuser]# ceph auth get osd.0 # fetch osd.0's cephx keyring
exported keyring for osd.0
[osd.0]
key = AQCgFytbZ3J8HxAAFYL5i36b0D3OIoJpnwZ4Uw==
caps mgr = "allow profile osd"
caps mon = "allow profile osd"
caps osd = "allow *"

To replace the disk behind the existing osd.0, run the destroy operation; you will find that only osd.0's keyring is removed.
[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd destroy osd.0 --yes-i-really-mean-it
destroyed osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
 0 hdd 0 osd.0 destroyed 1.00000 1.00000 # status has changed to destroyed
[root@demo cephuser]# ceph osd crush ls osd.0
osd.0
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring

To remove osd.0 completely, run the purge operation; every piece of information associated with osd.0 is removed.
[root@demo cephuser]# systemctl stop ceph-osd@0
[root@demo cephuser]# ceph osd purge osd.0 --yes-i-really-mean-it
purged osd.0
[root@demo cephuser]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0 root default
[root@demo cephuser]# ceph osd crush ls osd.0
Error ENOENT: node 'osd.0' does not exist
[root@demo cephuser]# ceph auth get osd.0
Error ENOENT: failed to find osd.0 in keyring

Finally, if you need to wipe the data on the disk that previously backed osd.0, run the following:
[root@demo cephuser]# ceph-volume lvm zap /dev/sdb --destroy
--> Zapping: /dev/sdb
--> Unmounting /var/lib/ceph/osd/ceph-0
Running command: umount -v /var/lib/ceph/osd/ceph-0
stderr: umount: /var/lib/ceph/osd/ceph-0 (tmpfs) unmounted
--> Destroying volume group ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa because --destroy was given
Running command: vgremove -v -f ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa
stderr: Removing ceph--3d2c442b--c663--48d5--bb18--098cb5f307fa-osd--block--7449a599--585a--4caf--8452--9b5facab3df3 (253:2)
stderr: Archiving volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" metadata (seqno 21).
stderr: Releasing logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3"
stderr: Creating volume group backup "/etc/lvm/backup/ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" (seqno 22).
stdout: Logical volume "osd-block-7449a599-585a-4caf-8452-9b5facab3df3" successfully removed
stderr: Removing physical volume "/dev/sdb" from volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa"
stdout: Volume group "ceph-3d2c442b-c663-48d5-bb18-098cb5f307fa" successfully removed
--> Destroying physical volume /dev/sdb because --destroy was given
Running command: pvremove -v -f /dev/sdb
stderr: Wiping internal VG cache
Wiping cache of LVM-capable devices
stdout: Labels on physical volume "/dev/sdb" successfully wiped.
Running command: wipefs --all /dev/sdb
Running command: dd if=/dev/zero of=/dev/sdb bs=1M count=10
--> Zapping successful for: /dev/sdb

When using ceph-volume, if you want to place the db and wal on separate SSD partitions, you currently have to partition the device by hand (ceph-volume is expected to gain an automatic partitioning scheme later; for now it is manual). The example below creates the wal and db partitions for osd.1.
Create the partitions with sgdisk, specifying each partition's partuuid and a label:
sgdisk --new=0:0:+100M --change-name=1:osd-1-wal --partition-guid=1:4fbd7e29-9d25-41b8-afd0-062c0ceff051 --mbrtogpt -- /dev/sdc
sgdisk --new=0:0:+100M --change-name=2:osd-1-db --partition-guid=2:4fbd7e29-9d25-41b8-afd0-062c0ceff052 --mbrtogpt -- /dev/sdc

The benefit of the labels is purely operational: the purpose of each partition is obvious at a glance. The result looks like this:
[root@demo cephuser]# blkid
/dev/sda1: UUID="a1e4eaa2-fd83-44e9-937e-a1b360c1c707" TYPE="xfs"
/dev/sda2: UUID="UYtArS-hwDi-nNkc-N78D-6EZX-cIXW-PSpbgq" TYPE="LVM2_member"
/dev/mapper/centos-root: UUID="7065f959-f667-4253-8a90-d49afcf19a29" TYPE="xfs"
/dev/mapper/centos-swap: UUID="bbdef107-d9f9-4470-860e-9d2f55821c4c" TYPE="swap"
/dev/sdc1: PARTLABEL="osd-1-wal" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff051"
/dev/sdc2: PARTLABEL="osd-1-db" PARTUUID="4fbd7e29-9d25-41b8-afd0-062c0ceff052"

When initializing an OSD with ceph-volume, it is best to reference the partitions by partuuid; the ceph.wal_device and ceph.db_device LV tags will then be recorded with the partuuid-based paths.
[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052 --block.wal /dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051
[root@demo cephuser]# /usr/sbin/lvs --noheadings --readonly --separator=" " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
/dev/centos/root root centos fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
/dev/centos/swap swap centos r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
ceph.block_device=/dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407,ceph.block_uuid=WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.db_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff052,ceph.encrypted=0,ceph.osd_fsid=9fe04363-fb0b-498d-9354-274513ef7407,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/disk/by-partuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff051,ceph.wal_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff051 /dev/ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d/osd-block-9fe04363-fb0b-498d-9354-274513ef7407 osd-block-9fe04363-fb0b-498d-9354-274513ef7407 ceph-e6f9ba96-7323-4a2f-b854-f343d088eb8d WBQohD-RX3l-iIac-z5a4-2v74-toc7-agsQv5

If instead you initialize with names like /dev/sdc1, the recorded ceph.wal_device and ceph.db_device may no longer match after a reboot. The OSD does now locate its devices by uuid at mount time (via the get_osd_device_path method, shown in the code below), but to keep the tag data consistent the partuuid form is still recommended.
[root@demo cephuser]# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1 --block.wal /dev/sdc2
[root@demo cephuser]# /usr/sbin/lvs --noheadings --readonly --separator=" " -o lv_tags,lv_path,lv_name,vg_name,lv_uuid
/dev/centos/root root centos fq7hUc-BtCA-AeIk-VvBD-Mxad-7uix-ZtK4I6
/dev/centos/swap swap centos r7oGD8-Tf15-v4ir-eOR3-LeDg-qqrf-gblNWI
ceph.block_device=/dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.block_uuid=Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=21cc0dcd-06f3-4d5d-82c2-dbd411ef0ed9,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.db_device=/dev/sdc1,ceph.db_uuid=d0fe82cf-34cb-4b95-b70c-52f37a86b333,ceph.encrypted=0,ceph.osd_fsid=471be0c0-905b-4836-a5a1-dc8d5a3845d3,ceph.osd_id=0,ceph.type=block,ceph.vdo=0,ceph.wal_device=/dev/sdc2,ceph.wal_uuid=d0577828-e8df-48f0-888a-3a8ee183322c /dev/ceph-2850c361-dc1c-434b-9689-f73798bb514e/osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3 osd-block-471be0c0-905b-4836-a5a1-dc8d5a3845d3 ceph-2850c361-dc1c-434b-9689-f73798bb514e Eidmyr-axIE-2E3F-5Jj5-N4tN-TbiH-l0jrKm

The source code that resolves the db and wal device paths from the uuid is shown below:
# ceph-12.2.5/src/ceph-volume/ceph_volume/devices/lvm/activate.py
# called when an OSD is activated
def activate_bluestore(lvs, no_systemd=False):
    .....
    db_device_path = get_osd_device_path(osd_lv, lvs, 'db', dmcrypt_secret=dmcrypt_secret)
    wal_device_path = get_osd_device_path(osd_lv, lvs, 'wal', dmcrypt_secret=dmcrypt_secret)

# resolves the final storage device path
def get_osd_device_path(osd_lv, lvs, device_type, dmcrypt_secret=None):
    """
    ``device_type`` can be one of ``db``, ``wal`` or ``block`` so that
    we can query ``lvs`` (a ``Volumes`` object) and fallback to querying the uuid
    if that is not present.
    Return a path if possible, failing to do that a ``None``, since some of these devices
    are optional
    """
    osd_lv = lvs.get(lv_tags={'ceph.type': 'block'})
    is_encrypted = osd_lv.tags.get('ceph.encrypted', '0') == '1'
    logger.debug('Found block device (%s) with encryption: %s', osd_lv.name, is_encrypted)
    uuid_tag = 'ceph.%s_uuid' % device_type
    # the ceph.db_uuid / ceph.wal_uuid lv tag determines the final device path
    device_uuid = osd_lv.tags.get(uuid_tag)
    if not device_uuid:
        return None
    device_lv = lvs.get(lv_uuid=device_uuid)
    if device_lv:
        if is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, device_lv.lv_path, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return device_lv.lv_path
    else:
        # this could be a regular device, so query it with blkid
        physical_device = disk.get_device_from_partuuid(device_uuid)
        if physical_device and is_encrypted:
            encryption_utils.luks_open(dmcrypt_secret, physical_device, device_uuid)
            return '/dev/mapper/%s' % device_uuid
        return physical_device or None
    return None
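As an aside, the lv_tags column printed by lvs is just a comma-separated list of key=value pairs, so the wal/db uuids can be pulled out on the command line when debugging. A small sketch (the tag string below is an excerpt of the partuuid-based lvs output shown earlier):

```shell
# lv_tags is a comma-separated key=value list; excerpt from the lvs output above
tags='ceph.osd_id=0,ceph.type=block,ceph.wal_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff051,ceph.db_uuid=4fbd7e29-9d25-41b8-afd0-062c0ceff052'

# split on commas, then select a single tag by its key
wal_uuid=$(echo "$tags" | tr ',' '\n' | awk -F= '$1 == "ceph.wal_uuid" {print $2}')
db_uuid=$(echo "$tags" | tr ',' '\n' | awk -F= '$1 == "ceph.db_uuid" {print $2}')
echo "wal=$wal_uuid db=$db_uuid"
```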