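This article analyzes a data-disk initialization error that occurred while installing CDSW 1.3.0 on a Kerberos-enabled CDH 5.13.1 cluster, and describes how it was resolved.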
1. Operating system: Red Hat 7.2
2. CM and CDH version: 5.13.1
3. Kerberos is enabled on the CDH cluster
4. CDSW version: 1.3.0
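Cloudera Data Science Workbench (CDSW) is a product that gives enterprises fast, easy, and secure self-service data science. It lets data scientists securely apply their existing skills and tools (such as R, Python, and Scala) to data stored in a Hadoop cluster. It is a collaborative, scalable, and extensible platform for data exploration, analysis, modeling, and visualization. CDSW lets data scientists manage their own analytics pipelines, which speeds up machine learning projects on their way from exploration to production.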
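The advantages of CDSW include:
1. It brings data science to Hadoop
2. It is a self-service, collaborative platform
3. It is built on enterprise-grade technology
4. It can be deployed inside the cluster or in the cloud, supports a master/worker architecture, and supports multi-tenant resource management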
After a Gateway node was added to the Kerberos-enabled CDH cluster and the CDSW service was installed on it, the following error occurred:
ERROR:: Error in pvcreate for [/dev/sdb]: 5
ERROR:: Unable to setup docker storage.: 5
ERROR:: Unable to create storage for docker.: 5
Exit code: 5
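The error messages above indicate that running pvcreate on /dev/sdb failed, so CDSW could not create storage for Docker.
1. Run "pvremove -f -f /dev/sdb" to forcibly remove the LVM physical volume label from /dev/sdb. This disk held no data, so I removed it forcibly; if your disk contains data, back it up first.
However, the forced removal had no effect. The command output showed that the disk belonged to the volume group "vg_data", and it also printed the hint "Mounted filesystem?".
CDSW runs on Docker + Kubernetes, and its data disk only needs to be a raw disk: the installer formats and mounts it automatically, so there is no need to format or mount it by hand. Seeing this hint was puzzling at first: who had mounted the disk? Checking the file system information showed that the disk was mounted on /data, which explains the error.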
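Before installing, it can help to confirm that the CDSW data disk really is raw. The following is a minimal sketch under stated assumptions, not part of CDSW itself: the function name check_raw_disk is hypothetical, and it assumes the util-linux lsblk output produced by "lsblk -rno NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sdb" (raw format, no header). Any child device (partition or LVM volume), file system or LVM signature, or mount point makes it report the disk as not raw.

```shell
# Hypothetical helper (not part of CDSW): decide whether a block device looks raw.
# Reads lsblk output on stdin, produced by:
#   lsblk -rno NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sdb
# Exit status 0 = looks raw; 1 = partitioned, LVM member, formatted, mounted, or missing.
check_raw_disk() {
  awk '
    BEGIN { bad = 0 }
    # More than one line means the disk has children (partitions or LVM volumes).
    NR > 1 { bad = 1 }
    # A file system / LVM signature ($3) or a mount point ($4) on the disk itself.
    $3 != "" || $4 != "" { bad = 1 }
    # No output at all usually means lsblk failed (for example, device not found).
    END { if (NR == 0) bad = 1; exit bad }
  '
}

# Example (read-only):
#   if lsblk -rno NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sdb | check_raw_disk; then
#     echo "/dev/sdb looks raw"
#   else
#     echo "/dev/sdb is in use; clean it up before installing CDSW"
#   fi
```

On the host in this article, /dev/sdb would have failed this check, because it carried an LVM2 label and its logical volume was mounted on /data.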
2. Look up the volume group that the physical volume /dev/sdb belongs to
[root@cdsw ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "rhel" using metadata type lvm2
Found volume group "vg_data" using metadata type lvm2
[root@cdsw ~]# vgdisplay -v vg_data
Using volume group(s) on command line.
--- Volume group ---
VG Name vg_data
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size 1000.00 GiB
PE Size 4.00 MiB
Total PE 255999
Alloc PE / Size 255999 / 1000.00 GiB
Free PE / Size 0 / 0
VG UUID CWa9f4-gK0d-zQzv-uzFV-rwre-lLhl-vQq3wn
--- Logical volume ---
LV Path /dev/vg_data/lv_data
LV Name lv_data
VG Name vg_data
LV UUID Ye4hx9-32oH-1uTd-ffNn-snsu-tIQk-3Ln81d
LV Write Access read/write
LV Creation host, time cdsw.localdomain, 2018-04-23 17:17:03 +0000
LV Status available
# open 1
LV Size 1000.00 GiB
Current LE 255999
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:2
--- Physical volumes ---
PV Name /dev/sdb
PV UUID oyhxyk-cJC9-CZ3Y-KI3J-BJ7Y-jMtX-ltdYqn
PV Status allocatable
Total PE / Free PE 255999 / 0
Note down the LV Path; it is needed for the unmount step that follows.
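The same lookup can be done without reading the output by eye. A minimal sketch, assuming the standard LVM2 commands "pvs --noheadings -o pv_name,vg_name" and "vgdisplay -v"; the helper names vg_of_pv and lv_paths_from_vgdisplay are hypothetical and only parse the text those commands print.

```shell
# Hypothetical helpers (names are illustrative); both parse LVM2 report text on stdin.

# Print the volume group that owns physical volume $1.
# Input: pvs --noheadings -o pv_name,vg_name   e.g. "  /dev/sdb   vg_data"
# A PV that belongs to no VG yields an empty line.
vg_of_pv() {
  awk -v pv="$1" '$1 == pv { print $2 }'
}

# Print every "LV Path" value found in "vgdisplay -v <vg>" output.
lv_paths_from_vgdisplay() {
  awk '$1 == "LV" && $2 == "Path" { print $3 }'
}

# Example (read-only):
#   vg=$(pvs --noheadings -o pv_name,vg_name | vg_of_pv /dev/sdb)
#   vgdisplay -v "$vg" 2>/dev/null | lv_paths_from_vgdisplay
```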
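1. Unmount the file system on /dev/sdb (through its logical volume) and delete the logical volume. If the data disk contains data, back it up first.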
[root@cdsw ~]# umount /dev/vg_data/lv_data
[root@cdsw ~]#
[root@cdsw ~]# lvremove /dev/vg_data/lv_data
Do you really want to remove active logical volume lv_data? [y/n]: y
Logical volume "lv_data" successfully removed
[root@cdsw ~]# lvdisplay | grep /dev/vg_data/lv_data
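2. Delete the volume group "vg_data" and the physical volume "/dev/sdb". Note that vgremove accepts only volume group names, which is why passing /dev/sdb to it produces the "Volume group "sdb" not found" error below; the physical volume is removed separately with pvremove.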
[root@cdsw ~]# vgremove vg_data /dev/sdb
Volume group "vg_data" successfully removed
Volume group "sdb" not found
Cannot process volume group sdb
[root@cdsw ~]# vgremove vg_data
Volume group "vg_data" not found
Cannot process volume group vg_data
[root@cdsw ~]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "rhel" using metadata type lvm2
[root@cdsw ~]# pvremove /dev/sdb
Labels on physical volume "/dev/sdb" successfully wiped
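The order of these commands matters: the file system must be unmounted before the logical volume can be removed, and the volume group must be gone before pvremove will wipe the label without being forced. As a sketch, the hypothetical helper print_lvm_teardown below only prints that sequence (a dry run) so it can be reviewed; it executes nothing.

```shell
# Hypothetical dry-run helper: print (never execute) the teardown order for an
# LVM stack: unmount and remove each LV, then remove the VG, then wipe the PV.
# Usage: print_lvm_teardown <vg_name> <pv_device> <lv_path>...
print_lvm_teardown() {
  local vg="$1" pv="$2"
  shift 2
  local lv
  for lv in "$@"; do
    echo "umount $lv"
    echo "lvremove $lv"
  done
  # vgremove accepts VG names only; passing /dev/sdb to it yields
  # 'Volume group "sdb" not found', so the PV is wiped with pvremove instead.
  echo "vgremove $vg"
  echo "pvremove $pv"
}

# Example: review the plan, then run the commands one by one.
#   print_lvm_teardown vg_data /dev/sdb /dev/vg_data/lv_data
```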
3. Edit /etc/fstab and delete the corresponding mount entry. If you skip this step, the system will report errors when it reboots.
[root@cdsw ~]# vi /etc/fstab
[root@cdsw ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 314G 27G 288G 9% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 8.6M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda1 497M 124M 373M 25% /boot
tmpfs 6.3G 0 6.3G 0% /run/user/0
cm_processes 32G 1.7M 32G 1% /run/cloudera-scm-agent/process
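Instead of editing /etc/fstab by hand in vi, the entry can be commented out by a small script that keeps a backup. This is a hypothetical helper (comment_fstab_entry is not a system command) that matches either the device field or the mount point field. Matching on the mount point (/data here) is usually the safer choice, because the device may be written as /dev/mapper/vg_data-lv_data, /dev/vg_data/lv_data, or as a UUID= entry.

```shell
# Hypothetical helper: comment out fstab entries whose device (field 1) or
# mount point (field 2) equals the target, keeping a .bak copy first.
# Usage: comment_fstab_entry <fstab_file> <device_or_mountpoint>
comment_fstab_entry() {
  local file="$1" target="$2"
  cp -p "$file" "$file.bak" || return 1
  awk -v t="$target" '
    # Leave existing comments and blank lines untouched.
    /^[ \t]*#/ || NF == 0 { print; next }
    # Comment out matching entries.
    $1 == t || $2 == t { print "#" $0; next }
    { print }
  ' "$file.bak" > "$file"
}

# Example:
#   comment_fstab_entry /etc/fstab /data
```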
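1. After the data disk was restored to a raw disk, the installation was resumed but still failed with the following error:
2. Delete the records of the earlier CDSW installation attempts kept under /run (the Cloudera Manager Agent process directories whose names contain cdsw-CDSW)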
[root@cdsw ~]# ls -l /run/cloudera-scm-agent/process/
total 0
drwxr-x--x 3 root root 200 Jan 11 21:21 959-cluster-host-inspector
drwxr-x--x 3 root root 200 Jan 11 21:23 964-cluster-host-inspector
drwxr-x--x 3 root root 180 Jan 11 21:23 969-cluster-host-inspector
drwxr-x--x 3 root root 180 Jan 11 22:45 976-host-inspector
drwxr-x--x 8 root root 360 Jan 11 23:01 977-cdsw-CDSW_DOCKER-prepare_node
drwxr-x--x 8 root root 360 Jan 11 23:03 978-cdsw-CDSW_DOCKER-prepare_node
drwxr-x--x 8 root root 360 Jan 11 23:11 979-cdsw-CDSW_DOCKER-prepare_node
drwxr-x--x 8 root root 360 Jan 11 23:31 980-cdsw-CDSW_DOCKER-prepare_node
drwxr-x--x 8 root root 360 Jan 12 00:03 981-cdsw-CDSW_DOCKER-prepare_node
drwxr-x--x 8 root root 360 Jan 12 00:25 982-cdsw-CDSW_DOCKER-prepare_node
drwxr-xr-x 4 root root 120 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_4426068868072363370
drwxr-xr-x 4 root root 100 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_-7433161567816885887
drwxr-xr-x 4 root root 100 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_1128395222526042064
drwxr-xr-x 4 root root 120 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_992013524586640974
drwxr-xr-x 8 root root 180 Jan 12 00:36 ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_4839323033697164425
drwxr-xr-x 8 root root 200 Jan 12 00:36 ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-6359529878913266590
[root@cdsw ~]# rm -rf /run/cloudera-scm-agent/process/*cdsw-CDSW*
[root@cdsw ~]# ls -l /run/cloudera-scm-agent/process/
total 0
drwxr-x--x 3 root root 200 Jan 11 21:21 959-cluster-host-inspector
drwxr-x--x 3 root root 200 Jan 11 21:23 964-cluster-host-inspector
drwxr-x--x 3 root root 180 Jan 11 21:23 969-cluster-host-inspector
drwxr-x--x 3 root root 180 Jan 11 22:45 976-host-inspector
drwxr-xr-x 4 root root 120 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_4426068868072363370
drwxr-xr-x 4 root root 100 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_-7433161567816885887
drwxr-xr-x 4 root root 100 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_1128395222526042064
drwxr-xr-x 4 root root 120 Jan 12 00:36 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_992013524586640974
drwxr-xr-x 8 root root 180 Jan 12 00:36 ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_4839323033697164425
drwxr-xr-x 8 root root 200 Jan 12 00:36 ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-6359529878913266590
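Because rm -rf with a wildcard is unforgiving, it is worth listing what the pattern matches before deleting anything. A minimal dry-run sketch, assuming the default Cloudera Manager Agent process directory shown above; list_cdsw_process_dirs is a hypothetical name.

```shell
# Hypothetical dry-run helper: list the agent process directories left by
# earlier CDSW attempts (names containing "cdsw-CDSW") without deleting them.
# Usage: list_cdsw_process_dirs [dir]   (default: /run/cloudera-scm-agent/process)
list_cdsw_process_dirs() {
  local dir="${1:-/run/cloudera-scm-agent/process}"
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name '*cdsw-CDSW*' | sort
}

# Example: review first, then delete only what was listed.
#   list_cdsw_process_dirs
#   list_cdsw_process_dirs | xargs -r rm -rf
```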
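1. Run the Prepare Node command. In my case CDSW had already started successfully, so the Prepare Node button was greyed out; simply wait for the command to finish. My guess is that this step mainly installs some dependencies that CDSW needs.
2. Start CDSW; all roles run normally.
1. Check the VG Name for /dev/sdb; under normal circumstances it is docker.
2. Run the cdsw status command to check the status; the result is normal and shows that CDSW is ready.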
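The VG check in step 1 can be scripted as well. A hedged sketch: check_docker_vg is a hypothetical helper that reads "pvs --noheadings -o pv_name,vg_name" output and succeeds only if the given device is listed in the docker volume group, which is the expected state described above.

```shell
# Hypothetical post-install check: succeed only if PV $1 belongs to VG "docker".
# Reads "pvs --noheadings -o pv_name,vg_name" output on stdin.
# Usage: pvs --noheadings -o pv_name,vg_name | check_docker_vg /dev/sdb
check_docker_vg() {
  awk -v pv="$1" '
    $1 == pv { found = 1; if ($2 == "docker") ok = 1 }
    # Exit 0 only when the device was found and its VG is docker.
    END { exit !(found && ok) }
  '
}
```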
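5.2. Running the test cases
Because Kerberos is enabled on the CDH cluster, accessing the CDSW Web UI requires a Kerberos client tool configured on Windows, and CDSW must also be integrated with Kerberos by supplying a usable Principal. Otherwise CDSW cannot use the CDH cluster services properly, and running a Project will hang. For reasons of space, this is not covered here.
The tests mainly use the Python and Scala examples bundled with CDSW.
1. Python test case
2. Scala test case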
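1. The CDSW data disk must not be formatted or mounted in advance; otherwise the installation fails with an error.
2. When using CDSW in a Kerberos environment, you need to provide or create a usable Principal. Otherwise applications cannot be submitted, and because CDSW reports no error, the application merely appears to hang.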