Custom Images

Last updated: 2024-06-19 17:20:01

Feature Overview

EMR on TKE provides a custom image feature for the Spark and Hive services, which allows you to build and use custom Spark and Hive images based on the supplied base images. This offers greater flexibility, enabling you to quickly configure and optimize your Spark and Hive image environments according to your own needs and preferences.
Usage notes:
1. The current account has already created an Enterprise Edition or Personal Edition repository in Tencent Container Registry (TCR).
2. In EMR-TKE-1.1.0 and later, some components support custom images.
3. Docker is installed in the current environment, and public network access is available.

Steps to Build a Custom Image

Step 1: Obtain the Base Image

Download the custom image tool and run the following command to list the EMR on TKE versions and components that support custom images, along with the corresponding image versions.
./custom-image-tool list



The Task Base Image is the image used for Spark jobs. You can specify a custom image by modifying the following configuration parameters under Configuration Management in the console. The relevant configuration files are spark-defaults.conf (SPARK), kyuubi-defaults.conf (KYUUBI), and hive-site.xml (HIVE/HIVESERVER2):
# Custom container image to use for the driver.
spark.kubernetes.driver.container.image
# Custom container image to use for executors.
spark.kubernetes.executor.container.image
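For example, to use the same custom image for both the driver and executors, set both parameters in spark-defaults.conf. The registry domain, namespace, and tag below are hypothetical placeholders; substitute your own:
spark.kubernetes.driver.container.image ccr.ccs.tencentyun.com/<your-namespace>/spark-custom:v1
spark.kubernetes.executor.container.image ccr.ccs.tencentyun.com/<your-namespace>/spark-custom:v1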
Taking the Spark image of EMR-TKE-1.1.0 as an example, pull the image to your local machine.
docker pull ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
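To confirm the pull succeeded, you can list the image locally with a standard Docker command (not specific to EMR):
docker images ccr.ccs.tencentyun.com/emr-image/spark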

Step 2: Build the Custom Image

1. Create a Dockerfile in your local environment and use the image above as the base image. For example:
FROM ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
USER root
## Here is your custom code ##
# You can replace the spark jar file, or update the python version, etc. #
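As a more concrete sketch, the Dockerfile below adds a Python dependency and an extra jar. Note that the availability of pip3 inside the base image and the my-udfs.jar file are assumptions for illustration only; adapt them to the actual contents of the image:
FROM ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
USER root
# Assumes pip3 exists in the base image; installs a Python library for PySpark jobs.
RUN pip3 install --no-cache-dir numpy
# my-udfs.jar is a hypothetical local file; copy it onto Spark's classpath.
COPY my-udfs.jar /usr/local/service/spark/jars/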
2. In the directory containing the Dockerfile, run the following command to build the image. The image name must match the access domain of your image registry; EMR currently supports only the Enterprise Edition and Personal Edition repositories of Tencent Container Registry.
docker build -t <Image_Name> .
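For a Tencent Container Registry repository, the image name typically takes the form <domain>/<namespace>/<image>:<tag>. The namespace and tag below are hypothetical placeholders:
docker build -t ccr.ccs.tencentyun.com/<your-namespace>/spark-custom:v1 .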

Step 3: Verify the Custom Image

This step is optional. It uses the custom image tool to check whether the user, Entrypoint, environment variables, and directory files of the built image conform to the specification, which helps ensure the image's usability to a certain extent. The usage of the custom image tool is shown below.
Usage:
  custom-image-tool [command]

Available Commands:
  check       Check your custom image based on the emr image
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  list        List the mirror addresses released by each emr version

Flags:
  -e, --emr-version string   Version of EMR On TKE (default "1.1.0")
  -h, --help                 help for custom-image-tool
  -i, --image string         Custom image name to be verified
  -s, --service string       Service in the specify emr version (default "spark-3.3.2")

Use "custom-image-tool [command] --help" for more information about a command.
The following is an example of image verification:
[root@VM-114-21-centos ~]# ./custom-image-tool check -e 1.1.0 -s spark-3.3.2 -i spark:v3.3.2-test
Current EmrVersion: 1.1.0
Current Service: spark-3.3.2
Current ImageName: spark:v3.3.2-test
INFO =============== Base Test Starts ===============
INFO Base: User is [root].
INFO Base: Entrypoint is /usr/local/service/spark/kubernetes/dockerfiles/spark/entrypoint.sh.
INFO [Base Test] Pass
INFO =============== Base Test Ends ===============
INFO =============== Env Test Starts ===============
INFO Env: [SPARK_HOME] is set to [/usr/local/service/spark].
INFO [Env Test] Pass
INFO =============== Env Test Ends ===============
INFO =============== File Structure Test Starts ===============
INFO File Structure: file [beeline] exists in [/usr/local/service/spark/bin]
INFO File Structure: file [docker-image-tool] exists in [/usr/local/service/spark/bin]
INFO File Structure: file [find-spark-home] exists in [/usr/local/service/spark/bin]
...
INFO File Structure: file [rss-client-spark3] exists in [/usr/local/service/spark/jars]
INFO File Structure: file [spark-emr-analyzer] exists in [/usr/local/service/spark/jars]
INFO File Structure: file [temrfs_hadoop_plugin_network] exists in [/usr/local/service/spark/jars]
INFO [File Structure Test] Pass
INFO =============== File Structure Test Ends ===============
INFO =============== Job Test Starts ===============
24/06/07 13:15:29 INFO SparkContext: Running Spark version 3.3.2
24/06/07 13:15:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/06/07 13:15:30 INFO UserGroupInformation: Hadoop UGI authentication : SIMPLE
24/06/07 13:15:30 INFO ResourceUtils: ==============================================================
24/06/07 13:15:30 INFO ResourceUtils: No custom resources configured for spark.driver.
24/06/07 13:15:30 INFO ResourceUtils: ==============================================================
24/06/07 13:15:30 INFO SparkContext: Submitted application: Spark Pi
...
Pi is roughly 3.1380356901784507
24/06/07 13:15:50 INFO SparkUI: Stopped Spark web UI at http://265afb66bdea:4040
24/06/07 13:15:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/06/07 13:15:50 INFO MemoryStore: MemoryStore cleared
24/06/07 13:15:51 INFO BlockManager: BlockManager stopped
24/06/07 13:15:51 INFO BlockManagerMaster: BlockManagerMaster stopped
24/06/07 13:15:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/06/07 13:15:51 INFO SparkContext: Successfully stopped SparkContext
24/06/07 13:15:51 INFO ShutdownHookManager: Shutdown hook called
24/06/07 13:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-37031e13-3013-497a-92ff-cfe93cfce78a
24/06/07 13:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-bd1aec58-8e12-41a4-9388-c8d85487c566
INFO Job: the command [/usr/local/service/spark/bin/spark-submit --deploy-mode client --master local --class org.apache.spark.examples.SparkPi /usr/local/service/spark/examples/jars/spark-examples_2.12*.jar] excutes successfully.
INFO [Job Test] Pass
INFO =============== Job Test Ends ===============

=============== Summary ===============
[Base Test] Pass
[Env Test] Pass
[File Structure Test] Pass
[Job Test] Pass

Step 4: Push the Image to the Image Registry

For instructions on pushing images, see the usage guides for the Enterprise Edition and Personal Edition of Tencent Container Registry.
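A minimal sketch of the push flow with standard Docker commands, assuming a Personal Edition registry at ccr.ccs.tencentyun.com and the hypothetical namespace and tag used above:
# Log in to the registry with your TCR credentials.
docker login ccr.ccs.tencentyun.com
# Push the custom image built in Step 2.
docker push ccr.ccs.tencentyun.com/<your-namespace>/spark-custom:v1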