Feature Overview
The custom image feature of the Spark and Hive services on EMR on TKE lets you build and use custom Spark and Hive images on top of the provided base images. This gives you greater flexibility, allowing you to quickly configure and optimize the Spark and Hive image environment according to your own needs and preferences.
Usage notes:
1. The current account has created a Tencent Container Registry (TCR) Enterprise or Personal edition repository.
2. Custom images are supported for some components in EMR-TKE-1.1.0 and later.
3. Docker is installed in the current environment, and public network access is available.
Steps to Build a Custom Image
Step 1: Obtain the Base Image
Run the following command to list the base image addresses released for each EMR version:
./custom-image-tool list
The Task Base Image is the image used for Spark tasks. You can point it at a custom image by modifying configuration parameters in the console under Configuration Management. The relevant configuration files are spark-defaults.conf (SPARK), kyuubi-defaults.conf (KYUUBI), and hive-site.xml (HIVE/HIVESERVER2).
# Custom container image to use for the driver.
spark.kubernetes.driver.container.image
# Custom container image to use for executors.
spark.kubernetes.executor.container.image
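For example, to make Spark tasks run on a custom image, the two parameters above might be set in spark-defaults.conf as follows. The image address here is a placeholder; substitute the address of the image in your own repository:

```
spark.kubernetes.driver.container.image    ccr.ccs.tencentyun.com/my-namespace/spark:v3.3.2-custom
spark.kubernetes.executor.container.image  ccr.ccs.tencentyun.com/my-namespace/spark:v3.3.2-custom
```

Driver and executor images are usually kept identical, but the two parameters allow them to differ if needed.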
Taking the Spark image of EMR-TKE-1.1.0 as an example, pull the image to your local machine:
docker pull ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
Step 2: Build the Custom Image
1. Create a Dockerfile in your local environment, using the image above as the base image. For example:
FROM ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
USER root
## Here is your custom code ##
# You can replace the spark jar file, or update the python version, etc. #
2. In the folder containing the Dockerfile, run the following command to build the image. Note that the image name is tied to the access domain of your image repository; EMR currently supports only Tencent Container Registry (TCR) Enterprise and Personal edition repositories.
docker build -t <Image_Name> .
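As a concrete sketch of the two steps above, the hypothetical Dockerfile below adds an extra Python package on top of the base image (the package choice is illustrative, and it assumes the base image ships Python 3 with pip, which you should verify for your version):

```
# Hypothetical example: extend the EMR Spark base image.
FROM ccr.ccs.tencentyun.com/emr-image/spark:v3.3.2-60-553-release
USER root
# Install an extra Python package for PySpark jobs (illustrative only).
RUN pip3 install --no-cache-dir requests
```

It could then be built with a command such as `docker build -t spark:v3.3.2-custom .`, where `spark:v3.3.2-custom` is a placeholder name to be replaced with your repository-prefixed image name.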
Step 3: Validate the Custom Image
Use the custom-image-tool to check your custom image against the EMR base image. The tool's usage is as follows:
Usage: custom-image-tool [command]

Available Commands:
  check       Check your custom image based on the emr image
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  list        List the mirror addresses released by each emr version

Flags:
  -e, --emr-version string   Version of EMR On TKE (default "1.1.0")
  -h, --help                 help for custom-image-tool
  -i, --image string         Custom image name to be verified
  -s, --service string       Service in the specify emr version (default "spark-3.3.2")

Use "custom-image-tool [command] --help" for more information about a command.
The following is an example of running the image check:
[root@VM-114-21-centos ~]# ./custom-image-tool check -e 1.1.0 -s spark-3.3.2 -i spark:v3.3.2-test
Current EmrVersion: 1.1.0
Current Service: spark-3.3.2
Current ImageName: spark:v3.3.2-test
INFO =============== Base Test Starts ===============
INFO Base: User is [root].
INFO Base: Entrypoint is /usr/local/service/spark/kubernetes/dockerfiles/spark/entrypoint.sh.
INFO [Base Test] Pass
INFO =============== Base Test Ends ===============
INFO =============== Env Test Starts ===============
INFO Env: [SPARK_HOME] is set to [/usr/local/service/spark].
INFO [Env Test] Pass
INFO =============== Env Test Ends ===============
INFO =============== File Structure Test Starts ===============
INFO File Structure: file [beeline] exists in [/usr/local/service/spark/bin]
INFO File Structure: file [docker-image-tool] exists in [/usr/local/service/spark/bin]
INFO File Structure: file [find-spark-home] exists in [/usr/local/service/spark/bin]
...
INFO File Structure: file [rss-client-spark3] exists in [/usr/local/service/spark/jars]
INFO File Structure: file [spark-emr-analyzer] exists in [/usr/local/service/spark/jars]
INFO File Structure: file [temrfs_hadoop_plugin_network] exists in [/usr/local/service/spark/jars]
INFO [File Structure Test] Pass
INFO =============== File Structure Test Ends ===============
INFO =============== Job Test Starts ===============
24/06/07 13:15:29 INFO SparkContext: Running Spark version 3.3.2
24/06/07 13:15:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/06/07 13:15:30 INFO UserGroupInformation: Hadoop UGI authentication : SIMPLE
24/06/07 13:15:30 INFO ResourceUtils: ==============================================================
24/06/07 13:15:30 INFO ResourceUtils: No custom resources configured for spark.driver.
24/06/07 13:15:30 INFO ResourceUtils: ==============================================================
24/06/07 13:15:30 INFO SparkContext: Submitted application: Spark Pi
...
Pi is roughly 3.1380356901784507
24/06/07 13:15:50 INFO SparkUI: Stopped Spark web UI at http://265afb66bdea:4040
24/06/07 13:15:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/06/07 13:15:50 INFO MemoryStore: MemoryStore cleared
24/06/07 13:15:51 INFO BlockManager: BlockManager stopped
24/06/07 13:15:51 INFO BlockManagerMaster: BlockManagerMaster stopped
24/06/07 13:15:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/06/07 13:15:51 INFO SparkContext: Successfully stopped SparkContext
24/06/07 13:15:51 INFO ShutdownHookManager: Shutdown hook called
24/06/07 13:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-37031e13-3013-497a-92ff-cfe93cfce78a
24/06/07 13:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-bd1aec58-8e12-41a4-9388-c8d85487c566
INFO Job: the command [/usr/local/service/spark/bin/spark-submit --deploy-mode client --master local --class org.apache.spark.examples.SparkPi /usr/local/service/spark/examples/jars/spark-examples_2.12*.jar] excutes successfully.
INFO [Job Test] Pass
INFO =============== Job Test Ends ===============
=============== Summary ===============
[Base Test] Pass
[Env Test] Pass
[File Structure Test] Pass
[Job Test] Pass
Step 4: Upload the Image to the Container Image Repository
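A minimal sketch of tagging and pushing the validated image, assuming a Personal edition registry at ccr.ccs.tencentyun.com; the namespace, image name, and tag are placeholders to replace with your own values (the docker commands are shown commented so the name construction is visible on its own):

```shell
#!/bin/sh
# Hypothetical values -- replace with your own registry domain, namespace, and tag.
REGISTRY="ccr.ccs.tencentyun.com"   # registry access domain (assumption)
NAMESPACE="my-namespace"            # your repository namespace (placeholder)
TARGET="${REGISTRY}/${NAMESPACE}/spark:v3.3.2-custom"

echo "pushing to ${TARGET}"
# Tag the locally built image with the repository address, then push it:
# docker login "${REGISTRY}"
# docker tag spark:v3.3.2-custom "${TARGET}"
# docker push "${TARGET}"
```

Once pushed, the repository-prefixed image address is what you reference in the configuration parameters from Step 1.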