
Container exited with a non-zero exit code 134

Original article by sandyshu, published 2022-09-22 16:27:34.

Background: the logs reported for this issue are shown below.

1. The relevant task logs are as follows:

[Screenshot: problem log reported by the customer]

1.1 The customer reported that the log above appeared in multiple jobs.

The spark-submit command provided by the customer is as follows:

spark-submit \
  --driver-class-path "$yarn_client_driver_classpath" \
  --jars "$extraJars" --files "$extraFiles" \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.port.maxRetries=30 \
  --conf spark.shuffle.file.buffer=96k \
  --conf spark.reducer.maxSizeInFlight=96m \
  --conf spark.task.maxFailures=20 \
  --conf spark.network.timeout=500s \
  --conf spark.yarn.maxAppAttempts=3 \
  --conf spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8 -XX:+UseG1GC" \
  --conf spark.driver.extraJavaOptions="-Dfile.encoding=UTF-8 -XX:+UseG1GC" \
  --master yarn --deploy-mode "$yarn_deploy_mode" \
  $kerberos_args \
  "$@"

1.2 The relevant log is as follows:

exec /bin/bash -c "LD_LIBRARY_PATH=\"$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH\" $JAVA_HOME/bin/java -server -Xmx10240m '-Dfile.encoding=UTF-8' '-XX:+UseG1GC' -Djava.io.tmpdir=$PWD/tmp '-Dspark.port.maxRetries=30' '-Dspark.network.timeout=500s' '-Dspark.driver.port=46243' -Dspark.yarn.app.container.log.dir=/data/emr/yarn/logs/application_1662701224474_3019/container_e20_1662701224474_3019_01_000076 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@172.31.1.18:46243 --executor-id 67 --hostname 172.31.1.12 --cores 1 --app-id application_1662701224474_3019 --user-class-path file:$PWD/__app__.jar --user-class-path file:$PWD/amqp-client-5.4.3.jar --user-class-path file:$PWD/ant-1.9.1.jar --user-class-path file:$PWD/aviator-3.3.0.jar --user-class-path file:$PWD/aws-java-sdk-core-1.11.60.jar --user-class-path file:$PWD/aws-java-sdk-s3-1.11.60.jar --user-class-path file:$PWD/azure-storage-8.2.0.jar --user-class-path file:$PWD/caffeine-2.9.0.jar --user-class-path file:$PWD/commons-compress-1.8.1.jar --user-class-path file:$PWD/commons-csv-1.8.jar --user-class-path file:$PWD/commons-email-1.3.1.jar --user-class-path file:$PWD/commons-exec-1.1.jar --user-class-path file:$PWD/commons-lang-2.4.jar --user-class-path file:$PWD/commons-lang3-3.5.jar --user-class-path file:$PWD/commons-pool2-2.6.2.jar --user-class-path file:$PWD/cos_api-5.6.55.jar --user-class-path file:$PWD/gson-2.6.2.jar --user-class-path file:$PWD/guava-15.0.jar --user-class-path file:$PWD/hbase-client-2.3.5.jar --user-class-path file:$PWD/hbase-common-2.3.5.jar --user-class-path file:$PWD/hbase-hadoop2-compat-2.3.5.jar --user-class-path file:$PWD/hbase-hadoop-compat-2.3.5.jar --user-class-path file:$PWD/hbase-mapreduce-2.3.5.jar --user-class-path file:$PWD/hbase-metrics-2.3.5.jar --user-class-path file:$PWD/hbase-metrics-api-2.3.5.jar --user-class-path file:$PWD/hbase-protocol-2.3.5.jar --user-class-path file:$PWD/hbase-protocol-shaded-2.3.5.jar --user-class-path file:$PWD/hbase-server-2.3.5.jar --user-class-path file:$PWD/hbase-shaded-miscellaneous-3.3.0.jar --user-class-path file:$PWD/hbase-shaded-netty-3.3.0.jar --user-class-path file:$PWD/hbase-shaded-protobuf-3.3.0.jar --user-class-path file:$PWD/hbase-zookeeper-2.3.5.jar --user-class-path file:$PWD/httpclient-4.5.13.jar --user-class-path file:$PWD/httpcore-4.4.5.jar --user-class-path file:$PWD/insight-shaded-guava-15.0.jar --user-class-path file:$PWD/jackson-dataformat-yaml-2.12.5.jar --user-class-path file:$PWD/javassist-3.28.0-GA.jar --user-class-path file:$PWD/jboss-marshalling-2.0.11.Final.jar --user-class-path file:$PWD/jboss-marshalling-river-2.0.11.Final.jar --user-class-path file:$PWD/jedis-3.1.0.jar --user-class-path file:$PWD/joda-time-2.8.1.jar --user-class-path file:$PWD/lombok-1.18.20.jar --user-class-path file:$PWD/mail-1.4.5.jar --user-class-path file:$PWD/memory-0.12.1.jar --user-class-path file:$PWD/nacos-api-1.3.2.jar --user-class-path file:$PWD/nacos-client-1.3.2.jar --user-class-path file:$PWD/nacos-common-1.3.2.jar --user-class-path file:$PWD/opencsv-2.3.jar --user-class-path file:$PWD/redisson-3.16.3.jar --user-class-path file:$PWD/reflections-0.10.2.jar --user-class-path file:$PWD/RoaringBitmap-0.6.44.jar --user-class-path file:$PWD/simpleclient-0.5.0.jar --user-class-path file:$PWD/sketches-core-0.13.0.jar --user-class-path file:$PWD/spring-core-4.1.8.RELEASE.jar --user-class-path file:$PWD/ua_uc_check-1.0.0.jar --user-class-path file:$PWD/UserAgentUtils-1.20.jar 
--user-class-path file:$PWD/zookeeper-3.5.7.jar 1>/data/emr/yarn/logs/application_1662701224474_3019/container_e20_1662701224474_3019_01_000076/stdout 2>/data/emr/yarn/logs/application_1662701224474_3019/container_e20_1662701224474_3019_01_000076/stderr"

End of LogType:launch_container.sh

************************************************************************************
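As a side note, the full container log quoted above can be retrieved with the YARN CLI, using the application and container IDs that appear in the log (this assumes log aggregation is enabled on the cluster):

# Fetch the aggregated logs for the failing executor container
yarn logs -applicationId application_1662701224474_3019 \
    -containerId container_e20_1662701224474_3019_01_000076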

1.3 YARN executor launch context:

env:
    CLASSPATH -> $HADOOP_HOME/lib/<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>/usr/local/service/hadoop/etc/hadoop<CPS>/usr/local/service/hadoop/share/hadoop/common/*<CPS>/usr/local/service/hadoop/share/hadoop/common/lib/*<CPS>/usr/local/service/hadoop/share/hadoop/hdfs/*<CPS>/usr/local/service/hadoop/share/hadoop/hdfs/lib/*<CPS>/usr/local/service/hadoop/share/hadoop/mapreduce/*<CPS>/usr/local/service/hadoop/share/hadoop/mapreduce/lib/*<CPS>/usr/local/service/hadoop/share/hadoop/yarn/*<CPS>/usr/local/service/hadoop/share/hadoop/yarn/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://HDFS19683/user/hadoop/.sparkStaging/application_1662701224474_3019
    SPARK_USER -> hadoop

command:
    LD_LIBRARY_PATH=\"$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH\" \
    {{JAVA_HOME}}/bin/java \
    -server \
    -Xmx10240m \
    '-Dfile.encoding=UTF-8' \
    '-XX:+UseG1GC' \
    -Djava.io.tmpdir={{PWD}}/tmp \
    '-Dspark.port.maxRetries=30' \
    '-Dspark.network.timeout=500s' \
    '-Dspark.driver.port=46243' \
    -Dspark.yarn.app.container.log.dir=<LOG_DIR> \
    -XX:OnOutOfMemoryError='kill %p' \
    org.apache.spark.executor.CoarseGrainedExecutorBackend \
    --driver-url \
    spark://CoarseGrainedScheduler@****:46243 \
    --executor-id \
    <executorId> \
    --hostname \
    <hostname> \
    --cores \

2. The solution is as follows:

2.1 Removing both of the following -XX:+UseG1GC flags resolved the issue:

--conf spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8 -XX:+UseG1GC" \

--conf spark.driver.extraJavaOptions="-Dfile.encoding=UTF-8 -XX:+UseG1GC" \
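For reference, a minimal sketch of the adjusted options after the fix, keeping only the encoding flag (this assumes the job needs no other JVM flags); the driver and executor JVMs then fall back to the JDK's default collector:

# Fixed submit options: the G1 flag is removed, only the encoding flag remains
--conf spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8" \
--conf spark.driver.extraJavaOptions="-Dfile.encoding=UTF-8" \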


3. Root-cause analysis:

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

This is a JVM-level error: exit code 134 equals 128 + 6, meaning the process was terminated by SIGABRT, and on aborting the JVM tried (and failed) to write a core dump. On the affected machine, run ulimit -a to check the local settings:

-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-v: address space (kbytes) unlimited
-l: locked-in-memory size (kbytes) unlimited
-u: processes 1392
-n: file descriptors 2560

The "failed to write core dump" message traces back to the ulimit -c value shown above, -c: core file size (blocks) 0, which disables core dumps entirely.
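If the core dump itself is wanted for deeper analysis, the limit can be raised on the affected node. A sketch, assuming the containers run as the hadoop user (adjust the user name to your cluster):

# Enable core dumps in the current shell, before the JVM starts
ulimit -c unlimited

# Persist the limit across sessions for the assumed "hadoop" user
echo "hadoop soft core unlimited" >> /etc/security/limits.conf
echo "hadoop hard core unlimited" >> /etc/security/limits.conf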

4. Next, let's look at the differences between UseG1GC and CMS.

-XX:+UseG1GC

For a comparison between the two collectors, see: https://blog.chriscs.com/2017/06/20/g1-vs-cms/
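For context, the two collectors are selected with the following JVM flags; note that CMS has been deprecated since JDK 9, so G1 is the usual choice on newer JDKs:

# G1, the collector the customer enabled:
--conf spark.executor.extraJavaOptions="-XX:+UseG1GC"
# CMS, the older low-pause collector it is compared against:
--conf spark.executor.extraJavaOptions="-XX:+UseConcMarkSweepGC"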

Note that this error still occurred even after adjusting the value of spark.storage.memoryFraction.

The underlying principle is as follows:

spark.storage.memoryFraction is the parameter Spark uses to bound the total size of cached RDDs: the cache may not exceed the heap size multiplied by this fraction. The JVM may also use any unused portion of the RDD cache fraction, so GC analysis of a Spark application should cover the memory usage of both regions.

When GC pauses are observed and throughput drops, we should first check that the Spark application is using its limited memory space efficiently.

The less memory RDDs occupy, the more heap remains for program execution, which improves GC efficiency;

conversely, excessive RDD memory consumption causes a significant performance hit, because a large number of cached objects accumulate in the old generation.
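As an illustration only (the value below is a placeholder, not a recommendation), lowering the cache fraction to leave more heap for execution could look like the sketch below. Note that spark.storage.memoryFraction is a legacy parameter: on Spark 1.6+ it only takes effect together with spark.memory.useLegacyMode=true, and the unified spark.memory.fraction / spark.memory.storageFraction settings replace it.

# Legacy memory model: cap the RDD cache at 40% of the heap (example value)
--conf spark.memory.useLegacyMode=true \
--conf spark.storage.memoryFraction=0.4 \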

Related references:

https://blog.chriscs.com/2017/06/20/g1-vs-cms/

https://www.cnblogs.com/qingyunzong/p/8973857.html

https://www.databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html

Original-work statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission. For infringement concerns, contact cloudcommunity@tencent.com for removal.
