
PySpark runs in YARN client mode but fails in cluster mode with "User did not initialize spark context!"

Stack Overflow user
Asked on 2022-01-11 14:25:24
1 answer · 877 views · 0 following · score 3
  • Standard Dataproc image 2.0
  • Ubuntu 18.04 LTS
  • Hadoop 3.2
  • Spark 3.1

I am testing a very simple script on a Dataproc PySpark cluster:

testing_dep.py

import os
os.listdir('./')

I can run testing_dep.py in client mode (the default for Dataproc):

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1

However, when I try to run the same job in cluster mode, I get an error:

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1 --properties=spark.submit.deployMode=cluster

Error log:

Job [417443357bcd43f99ee3dc60f4e3bfea] submitted.
Waiting for job output...
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at monsoon-testing-m/10.128.15.236:8032
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at monsoon-testing-m/10.128.15.236:10200
22/01/12 05:32:22 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
22/01/12 05:32:22 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/01/12 05:32:24 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1641965080466_0001
22/01/12 05:32:42 ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: Application application_1641965080466_0001 failed 2 times due to AM Container for appattempt_1641965080466_0001_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: [2022-01-12 05:32:42.154]Exception from container-launch.
Container id: container_1641965080466_0001_02_000001
Exit code: 13

[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception: 
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception: 
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


For more detailed output, check the application tracking page: http://monsoon-testing-m:8188/applicationhistory/app/application_1641965080466_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1641965080466_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1634)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [417443357bcd43f99ee3dc60f4e3bfea] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/417443357bcd43f99ee3dc60f4e3bfea?project=monsoon-credittech&region=us-central1
gcloud dataproc jobs wait '417443357bcd43f99ee3dc60f4e3bfea' --region 'us-central1' --project 'monsoon-credittech'
https://console.cloud.google.com/storage/browser/monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/
gs://monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/driveroutput

Can you help me understand what I am doing wrong and why this code is failing?

1 answer

Stack Overflow user

Accepted answer

Answered on 2022-01-19 21:26:45

This error is expected when you run Spark in YARN cluster mode but the job does not create a Spark context. See the source code of ApplicationMaster.scala.

To avoid this error, you need to create a SparkContext or SparkSession, e.g.:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
                    .appName('MySparkApp') \
                    .getOrCreate()

Client mode does not go through the same code path and does not have a similar check.
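Putting the answer together with the original script, a minimal sketch of a cluster-mode-safe testing_dep.py might look like the following (the app name is illustrative, and running it requires a Spark installation or cluster):

```python
import os

from pyspark.sql import SparkSession

# Create a SparkSession so that in YARN cluster mode the
# ApplicationMaster sees an initialized Spark context and
# does not fail with exit code 13.
spark = SparkSession.builder \
                    .appName('MySparkApp') \
                    .getOrCreate()

# Original logic. Note that in cluster mode the driver runs in a
# YARN container on a worker node, so './' is that container's
# working directory, not the machine you submitted the job from.
print(os.listdir('./'))

# Release cluster resources when done.
spark.stop()
```

The listed directory contents will differ between client and cluster mode for the reason noted in the comment, which is worth keeping in mind when a script reads local files.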

Score 1
Original content provided by Stack Overflow.
Original link:
https://stackoverflow.com/questions/70668449
