
Code fails on AWS EMR when running PySpark

Stack Overflow user
Asked on 2020-06-06 12:35:24
1 answer · viewed 3.1K times · 0 followers · 1 vote

I am trying to install and run PySpark in a Jupyter notebook on AWS Elastic MapReduce (EMR). As you can see:

%%info

Current session configs: {'driverMemory': '1000M', 'executorCores': 2, 'kind': 'pyspark'}
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("docker-numpy").getOrCreate()
sc = spark.sparkContext

Output

The code failed because of a fatal error:
    Unable to create Session. Error: Unexpected endpoint: http://172.31.3.115:8998.

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.

where 172.31.3.115 is my master node's internal/private IP. I have made the following changes to the sparkmagic config (shown via notebook@ip-x-x-x-x$ more .sparkmagic/config):

{
  "kernel_python_credentials" : {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998",
    "auth": "None"
  },

  "kernel_scala_credentials" : {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998",
    "auth": "None"
  },
  "kernel_r_credentials": {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998"
  },

  "logging_config": {
    "version": 1,
    "formatters": {
      "magicsFormatter": { 
        "format": "%(asctime)s\t%(levelname)s\t%(message)s",
        "datefmt": ""
      }
    },
    "handlers": {
      "magicsHandler": { 
        "class": "hdijupyterutils.filehandler.MagicsFileHandler",
        "formatter": "magicsFormatter",
        "home_path": "~/.sparkmagic"
      }
    },
    "loggers": {
      "magicsLogger": { 
        "handlers": ["magicsHandler"],
        "level": "DEBUG",
        "propagate": 0
      }
    }
  },

  "wait_for_idle_timeout_seconds": 15,
  "livy_session_startup_timeout_seconds": 60,

  "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",

  "ignore_ssl_errors": false,

  "session_configs": {
    "driverMemory": "1000M",
    "executorCores": 2
  },

  "use_auto_viz": true,
  "coerce_dataframe": true,
  "max_results_sql": 2500,
  "pyspark_dataframe_encoding": "utf-8",

  "heartbeat_refresh_seconds": 30,
  "livy_server_heartbeat_timeout_seconds": 0,
  "heartbeat_retry_seconds": 10,

  "server_extension_default_kernel_name": "pysparkkernel",
  "custom_headers": {},

  "retry_policy": "configurable",
  "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
  "configurable_retry_policy_max_retries": 8
}
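Before changing this file it is worth confirming that Livy itself answers on port 8998; /sessions is Livy's standard REST endpoint, so a quick check from the notebook host could look like this (a diagnostic sketch, not from the original post):

# A healthy Livy server returns JSON such as {"from":0,"total":0,"sessions":[]}.
curl http://172.31.3.115:8998/sessions

# A stale or dead session can be deleted by id through the same API:
curl -X DELETE http://172.31.3.115:8998/sessions/0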

Like many others, I have also tried 1 and 2. First of all, I could not find SPARK_HOME on EMR. I also have a question: how do I install Livy on EMR, or set the Advanced Cluster Options? I am creating the cluster manually with the aws-cli, as follows:

aws emr create-cluster \
 --name 'EMR 6.0.0 with Docker' \
 --release-label emr-6.0.0 \
 --applications Name=Livy Name=Spark Name=Hadoop Name=JupyterHub \
 --ec2-attributes "KeyName=sowmya_private_key,SubnetId=subnet-b39550d8" \
 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
 --use-default-roles \
 --configurations file://./emr-configuration.json

This tells me that the following cluster has been launched:

{
    "ClusterId": "j-3T56U7A09JWAD"
}
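The returned ClusterId can then be polled until the cluster is actually ready; a small aws-cli check (WAITING means the cluster is idle and accepting work):

# Show the lifecycle state (STARTING -> BOOTSTRAPPING -> RUNNING -> WAITING).
aws emr describe-cluster --cluster-id j-3T56U7A09JWAD \
    --query 'Cluster.Status.State' --output text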

I have been following these AWS links/tutorials:

https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

https://aws.amazon.com/blogs/big-data/simplify-your-spark-dependency-management-with-docker-in-emr-6-0-0/
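For context, the second post drives the --configurations file passed above (emr-configuration.json) so that YARN's container executor trusts the Docker registries being pulled from. A sketch of its likely shape, with the registry list an assumption to adapt:

# Hypothetical emr-configuration.json, following the EMR-with-Docker blog post;
# the ECR hostname is taken from the error log further below.
cat > emr-configuration.json <<'EOF'
[
  {
    "Classification": "container-executor",
    "Configurations": [
      {
        "Classification": "docker",
        "Properties": {
          "docker.trusted.registries": "local,centos,839713865431.dkr.ecr.us-east-2.amazonaws.com",
          "docker.privileged-containers.registries": "local,centos,839713865431.dkr.ecr.us-east-2.amazonaws.com"
        }
      }
    ]
  }
]
EOF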

There is not much here that is privacy-sensitive, so here is a big dump of the error log.

The code failed because of a fatal error:
    Session 1 unexpectedly reached final status 'dead'. See logs:
stdout: 

stderr: 
20/06/06 04:05:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/06 04:05:16 INFO RMProxy: Connecting to ResourceManager at ip-172-31-3-115.us-east-2.compute.internal/172.31.3.115:8032
20/06/06 04:05:16 INFO Client: Requesting a new application from cluster with 2 NodeManagers
20/06/06 04:05:16 INFO Configuration: resource-types.xml not found
20/06/06 04:05:16 INFO ResourceUtils: Unable to find 'resource-types.xml'.
20/06/06 04:05:16 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
20/06/06 04:05:16 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
20/06/06 04:05:16 INFO Client: Setting up container launch context for our AM
20/06/06 04:05:16 INFO Client: Setting up the launch environment for our AM container
20/06/06 04:05:16 INFO Client: Preparing resources for our AM container
20/06/06 04:05:16 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/06/06 04:05:18 INFO Client: Uploading resource file:/mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4/__spark_libs__3675935773843248835.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/__spark_libs__3675935773843248835.zip
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/livy-api-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-api-0.6.0-incubating.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/livy-rsc-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-rsc-0.6.0-incubating.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/netty-all-4.1.17.Final.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/netty-all-4.1.17.Final.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/commons-codec-1.9.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/commons-codec-1.9.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/livy-core_2.12-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-core_2.12-0.6.0-incubating.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/livy-repl_2.12-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-repl_2.12-0.6.0-incubating.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/R/lib/sparkr.zip#sparkr -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/sparkr.zip
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/pyspark.zip
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/py4j-0.10.7-src.zip
20/06/06 04:05:19 WARN Client: Same name resource file:///usr/lib/spark/python/lib/pyspark.zip added multiple times to distributed cache
20/06/06 04:05:19 WARN Client: Same name resource file:///usr/lib/spark/python/lib/py4j-0.10.7-src.zip added multiple times to distributed cache
20/06/06 04:05:19 INFO Client: Uploading resource file:/mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4/__spark_conf__7110997886244851568.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/__spark_conf__.zip
20/06/06 04:05:20 INFO SecurityManager: Changing view acls to: livy
20/06/06 04:05:20 INFO SecurityManager: Changing modify acls to: livy
20/06/06 04:05:20 INFO SecurityManager: Changing view acls groups to: 
20/06/06 04:05:20 INFO SecurityManager: Changing modify acls groups to: 
20/06/06 04:05:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(livy); groups with view permissions: Set(); users  with modify permissions: Set(livy); groups with modify permissions: Set()
20/06/06 04:05:21 INFO Client: Submitting application application_1591413438501_0002 to ResourceManager
20/06/06 04:05:21 INFO YarnClientImpl: Submitted application application_1591413438501_0002
20/06/06 04:05:21 INFO Client: Application report for application_1591413438501_0002 (state: ACCEPTED)
20/06/06 04:05:21 INFO Client: 
     client token: N/A
     diagnostics: [Sat Jun 06 04:05:21 +0000 2020] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:24576, vCores:8> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:24576, vCores:8> ; Queue's used capacity (absolute resource) = <memory:0, vCores:0> ; Queue's max capacity (absolute resource) = <memory:24576, vCores:8> ; 
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1591416321309
     final status: UNDEFINED
     tracking URL: http://ip-172-31-3-115.us-east-2.compute.internal:20888/proxy/application_1591413438501_0002/
     user: livy
20/06/06 04:05:21 INFO ShutdownHookManager: Shutdown hook called
20/06/06 04:05:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4
20/06/06 04:05:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d83d52f6-d17d-4e29-a562-7013ed539e1a

YARN Diagnostics: 
Application application_1591413438501_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591413438501_0002_000001 exited with  exitCode: 7
Failing this attempt.Diagnostics: [2020-06-06 04:05:25.619]Exception from container-launch.
Container id: container_1591413438501_0002_01_000001
Exit code: 7
Exception message: Launch container failed
Shell error output: Unable to find image '839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest' locally
/usr/bin/docker: Error response from daemon: manifest for 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest not found: manifest unknown: Requested image not found.
See '/usr/bin/docker run --help'.

Shell output: main : command provided 4
main : run as user is hadoop
main : requested yarn user is livy
Creating script paths...
Creating local dirs...
Getting exit code file...
Changing effective user to root...
Wrote the exit code 7 to /mnt/yarn/nmPrivate/application_1591413438501_0002/container_1591413438501_0002_01_000001/container_1591413438501_0002_01_000001.pid.exitcode


[2020-06-06 04:05:25.645]Container exited with a non-zero exit code 7. Last 4096 bytes of stderr.txt :


[2020-06-06 04:05:25.646]Container exited with a non-zero exit code 7. Last 4096 bytes of stderr.txt :


For more detailed output, check the application tracking page: http://ip-172-31-3-115.us-east-2.compute.internal:8088/cluster/app/application_1591413438501_0002 Then click on links to logs of each attempt.
. Failing the application..

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
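The decisive lines in this dump are the Docker ones: YARN tried to pull 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest and got "manifest unknown", meaning no image with that tag exists in the registry. Following the second blog post linked above, the image has to be built and pushed to ECR before a session can start. A sketch, run on a machine with Docker and that blog's Dockerfile, assuming AWS CLI v2 (v1 used aws ecr get-login instead of get-login-password):

# Build the image, create the repository, then tag and push under the exact
# name that YARN tries to pull.
sudo docker build -t local/pyspark-latest .
aws ecr create-repository --repository-name emr-docker-examples --region us-east-2
aws ecr get-login-password --region us-east-2 \
  | sudo docker login --username AWS --password-stdin 839713865431.dkr.ecr.us-east-2.amazonaws.com
sudo docker tag local/pyspark-latest 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest
sudo docker push 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest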

Stack Overflow user

Answered on 2020-06-26 02:19:00

I usually create clusters with the following steps:

1. Create the EMR cluster with the AWS Management Console, choosing release emr-5.25.0.

2. The only application I select is Spark.

3. Add the following configuration so that Python 3 is used by default:

[{"Classification": "spark-env", "Configurations": [{"Classification": "export", "Properties": {"PYSPARK_PYTHON": "/usr/bin/python3"}}]}]

4. Click Create cluster.

5. Open an SSH terminal session to the master node and install JupyterLab:

sudo pip-3.6 install jupyterlab

6. Start JupyterLab:

export PYSPARK_DRIVER_PYTHON=$(which jupyter)
export PYSPARK_DRIVER_PYTHON_OPTS="lab --ip=0.0.0.0"
pyspark --master yarn --driver-memory 8g --executor-memory 20g --executor-cores 4

7. Open a second terminal session to start an SSH tunnel to the master node (a port-forwarding alternative is sketched after this answer):

ssh -i /path/to/ssh/key.pem -ND 8157 hadoop@<master-node-public-dns>

That's it.
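Note: -ND 8157 opens a SOCKS proxy, so the browser has to be configured (e.g. via FoxyProxy) to reach the notebook through localhost:8157. A plain local port forward is a simpler alternative sketch, assuming JupyterLab's default port 8888:

# Forward local port 8888 to JupyterLab on the master node, then browse
# to http://localhost:8888.
ssh -i /path/to/ssh/key.pem -N -L 8888:localhost:8888 hadoop@<master-node-public-dns>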

2 votes
Original content from Stack Overflow. Original link:

https://stackoverflow.com/questions/62231798