How to set up Apache Spark and Zeppelin on Docker
Asked by a Stack Overflow user on 2019-08-24 22:26:26
2 answers · 3.3K views · 0 followers · 2 votes

I'm trying to set up a Spark development environment with Zeppelin on Docker, but I'm having trouble connecting the Zeppelin and Spark containers.

I'm deploying a Docker stack with the following docker-compose file:

version: '3'
services:

  spark-master:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
    hostname: spark-master
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: 10.129.34.90
    volumes:
      - spark-master-volume:/conf
      - spark-master-volume:/tmp/data
    ports: 
      - 8000:8080

  spark-worker:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    hostname: spark-worker
    environment:
      SPARK_MASTER_URL: spark-master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: 10.129.34.90
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 2g
    volumes:
      - spark-worker-volume:/conf
      - spark-worker-volume:/tmp/data
    ports:
      - "8081-8100:8081-8100" 

  zeppelin:
    image: apache/zeppelin:0.8.0
    ports: 
      - 8080:8080
      - 8443:8443
    volumes:
      - spark-master-volume:/opt/zeppelin/logs
      - spark-master-volume:/opt/zeppelin/notebook
    environment:
      MASTER: "spark://spark-master:7077"
      SPARK_MASTER: "spark://spark-master:7077"
      SPARK_HOME: /usr/spark-2.4.1
    depends_on:
      - spark-master

volumes:
  spark-master-volume:
    driver: local
  spark-worker-volume:
    driver: local

It builds fine, but when I try to run Spark from Zeppelin it throws:

java.lang.RuntimeException: /zeppelin/bin/interpreter.sh: line 231: /usr/spark-2.4.1/bin/spark-submit: No such file or directory

I think the problem is with the volumes, but I don't know what the right way to do it is.
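The error message itself points at the likely cause: the path in SPARK_HOME (/usr/spark-2.4.1) does not exist inside the apache/zeppelin:0.8.0 container, so interpreter.sh cannot find spark-submit there. One quick way to confirm this while the stack is running (the container name below is a placeholder to fill in):

```shell
# List the running Zeppelin container (the exact name depends on your stack name).
docker ps --filter "name=zeppelin" --format "{{.Names}}"

# Check whether the path SPARK_HOME points at actually exists in that container.
# If this fails with "No such file or directory", it matches the interpreter.sh error.
docker exec <zeppelin-container> ls /usr/spark-2.4.1/bin/spark-submit
```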


2 Answers

Answered by a Stack Overflow user on 2019-08-25 23:00:47

You need Spark installed inside your Zeppelin Docker instance so that spark-submit is available, and you need to update the Spark interpreter configuration to point it at your Spark cluster. For example:

  zeppelin_notebook_server:
    container_name: zeppelin_notebook_server
    build:
      context: zeppelin/
    restart: unless-stopped
    volumes:
      - ./zeppelin/config/interpreter.json:/zeppelin/conf/interpreter.json:rw
      - ./zeppelin/notebooks:/zeppelin/notebook
      - ../sample-data:/sample-data:ro
    ports:
      - "8085:8080"
    networks:
      - general
    labels:
      container_group: "notebook"

  spark_base:
    container_name: spark-base
    build:
      context: spark/base
    image: spark-base:latest

  spark_master:
    container_name: spark-master
    build:
      context: spark/master/
    networks:
      - general
    hostname: spark-master
    ports:
      - "3030:8080"
      - "7077:7077"
    environment:
      - "SPARK_LOCAL_IP=spark-master"
    depends_on:
      - spark_base
    volumes:
      - ./spark/apps/jars:/opt/spark-apps
      - ./spark/apps/data:/opt/spark-data
      - ../sample-data:/sample-data:ro

  spark_worker_1:
    container_name: spark-worker-1
    build:
      context: spark/worker/
    networks:
      - general
    hostname: spark-worker-1
    ports:
      - "3031:8081"
    env_file: spark/spark-worker-env.sh
    environment:
      - "SPARK_LOCAL_IP=spark-worker-1"
    depends_on:
      - spark_master
    volumes:
      - ./spark/apps/jars:/opt/spark-apps
      - ./spark/apps/data:/opt/spark-data
      - ../sample-data:/sample-data:ro

  spark_worker_2:
    container_name: spark-worker-2
    build:
      context: spark/worker/
    networks:
      - general
    hostname: spark-worker-2
    ports:
      - "3032:8082"
    env_file: spark/spark-worker-env.sh
    environment:
      - "SPARK_LOCAL_IP=spark-worker-2"
    depends_on:
      - spark_master
    volumes:
      - ./spark/apps/jars:/opt/spark-apps
      - ./spark/apps/data:/opt/spark-data
      - ../sample-data:/sample-data:ro
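The compose file above loads spark/spark-worker-env.sh via env_file, but the answer does not show its contents. As a hypothetical sketch (variable names follow standard Spark worker settings; the values are illustrative, not from the original answer), such a file might contain:

```properties
# spark/spark-worker-env.sh -- hypothetical contents, not shown in the original answer.
# Master URL matches the spark-master service in docker-compose.yml.
SPARK_MASTER=spark://spark-master:7077
# Resources each worker offers (illustrative values).
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
```

Note that docker-compose env_file entries are plain KEY=VALUE lines; inline comments after a value would become part of the value, so comments go on their own lines.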

The Zeppelin Dockerfile:

FROM "apache/zeppelin:0.8.1"

RUN wget http://apache.mirror.iphh.net/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz --progress=bar:force && \
    tar xvf spark-2.4.3-bin-hadoop2.7.tgz && \
    mkdir -p /usr/local/spark && \
    mv spark-2.4.3-bin-hadoop2.7/* /usr/local/spark/. && \
    mkdir -p /sample-data

ENV SPARK_HOME "/usr/local/spark/"
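Old Spark releases eventually drop off the regular Apache mirrors, so the wget URL above may no longer resolve; archive.apache.org/dist/spark keeps all past releases if you need to swap the mirror. After building, you can verify that spark-submit actually landed where SPARK_HOME points (the image tag here is a placeholder):

```shell
# Build the custom Zeppelin image from the Dockerfile above.
docker build -t zeppelin-with-spark:0.8.1 ./zeppelin

# Confirm bin/spark-submit exists under SPARK_HOME (/usr/local/spark/),
# which is what interpreter.sh needs in order to launch Spark jobs.
docker run --rm zeppelin-with-spark:0.8.1 ls /usr/local/spark/bin/spark-submit
```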

Make sure your Zeppelin Spark interpreter configuration matches this setup (the original answer included a screenshot of the interpreter settings here, which is not reproduced).
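The essential interpreter setting is the Spark interpreter's master property, edited under Zeppelin's Interpreter menu. Roughly (the value is taken from the compose file above; SPARK_HOME is already set by the Dockerfile's ENV line):

```properties
# Zeppelin "spark" interpreter properties (Interpreter menu -> spark -> edit)
master=spark://spark-master:7077
```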

4 votes

Answered by a Stack Overflow user on 2020-04-22 22:26:41

Build a Dockerfile with the following content:

FROM gettyimages/spark

ENV APACHE_SPARK_VERSION 2.4.1
ENV APACHE_HADOOP_VERSION 2.8.0
ENV ZEPPELIN_VERSION 0.8.1

RUN apt-get update 
RUN set -x \
    && curl -fSL "http://www-eu.apache.org/dist/zeppelin/zeppelin-0.8.1/zeppelin-0.8.1-bin-all.tgz" -o /tmp/zeppelin.tgz \
    && tar -xzvf /tmp/zeppelin.tgz -C /opt/ \
    && mv /opt/zeppelin-* /opt/zeppelin \
    && rm /tmp/zeppelin.tgz 

ENV SPARK_SUBMIT_OPTIONS "--jars /opt/zeppelin/sansa-examples-spark-2016-12.jar"
ENV SPARK_HOME "/usr/spark-2.4.1/"

WORKDIR /opt/zeppelin

CMD ["/opt/zeppelin/bin/zeppelin.sh"]

Then define your service in the docker-compose.yml file:

version: '3'
services:
  zeppelin:
    build: ./zeppelin
    image: zeppelin:0.8.1-hadoop-2.8.0-spark-2.4.1
    ...

Finally, build the customized image with docker-compose -f docker-compose.yml build before running docker stack deploy.
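Spelled out, that final sequence (the stack name here is a placeholder) would be:

```shell
# docker stack deploy does not honor the build: key, so build the image first...
docker-compose -f docker-compose.yml build

# ...then deploy; the stack uses the locally built
# zeppelin:0.8.1-hadoop-2.8.0-spark-2.4.1 image.
docker stack deploy -c docker-compose.yml spark-stack
```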

0 votes
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/57638836