
spark-3.1.1 on yarn setup on linux

Author: esse LL
Last modified: 2023-11-05 16:10:00
Published in the column: 操作系统实验 (Operating System Labs)

1. before you start

first, follow all the steps in "hadoop-3.1.3 cluster setup on linux",

and then switch to the root user:

```shell
su
```

2. copy spark and extract

assuming the spark tarball has already been copied to /opt/software, extract it into /opt/module:

```shell
tar -xvzf /opt/software/spark-3.1.1-bin-hadoop3.2.tgz -C /opt/module
```

3. set env variables

```shell
vi /etc/profile
```

add the following 4 lines:

```shell
export SPARK_HOME="/opt/module/spark-3.1.1-bin-hadoop3.2"
export PATH=$PATH:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
```

source the file or re-login:

```shell
source /etc/profile
```
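to confirm the exports took effect, the same variables can be sanity-checked in the current shell; a minimal sketch (the exports duplicate the lines added above, so this also works in a fresh shell before sourcing /etc/profile):

```shell
# repeat the exports from /etc/profile, then verify spark's bin is on PATH
export SPARK_HOME="/opt/module/spark-3.1.1-bin-hadoop3.2"
export PATH=$PATH:$SPARK_HOME/bin

# grep -q exits 0 only if the PATH actually contains $SPARK_HOME/bin
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "PATH ok"
```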

4. test spark-submit

```shell
cd /opt
spark-submit --version
```

5. config yarn

```shell
cd $SPARK_HOME
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
vi conf/spark-defaults.conf
```

add:

```txt
spark.master yarn
```
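optionally, the deploy mode can be pinned in the same file so it does not have to be passed on every submit (`spark.submit.deployMode` is a standard Spark property; `client` matches the spark-shell usage later in this guide):

```txt
spark.submit.deployMode client
```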

6. run on yarn

start hdfs and yarn:

```shell
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
```

use jps to verify that the ResourceManager process is running, then submit the SparkPi example:

```shell
spark-submit --master yarn --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.1.jar
```

7. test hdfs

put a test file into hdfs:

```shell
cd ~
wget -O alice.txt https://www.gutenberg.org/files/11/11-0.txt
# -p also creates the parent /user/<username> directory if it does not exist yet
hdfs dfs -mkdir -p inputs
hdfs dfs -put alice.txt inputs
```

run spark-shell and read the test file:

```shell
spark-shell --master yarn --deploy-mode client
```
then, inside the shell:

```scala
val input = sc.textFile("inputs/alice.txt")
// count the number of non-blank lines
input.filter(line => line.length() > 0).count()
```
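as a local cross-check outside Spark, the same non-blank-line count can be computed with grep; `/tmp/sample.txt` below is only an illustrative stand-in — point grep at `~/alice.txt` to check the real file:

```shell
# grep -c . counts lines containing at least one character,
# i.e. non-blank lines -- the same predicate as the Spark filter above
printf 'line one\n\nline two\n' > /tmp/sample.txt
grep -c . /tmp/sample.txt
```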

8. optional ops

8.1 config worker memory

```shell
vi $SPARK_HOME/conf/spark-defaults.conf
```

add the following 3 lines:

```txt
spark.driver.memory 512m
spark.yarn.am.memory 512m
spark.executor.memory 512m
```
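two related knobs are often tuned together with memory on small test clusters (both are standard Spark properties; the values here are only illustrative, not recommendations):

```txt
spark.executor.instances 2
spark.executor.cores 1
```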

8.2 switch to jdk 1.8

if necessary, try the following commands:

```shell
which java
ls -l /usr/bin/java
# hide the default launcher so that a jdk 1.8 java elsewhere on PATH is picked up instead
mv /usr/bin/java /usr/bin/java2
java -version
```

jdk should be 1.8 now.

9. more information

Spark web UI (available while an application is running) at http://master:4040

Yarn web UI at http://master:8088/

archived versions of the packages can be downloaded from http://archive.apache.org/dist/spark/

for more information, see:

https://spark.apache.org/docs/latest/running-on-yarn.html

https://www.linode.com/docs/guides/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/

https://sparkbyexamples.com/spark/spark-setup-on-hadoop-yarn/

Original statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.
