Spark 2.x Study Notes: 1. Spark 2.2 Quick Start (Local Mode)

Author: 程裕强 | Published 2018-01-02

1. Spark 2.2 Quick Start (Local Mode)

1.1 Spark Local Mode

When learning Spark, go from easy to hard: start with the simplest deployment, local mode.

Local mode (local) is commonly used for development and testing on a single machine. Unpacking the Spark distribution is all it takes; Spark works "out of the box".
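
In local mode the Spark "master" is given as a local URL: local runs with a single worker thread, local[N] with N threads, and local[*] with one thread per CPU core (the default for spark-shell, as the startup log in section 1.6 shows: master = local[*]). For example, to pick the parallelism explicitly when starting the shell (the --master flag is standard; the thread count 2 here is just an illustration):

[root@master spark-2.2.0]# bin/spark-shell --master local[2]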

1.2 Installing JDK 8

(1) Download. Log in to the Oracle website at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html, accept the license agreement, and choose the link for the 64-bit Linux tar package. You can click the link to download directly; a multi-threaded download tool (such as Xunlei) can speed things up.

(2) Upload to the server. Use XShell to upload the JDK 8 package downloaded on Windows to the server 192.168.1.180.

(3) Extract. Here we extract into the /opt directory; for easier management, I install all third-party software under /opt.

[root@master ~]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt

(4) Configure the JDK environment variables. You could set them in /etc/profile; for easier management, we instead create a custom.sh file under /etc/profile.d/ to hold user-defined environment variables.

[root@master ~]# vi /etc/profile.d/custom.sh
[root@master ~]# cat /etc/profile.d/custom.sh
#java path
export JAVA_HOME=/opt/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
[root@master ~]#

(5) Make the environment variables take effect

[root@master ~]# source /etc/profile.d/custom.sh

(6) Run java -version to verify the JDK

[root@master ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[root@master ~]# 

1.3 Downloading the Spark 2.x Package

(1) Visit the Spark download page at http://spark.apache.org/downloads.html.

(2) The first dropdown chooses the Spark release (select 2.2.0); the second chooses the package type (select the Hadoop 2.7 build); the third chooses the download type (a direct download is slow, so choose "Select Apache Mirror").

(Screenshot: the Spark download page)

(3) Click the spark-2.2.0-bin-hadoop2.7.tgz link and choose a mirror close to you (for users in China, a domestic mirror).

(Screenshot: choosing an Apache mirror)

(4) Download, accelerating with a multi-threaded download tool if you like. Choose the nearest mirror, in this case the Tsinghua University mirror, and download directly with the wget command:

[root@master ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
--2017-08-29 22:43:51--  http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.6.177, 2402:f000:1:416:101:6:6:177
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.6.177|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 203728858 (194M) [application/octet-stream]
Saving to: ‘spark-2.2.0-bin-hadoop2.7.tgz’

100%[============================================================================================================>] 203,728,858 9.79MB/s   in 23s    

2017-08-29 22:44:15 (8.32 MB/s) - ‘spark-2.2.0-bin-hadoop2.7.tgz’ saved [203728858/203728858]

[root@master ~]#

(5) Extract it into the /opt directory. As before, our convention on Linux is to place all third-party packages under /opt.

[root@master ~]# tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /opt

(6) The Spark root directory name is long, so rename it for convenience (this step is optional):

[root@master ~]# mv /opt/spark-2.2.0-bin-hadoop2.7/ /opt/spark-2.2.0
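
Optionally, you can put Spark on the PATH the same way we did for the JDK, so its bin and sbin scripts can be run from any directory. A minimal sketch, appending to the custom.sh created in section 1.2 (SPARK_HOME is the conventional variable name; the rest of this article simply runs the scripts from inside /opt/spark-2.2.0, so this step is not required):

[root@master ~]# vi /etc/profile.d/custom.sh
[root@master ~]# tail -3 /etc/profile.d/custom.sh
#spark path
export SPARK_HOME=/opt/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[root@master ~]# source /etc/profile.d/custom.sh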

1.4 Spark Directory Layout

[root@master ~]# cd /opt/spark-2.2.0/
[root@master spark-2.2.0]# ll
total 84
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 bin
drwxr-xr-x. 2 500 500   230 Jun 30 19:09 conf
drwxr-xr-x. 5 500 500    50 Jun 30 19:09 data
drwxr-xr-x. 4 500 500    29 Jun 30 19:09 examples
drwxr-xr-x. 2 500 500 12288 Jun 30 19:09 jars
-rw-r--r--. 1 500 500 17881 Jun 30 19:09 LICENSE
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 licenses
-rw-r--r--. 1 500 500 24645 Jun 30 19:09 NOTICE
drwxr-xr-x. 8 500 500   240 Jun 30 19:09 python
drwxr-xr-x. 3 500 500    17 Jun 30 19:09 R
-rw-r--r--. 1 500 500  3809 Jun 30 19:09 README.md
-rw-r--r--. 1 500 500   128 Jun 30 19:09 RELEASE
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 sbin
drwxr-xr-x. 2 500 500    42 Jun 30 19:09 yarn
[root@master spark-2.2.0]# 

Directory   Description
bin         executable scripts: the Spark command-line tools
conf        Spark configuration files
data        data used by the bundled examples
examples    Spark's bundled example programs
jars        the jar files Spark depends on
sbin        scripts for starting and stopping a cluster (Spark ships with its own standalone cluster manager)

Notable scripts in Spark's bin directory:

  • spark-shell: launches the interactive Spark shell
  • spark-submit: submits a Spark application for execution (see the sketch after this list)
  • run-example: runs one of Spark's bundled example programs
  • spark-sql: launches the Spark SQL command line
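
run-example is essentially a convenience wrapper around spark-submit: it locates the bundled examples jar and submits the class you name. The SparkPi run in section 1.5 below could therefore also be submitted by hand; a sketch assuming this install's default paths (the class org.apache.spark.examples.SparkPi and the jar under examples/jars are the ones shipped with Spark 2.2.0, as the log in section 1.5 confirms):

[root@master spark-2.2.0]# bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[4] examples/jars/spark-examples_2.11-2.2.0.jar 4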

1.5 Running an Example Program

SparkPi, one of the bundled examples, computes an approximation of π; the first command-line argument sets the number of partitions (SparkPi reads only its first argument, so the second 4 below has no effect):

[root@master1 spark-2.2.0]# bin/run-example SparkPi 4 4
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/08/29 01:27:26 INFO SparkContext: Running Spark version 2.2.0
17/08/29 01:27:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/29 01:27:26 INFO SparkContext: Submitted application: Spark Pi
17/08/29 01:27:27 INFO SecurityManager: Changing view acls to: root
17/08/29 01:27:27 INFO SecurityManager: Changing modify acls to: root
17/08/29 01:27:27 INFO SecurityManager: Changing view acls groups to: 
17/08/29 01:27:27 INFO SecurityManager: Changing modify acls groups to: 
17/08/29 01:27:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/08/29 01:27:27 INFO Utils: Successfully started service 'sparkDriver' on port 40549.
17/08/29 01:27:27 INFO SparkEnv: Registering MapOutputTracker
17/08/29 01:27:27 INFO SparkEnv: Registering BlockManagerMaster
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/29 01:27:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-719136e3-dc4e-4061-a07a-e5f04d679ad1
17/08/29 01:27:27 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/29 01:27:27 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/29 01:27:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/29 01:27:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.180:4040
17/08/29 01:27:27 INFO SparkContext: Added JAR file:/opt/spark-2.2.0/examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar with timestamp 1503984447798
17/08/29 01:27:27 INFO SparkContext: Added JAR file:/opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar at spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar with timestamp 1503984447798
17/08/29 01:27:27 INFO Executor: Starting executor ID driver on host localhost
17/08/29 01:27:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43952.
17/08/29 01:27:27 INFO NettyBlockTransferService: Server created on 192.168.1.180:43952
17/08/29 01:27:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/29 01:27:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.180:43952 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:28 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark-2.2.0/spark-warehouse').
17/08/29 01:27:28 INFO SharedState: Warehouse path is 'file:/opt/spark-2.2.0/spark-warehouse'.
17/08/29 01:27:29 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/08/29 01:27:29 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
17/08/29 01:27:29 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 4 output partitions
17/08/29 01:27:29 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
17/08/29 01:27:29 INFO DAGScheduler: Parents of final stage: List()
17/08/29 01:27:29 INFO DAGScheduler: Missing parents: List()
17/08/29 01:27:29 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
17/08/29 01:27:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
17/08/29 01:27:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1172.0 B, free 366.3 MB)
17/08/29 01:27:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.180:43952 (size: 1172.0 B, free: 366.3 MB)
17/08/29 01:27:29 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/08/29 01:27:29 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
17/08/29 01:27:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/08/29 01:27:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/29 01:27:29 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
17/08/29 01:27:29 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/08/29 01:27:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/29 01:27:29 INFO Executor: Fetching spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar with timestamp 1503984447798
17/08/29 01:27:29 INFO TransportClientFactory: Successfully created connection to /192.168.1.180:40549 after 34 ms (0 ms spent in bootstraps)
17/08/29 01:27:29 INFO Utils: Fetching spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar to /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/fetchFileTemp1808807623002630899.tmp
17/08/29 01:27:29 INFO Executor: Adding file:/tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/scopt_2.11-3.3.0.jar to class loader
17/08/29 01:27:29 INFO Executor: Fetching spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar with timestamp 1503984447798
17/08/29 01:27:29 INFO Utils: Fetching spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar to /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/fetchFileTemp3327801226116360399.tmp
17/08/29 01:27:29 INFO Executor: Adding file:/tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/spark-examples_2.11-2.2.0.jar to class loader
17/08/29 01:27:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 436 ms on localhost (executor driver) (1/4)
17/08/29 01:27:30 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 867 bytes result sent to driver
17/08/29 01:27:30 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 423 ms on localhost (executor driver) (2/4)
17/08/29 01:27:30 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 424 ms on localhost (executor driver) (3/4)
17/08/29 01:27:30 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 428 ms on localhost (executor driver) (4/4)
17/08/29 01:27:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/08/29 01:27:30 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.482 s
17/08/29 01:27:30 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.766385 s
Pi is roughly 3.1493878734696836
17/08/29 01:27:30 INFO SparkUI: Stopped Spark web UI at http://192.168.1.180:4040
17/08/29 01:27:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/29 01:27:30 INFO MemoryStore: MemoryStore cleared
17/08/29 01:27:30 INFO BlockManager: BlockManager stopped
17/08/29 01:27:30 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/29 01:27:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/29 01:27:30 INFO SparkContext: Successfully stopped SparkContext
17/08/29 01:27:30 INFO ShutdownHookManager: Shutdown hook called
17/08/29 01:27:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8
[root@master1 spark-2.2.0]# 

You can see the result near the end of the log: Pi is roughly 3.1493878734696836. Since this is a random (Monte Carlo) estimate, the exact value varies slightly between runs.
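
For intuition: SparkPi estimates π by scattering random points over the square [-1,1] x [-1,1] and counting the fraction that land inside the unit circle, which approaches π/4. A simplified sketch of that computation, which you can paste into spark-shell (introduced in the next section; sc is the SparkContext the shell provides):

scala> val n = 100000 * 4                        // total number of random points
scala> val count = sc.parallelize(1 to n, 4).map { _ =>
     |   val x = math.random * 2 - 1             // random point in [-1,1] x [-1,1]
     |   val y = math.random * 2 - 1
     |   if (x * x + y * y <= 1) 1 else 0        // 1 if inside the unit circle
     | }.reduce(_ + _)
scala> println("Pi is roughly " + 4.0 * count / n)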

1.6 A First Look at spark-shell

Start spark-shell:

[root@master spark-2.2.0]# bin/spark-shell 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/28 23:32:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/28 23:32:50 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.1.180:4040
Spark context available as 'sc' (master = local[*], app id = local-1503977564935).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
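
Inside the shell, the sc (SparkContext) and spark (SparkSession) handles are ready to use. As a first exercise, a minimal sketch that counts lines in Spark's own README.md, which we saw in the directory listing in section 1.4 (this assumes spark-shell was started from /opt/spark-2.2.0, so the relative path resolves):

scala> val textFile = sc.textFile("README.md")
scala> textFile.count()                               // total number of lines
scala> textFile.filter(_.contains("Spark")).count()   // lines containing "Spark"

Type :quit (or press Ctrl+D) to leave the shell.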

From the spark-shell startup log above, the line Spark context Web UI available at http://192.168.1.180:4040 shows that spark-shell has started a web UI; open http://192.168.1.180:4040 in a browser to view it.

(Screenshot: the Spark web UI)