Spark 2.x Study Notes: 1. Spark 2.2 Quick Start (Local Mode)

1. Spark 2.2 Quick Start (Local Mode)

1.1 Spark Local Mode

When learning Spark, go from easy to hard: start with the simplest mode, local mode.

Local mode (local) is commonly used for development and testing on a single machine. Simply unpacking the Spark distribution is enough to run it; Spark works "out of the box".

1.2 Installing JDK 8

(1) Download. Go to the Oracle website at http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html, accept the license agreement, and pick the link for the 64-bit Linux tar package. You can click the link to download directly; a multi-threaded download tool (such as Xunlei/Thunder) can speed this up.

(2) Upload to the server. Use XShell to upload the JDK 8 package downloaded on Windows to the server 192.168.1.180.

(3) Extract. Here we extract into the /opt directory; for easier management, I install all third-party software under /opt.

[root@master ~]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt

(4) Configure the JDK environment variables. You could set them in /etc/profile, but for easier management we create a custom.sh file under /etc/profile.d/ to hold user environment variables.

[root@master ~]# vi /etc/profile.d/custom.sh
[root@master ~]# cat /etc/profile.d/custom.sh
#java path
export JAVA_HOME=/opt/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
[root@master ~]#

(5) Apply the environment variables

[root@master ~]# source /etc/profile.d/custom.sh

(6) Run java -version to verify the JDK installation

[root@master ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[root@master ~]# 

1.3 Downloading the Spark 2.x Package

(1) Go to the Spark download page at http://spark.apache.org/downloads.html. (2) In the first dropdown, choose the Spark release (2.2.0); in the second, choose the package type (pre-built for Hadoop 2.7); in the third, choose the download type (the direct download is slow, so choose "Select Apache Mirror").

(3) Click the spark-2.2.0-bin-hadoop2.7.tgz link and choose a mirror in China.

(4) Pick the nearest mirror (here, the Tsinghua University mirror) and download directly with wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz. A multi-threaded download tool can also speed this up.

[root@master ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
--2017-08-29 22:43:51--  http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
Resolving mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)... 101.6.6.177, 2402:f000:1:416:101:6:6:177
Connecting to mirrors.tuna.tsinghua.edu.cn (mirrors.tuna.tsinghua.edu.cn)|101.6.6.177|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 203728858 (194M) [application/octet-stream]
Saving to: ‘spark-2.2.0-bin-hadoop2.7.tgz’

100%[============================================================================================================>] 203,728,858 9.79MB/s   in 23s    

2017-08-29 22:44:15 (8.32 MB/s) - ‘spark-2.2.0-bin-hadoop2.7.tgz’ saved [203728858/203728858]

[root@master ~]#
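Apache publishes checksum files (for example, spark-2.2.0-bin-hadoop2.7.tgz.sha512) alongside each release on the official download page. Since the tarball above came from a third-party mirror, it is worth verifying before extracting. A minimal Python sketch; the reference digest to compare against would be copied from the Apache page, not the mirror:

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 digest of a file, reading it in 1 MB chunks
    so even a ~200 MB tarball does not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()

# Compare the printed digest with the official .sha512 value:
# print(sha512_of("spark-2.2.0-bin-hadoop2.7.tgz"))
```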

(5) Then extract the archive into the /opt directory. By convention, we put all third-party packages under /opt on Linux.

[root@master ~]# tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /opt

(6) The Spark root directory name is rather long, so we rename it. This step is optional.

[root@master ~]# mv /opt/spark-2.2.0-bin-hadoop2.7/ /opt/spark-2.2.0

1.4 Spark Directory Structure

[root@master ~]# cd /opt/spark-2.2.0/
[root@master spark-2.2.0]# ll
total 84
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 bin
drwxr-xr-x. 2 500 500   230 Jun 30 19:09 conf
drwxr-xr-x. 5 500 500    50 Jun 30 19:09 data
drwxr-xr-x. 4 500 500    29 Jun 30 19:09 examples
drwxr-xr-x. 2 500 500 12288 Jun 30 19:09 jars
-rw-r--r--. 1 500 500 17881 Jun 30 19:09 LICENSE
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 licenses
-rw-r--r--. 1 500 500 24645 Jun 30 19:09 NOTICE
drwxr-xr-x. 8 500 500   240 Jun 30 19:09 python
drwxr-xr-x. 3 500 500    17 Jun 30 19:09 R
-rw-r--r--. 1 500 500  3809 Jun 30 19:09 README.md
-rw-r--r--. 1 500 500   128 Jun 30 19:09 RELEASE
drwxr-xr-x. 2 500 500  4096 Jun 30 19:09 sbin
drwxr-xr-x. 2 500 500    42 Jun 30 19:09 yarn
[root@master spark-2.2.0]# 

| Directory | Description |
|-----------|-------------|
| bin | executable scripts for the Spark commands |
| conf | Spark configuration files |
| data | data used by the bundled examples |
| examples | the bundled example programs |
| jars | the jar files Spark depends on |
| sbin | scripts for starting and stopping Spark's built-in (standalone) cluster |

Notes on the scripts in the Spark bin directory:

  • spark-shell: launches the interactive Spark shell (script)
  • spark-submit: submits a Spark application (script)
  • run-example: runs one of Spark's bundled example programs
  • spark-sql: launches the Spark SQL shell (script)

1.5 Running an Example Program

[root@master1 spark-2.2.0]# bin/run-example SparkPi 4 4
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/08/29 01:27:26 INFO SparkContext: Running Spark version 2.2.0
17/08/29 01:27:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/29 01:27:26 INFO SparkContext: Submitted application: Spark Pi
17/08/29 01:27:27 INFO SecurityManager: Changing view acls to: root
17/08/29 01:27:27 INFO SecurityManager: Changing modify acls to: root
17/08/29 01:27:27 INFO SecurityManager: Changing view acls groups to: 
17/08/29 01:27:27 INFO SecurityManager: Changing modify acls groups to: 
17/08/29 01:27:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/08/29 01:27:27 INFO Utils: Successfully started service 'sparkDriver' on port 40549.
17/08/29 01:27:27 INFO SparkEnv: Registering MapOutputTracker
17/08/29 01:27:27 INFO SparkEnv: Registering BlockManagerMaster
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/08/29 01:27:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-719136e3-dc4e-4061-a07a-e5f04d679ad1
17/08/29 01:27:27 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/08/29 01:27:27 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/29 01:27:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/08/29 01:27:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.180:4040
17/08/29 01:27:27 INFO SparkContext: Added JAR file:/opt/spark-2.2.0/examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar with timestamp 1503984447798
17/08/29 01:27:27 INFO SparkContext: Added JAR file:/opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar at spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar with timestamp 1503984447798
17/08/29 01:27:27 INFO Executor: Starting executor ID driver on host localhost
17/08/29 01:27:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43952.
17/08/29 01:27:27 INFO NettyBlockTransferService: Server created on 192.168.1.180:43952
17/08/29 01:27:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/08/29 01:27:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.180:43952 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.180, 43952, None)
17/08/29 01:27:28 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark-2.2.0/spark-warehouse').
17/08/29 01:27:28 INFO SharedState: Warehouse path is 'file:/opt/spark-2.2.0/spark-warehouse'.
17/08/29 01:27:29 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/08/29 01:27:29 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
17/08/29 01:27:29 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 4 output partitions
17/08/29 01:27:29 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
17/08/29 01:27:29 INFO DAGScheduler: Parents of final stage: List()
17/08/29 01:27:29 INFO DAGScheduler: Missing parents: List()
17/08/29 01:27:29 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
17/08/29 01:27:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
17/08/29 01:27:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1172.0 B, free 366.3 MB)
17/08/29 01:27:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.180:43952 (size: 1172.0 B, free: 366.3 MB)
17/08/29 01:27:29 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/08/29 01:27:29 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
17/08/29 01:27:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/08/29 01:27:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 4825 bytes)
17/08/29 01:27:29 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/29 01:27:29 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
17/08/29 01:27:29 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/08/29 01:27:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/29 01:27:29 INFO Executor: Fetching spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar with timestamp 1503984447798
17/08/29 01:27:29 INFO TransportClientFactory: Successfully created connection to /192.168.1.180:40549 after 34 ms (0 ms spent in bootstraps)
17/08/29 01:27:29 INFO Utils: Fetching spark://192.168.1.180:40549/jars/scopt_2.11-3.3.0.jar to /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/fetchFileTemp1808807623002630899.tmp
17/08/29 01:27:29 INFO Executor: Adding file:/tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/scopt_2.11-3.3.0.jar to class loader
17/08/29 01:27:29 INFO Executor: Fetching spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar with timestamp 1503984447798
17/08/29 01:27:29 INFO Utils: Fetching spark://192.168.1.180:40549/jars/spark-examples_2.11-2.2.0.jar to /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/fetchFileTemp3327801226116360399.tmp
17/08/29 01:27:29 INFO Executor: Adding file:/tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8/userFiles-28264a42-00c6-42cb-8d3f-e4fe670fb272/spark-examples_2.11-2.2.0.jar to class loader
17/08/29 01:27:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 436 ms on localhost (executor driver) (1/4)
17/08/29 01:27:30 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 867 bytes result sent to driver
17/08/29 01:27:30 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 423 ms on localhost (executor driver) (2/4)
17/08/29 01:27:30 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 424 ms on localhost (executor driver) (3/4)
17/08/29 01:27:30 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
17/08/29 01:27:30 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 428 ms on localhost (executor driver) (4/4)
17/08/29 01:27:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/08/29 01:27:30 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.482 s
17/08/29 01:27:30 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.766385 s
Pi is roughly 3.1493878734696836
17/08/29 01:27:30 INFO SparkUI: Stopped Spark web UI at http://192.168.1.180:4040
17/08/29 01:27:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/08/29 01:27:30 INFO MemoryStore: MemoryStore cleared
17/08/29 01:27:30 INFO BlockManager: BlockManager stopped
17/08/29 01:27:30 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/29 01:27:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/08/29 01:27:30 INFO SparkContext: Successfully stopped SparkContext
17/08/29 01:27:30 INFO ShutdownHookManager: Shutdown hook called
17/08/29 01:27:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-058642cb-042f-4960-b7e9-172fc02caff8
[root@master1 spark-2.2.0]# 

The result appears in the output: Pi is roughly 3.1493878734696836
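SparkPi estimates π with a Monte Carlo method: it scatters random points over the unit square and counts the fraction that land inside the inscribed circle, which is why the answer is only "roughly" π and changes from run to run. The same idea in plain Python (no Spark required; SparkPi simply splits this sampling across tasks, four of them in the run above):

```python
import random

def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the unit circle approaches pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(1_000_000, seed=42))  # close to 3.14; varies with the seed
```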

1.6 A First Look at spark-shell

Start spark-shell:

[root@master spark-2.2.0]# bin/spark-shell 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/08/28 23:32:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/28 23:32:50 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.1.180:4040
Spark context available as 'sc' (master = local[*], app id = local-1503977564935).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

The line Spark context Web UI available at http://192.168.1.180:4040 in the spark-shell log above shows that spark-shell started a web UI; enter http://192.168.1.180:4040 in a browser's address bar to open it.
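A classic first exercise inside spark-shell is a word count over Spark's own README.md, built in Scala from sc.textFile, flatMap, map, and reduceByKey. The underlying map/reduce pattern, sketched here in plain Python so it can be tried without Spark:

```python
from collections import Counter

def word_count(lines):
    """Map each line to its words, then reduce by summing counts per word,
    the same shape as flatMap(...).map(...).reduceByKey(...) in Spark."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

print(word_count(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```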
