
Why does lines.map not work, but lines.take.map works, in Spark?

Stack Overflow user
Asked 2013-12-20 08:53:56
Answers: 1, Views: 457, Followers: 0, Votes: 1

I am new to Scala and Spark.

I am practicing with the SparkHdfsLR.scala code.

But I ran into a problem with this part of the code:

60    val lines = sc.textFile(inputPath)
61    val points = lines.map(parsePoint _).cache()
62    val ITERATIONS = args(2).toInt

Line 61 does not work. After I changed it to the following, it works:

60    val lines = sc.textFile(inputPath)
61    val points = lines.take(149800).map(parsePoint _)  //149800 is the total number of lines
62    val ITERATIONS = args(2).toInt

The error message from sbt run (for the original line 61) is:

[error] (run-main) org.apache.spark.SparkException: Job failed: Task 0.0:1 failed more than 4 times
org.apache.spark.SparkException: Job failed: Task 0.0:1 failed more than 4 times
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[error] {file:/var/sdb/home/tim.tan/workspace/spark/}default-d3d73f/compile:run: Nonzero exit code: 1
[error] Total time: 52 s, completed Dec 20, 2013 5:42:18 PM

The stderr of the task node is:

13/12/20 17:42:16 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/20 17:42:16 INFO executor.StandaloneExecutorBackend: Connecting to driver: akka://spark@SHXJ-H07-SDB06:38975/user/StandaloneScheduler
13/12/20 17:42:17 INFO executor.StandaloneExecutorBackend: Successfully registered with driver
13/12/20 17:42:17 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/20 17:42:17 INFO spark.SparkEnv: Connecting to BlockManagerMaster: akka://spark@SHXJ-H07-SDB06:38975/user/BlockManagerMaster
13/12/20 17:42:17 INFO storage.MemoryStore: MemoryStore started with capacity 323.9 MB.
13/12/20 17:42:17 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20131220174217-be8e
13/12/20 17:42:17 INFO network.ConnectionManager: Bound socket to port 52043 with id = ConnectionManagerId(TS-BH90,52043)
13/12/20 17:42:17 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/12/20 17:42:17 INFO storage.BlockManagerMaster: Registered BlockManager
13/12/20 17:42:17 INFO spark.SparkEnv: Connecting to MapOutputTracker: akka://spark@SHXJ-H07-SDB06:38975/user/MapOutputTracker
13/12/20 17:42:17 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-1b1a6c0b-965e-4834-a3d3-554c95442041
13/12/20 17:42:17 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/20 17:42:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41811
13/12/20 17:42:18 ERROR executor.StandaloneExecutorBackend: Driver terminated or disconnected! Shutting down.

The log on the worker is as follows:

13/12/19 17:49:26 INFO worker.Worker: Asked to launch executor app-20131219174926-0001/2 for SparkHdfsLR
13/12/19 17:49:26 INFO worker.ExecutorRunner: Launch command: "java" "-cp" ":/var/bh/spark/conf:/var/bh/spark/assembly/target/scala-2.9.3/spark-assembly-0.8.0-incubating-hadoop1.0.3.jar:/var/bh/spark/core/target/scala-2.9.3/test-classes:/var/bh/spark/repl/target/scala-2.9.3/test-classes:/var/bh/spark/mllib/target/scala-2.9.3/test-classes:/var/bh/spark/bagel/target/scala-2.9.3/test-classes:/var/bh/spark/streaming/target/scala-2.9.3/test-classes" "-Djava.library.path=/var/bh/hadoop/lib/native/Linux-amd64-64/" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@SHXJ-H07-SDB06:56158/user/StandaloneScheduler" "2" "TS-BH87" "8"
13/12/19 17:49:30 INFO worker.Worker: Asked to kill executor app-20131219174926-0001/2
13/12/19 17:49:30 INFO worker.ExecutorRunner: Runner thread for executor app-20131219174926-0001/2 interrupted
13/12/19 17:49:30 INFO worker.ExecutorRunner: Killing process!

It seems the worker did not start up successfully.

I don't know the reason. Can anyone give me a suggestion?


1 Answer

Stack Overflow user

Accepted answer

Answered 2013-12-23 02:53:21

I found out why it does not work.

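Because of my bad configuration, Spark could only work in standalone (non-distributed) mode. After correcting the configuration, if you want to run the code in distributed mode, the last two parameters of the SparkContext constructor must be specified: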

new SparkContext(master, jobName, [sparkHome], [jars])

If the last two parameters are not specified, the Scala script can only work in standalone mode.
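For illustration, here is a minimal sketch of how a driver program might pass all four arguments, assuming the Spark 0.8 constructor shown above. The `APP_JARS` environment variable and the `jarList` helper are hypothetical names introduced for this example and are not part of Spark; `SPARK_HOME` is assumed to point at the Spark installation on the cluster nodes.

```scala
import org.apache.spark.SparkContext

object SparkContextSketch {
  // Hypothetical helper (not part of Spark): split an optional
  // comma-separated string into the list of jar paths to ship to the workers.
  def jarList(raw: Option[String]): Seq[String] =
    raw.toList.flatMap(_.split(",").toList).map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // First program argument is the master URL, as in SparkHdfsLR.
    val master = args(0)
    // Assumption: SPARK_HOME is set in the driver's environment.
    val sparkHome = System.getenv("SPARK_HOME")
    // Assumption: APP_JARS is a hypothetical variable listing the application jar(s).
    val jars = jarList(Option(System.getenv("APP_JARS")))

    // All four arguments are given, so executors know where Spark lives
    // and receive the jar that contains the application classes.
    val sc = new SparkContext(master, "SparkHdfsLR", sparkHome, jars)
    try {
      println("Connected to master: " + sc.master)
    } finally {
      sc.stop()
    }
  }
}
```

The jar list matters because functions used in distributed transformations such as `lines.map(parsePoint _)` are executed on the executors, which need the application's classes on their classpath.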

Votes: 0
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/20699662
