I have set up a Spark cluster in standalone mode. I can see that both workers are running, but when I start a spark-shell I run into the following problem (the configuration of the cluster is automatic).

    val lines = sc.parallelize(List(1, 2, 3, 4))

This step works fine and the new RDD is created, but when I run the next task:

    lines.take(2).foreach(println)

I get an error that I cannot resolve.

Output:
 16/02/18 10:27:02 INFO DAGScheduler: Got job 0 (take at :24) with 1 output partitions
 16/02/18 10:27:02 INFO DAGScheduler: Final stage: ResultStage 0 (take at :24)
 16/02/18 10:27:02 INFO DAGScheduler: Parents of final stage: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Missing parents: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :21), which has no missing parents
 16/02/18 10:27:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1288.0 B, free 1288.0 B)
 16/02/18 10:27:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 856.0 B, free 2.1 KB)

A minute and a half later:
16/02/18 10:28:43 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(0,java.io.IOException: Failed to create directory /srv/spark/work/app-20160218102438-0000/0)] in 2 attempts org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) 
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.complete(Promise.scala:55) 
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) 
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) 
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153) 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242) ... 7 more

In one of the workers I can see this error in the log:
Invalid maximum heap size: -Xmxm0M, could not create JVM

I can also see some other issues that I think are related to port binding or something similar.
Posted on 2016-08-26 21:24:16
The problem is most likely port visibility between the Spark cluster and the client (driver) instance.

This surprises many users, but it is how the Spark architecture works: every Spark node must be able to reach the client instance on the port defined by spark.driver.port in the SparkContext configuration. By default this option is empty, which means the port is chosen at random, so with the default configuration every Spark node needs to be able to reach the client instance on any port. You can, however, override spark.driver.port.

This can be a problem if, for example, your client machine sits behind a firewall or inside a Docker container. You need to open this port to the outside.
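For illustration only, here is a minimal sketch of pinning spark.driver.port when building the context yourself; the master URL, application name, and port number are assumptions for this example, not values taken from the question:

    import org.apache.spark.{SparkConf, SparkContext}

    // Assumed standalone master URL and an assumed port that is open through the firewall.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")
      .setAppName("driver-port-example")
      .set("spark.driver.port", "51000")   // fixed driver port instead of a random one

    val sc = new SparkContext(conf)

    // Quick smoke test, mirroring the steps from the question.
    val lines = sc.parallelize(List(1, 2, 3, 4))
    lines.take(2).foreach(println)

When using spark-shell instead, the same setting can be passed on the command line with --conf spark.driver.port=51000; spark.driver.host may also need to be set if the client is reachable under a different address than the one Spark detects.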
https://stackoverflow.com/questions/35500310