我正在尝试使用Spark 2.0.1在8台RHEL7.3 x86机器上设置一个8节点集群。start-master.sh .sh可以很好地通过:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host lambda.foo.net --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:26:46 INFO Master: Started daemon with process name: 22181@lambda.foo.net
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:26:46 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:26:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:26:46 INFO SecurityManager: Changing view acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:26:46 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:26:46 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:26:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:26:46 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
16/12/08 04:26:46 INFO Master: Starting Spark master at spark://lambda.foo.net:7077
16/12/08 04:26:46 INFO Master: Running Spark version 2.0.1
16/12/08 04:26:46 INFO Utils: Successfully started service 'MasterUI' on port 8080.
16/12/08 04:26:46 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://19.341.11.212:8080
16/12/08 04:26:46 INFO Utils: Successfully started service on port 6066.
16/12/08 04:26:46 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
16/12/08 04:26:46 INFO Master: I have been elected leader! New state: ALIVE但是,当我使用start-slves.sh尝试启动工作进程时,我在工作进程日志中看到的是:
Spark Command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.102-4.b14.el7.x86_64/jre/bin/java -cp /usr/local/bin/spark-2.0.1-bin-hadoop2.7/conf/:/usr/local/bin/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://lambda.foo.net:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/08 04:30:00 INFO Worker: Started daemon with process name: 14649@hawk040os4.foo.net
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for TERM
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for HUP
16/12/08 04:30:00 INFO SignalUtils: Registered signal handler for INT
16/12/08 04:30:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/08 04:30:00 INFO SecurityManager: Changing view acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls to: root
16/12/08 04:30:00 INFO SecurityManager: Changing view acls groups to:
16/12/08 04:30:00 INFO SecurityManager: Changing modify acls groups to:
16/12/08 04:30:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
16/12/08 04:30:00 INFO Utils: Successfully started service 'sparkWorker' on port 35858.
16/12/08 04:30:00 INFO Worker: Starting Spark worker 15.242.22.179:35858 with 24 cores, 1510.2 GB RAM
16/12/08 04:30:00 INFO Worker: Running Spark version 2.0.1
16/12/08 04:30:00 INFO Worker: Spark home: /usr/local/bin/spark-2.0.1-bin-hadoop2.7
16/12/08 04:30:00 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/12/08 04:30:00 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://15.242.22.179:8081
16/12/08 04:30:00 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:00 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
        at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to lambda.foo.net/19.341.11.212:7077
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
        ... 4 more
Caused by: java.net.NoRouteToHostException: No route to host: lambda.foo.net/19.341.11.212:7077
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more
16/12/08 04:30:12 INFO Worker: Retrying connection to master (attempt # 1)
16/12/08 04:30:12 INFO Worker: Connecting to master lambda.foo.net:7077...
16/12/08 04:30:12 WARN Worker: Failed to connect to master lambda.foo.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)所以它说“没有到主机的路由”。但是我可以成功地从工作节点ping主节点,以及从工作节点ping主节点的ssh。
为什么spark说“没有到主机的路径”?
发布于 2017-02-14 04:44:04
问题已解决:防火墙阻止了数据包。
https://stackoverflow.com/questions/41038930
复制相似问题