When I start the Apache Flink 1.10 TaskManager service in a Kubernetes (v1.15.2) cluster, it shows the following log:
2020-05-01 08:34:55,847 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:34:55,847 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:34:55,848 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,874 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:35:08,877 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,878 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:35:21,907 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
The TaskManager never registers successfully, so I logged in to the taskmanager pod and found that I can ping the jobmanager successfully, like this:
flink@flink-taskmanager-54d85f57c7-nl9cf:~$ ping flink-jobmanager
PING flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171) 56(84) bytes of data.
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=3 ttl=64 time=0.079 ms
So why does this happen, and what should I do to fix it?
Posted on 2020-05-02 13:41:49
Try installing nmap in the container of the Kubernetes taskmanager pod:
apt-get update
apt-get install nmap -y
Then scan the jobmanager and make sure the pod's exposed port 6123 is reachable (in my case, I found that port 6123 could not be reached from the current pod).
nmap -T4 <your-jobmanager's-pod-ip>
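If installing packages in the pod is not an option, a single-port check can be sketched with bash alone. This is a minimal sketch and not part of the original answer: `check_port` is a hypothetical helper name, and it assumes the container has bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command. Unlike ping, which only exercises ICMP, it attempts a real TCP connection to the port.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: exits 0 if a TCP connection to HOST:PORT can be opened,
# non-zero if the connection is refused, unroutable, or times out.
check_port() {
  host="$1"
  port="$2"
  wait_s="${3:-3}"
  # /dev/tcp/HOST/PORT is handled by bash itself, not by the filesystem.
  # Host and port are passed as positional arguments ($0 and $1 inside bash -c)
  # so they are not spliced into the command string.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

From the taskmanager pod this could be run as `check_port flink-jobmanager 6123 && echo reachable || echo unreachable`. With nmap itself, adding `-p 6123` restricts the scan to that one port.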
Hope this helps.
https://stackoverflow.com/questions/61539300