[2017-03-03 19:42:42,812] INFO Client session timed out, have not heard from server in 6666ms for sessionid 0x35a933430d50004, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(192.168.91.99:2181))
[2017-03-03 19:42:42,912] INFO State change: SUSPENDED (org.apache.curator.framework.state.ConnectionStateManager:pool-1-thread-1-EventThread)
[2017-03-03 19:42:42,947] INFO Opening socket connection to server 192.168.52.92/192.168.52.92:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(192.168.52.92:2181))
[2017-03-03 19:42:42,948] INFO Socket connection established to 192.168.92/192.168.52.92:2181, initiating session (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(10.125.52.92:2181))
[2017-03-03 19:42:42,951] INFO Session establishment complete on server 192.168.52.92/192.168.52.92:2181, sessionid = 0x35a933430d50004, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(192.168.52.92:2181))
[2017-03-03 19:42:42,951] INFO State change: RECONNECTED (org.apache.curator.framework.state.ConnectionStateManager:pool-1-thread-1-EventThread)
[2017-03-03 19:42:42,953] INFO Leader defeated. New leader: 192.168.48.125:8080 (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2017-03-03 19:42:42,957] INFO Deleting existing tombstone for old twitter commons leader election (mesosphere.marathon.core.election.impl.CuratorElectionService:pool-1-thread-1)
[2017-03-03 19:42:42,959] INFO Lost leadership (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:pool-1-thread-1)
[2017-03-03 19:42:42,959] INFO All actors suspended:
 * Actor[akka://marathon/user/taskTracker#989799113]
 * Actor[akka://marathon/user/reviveOffersWhenWanted#-1681045213]
 * Actor[akka://marathon/user/taskKillServiceActor#-1306622116]
 * Actor[akka://marathon/user/launchQueue#819767243]
 * Actor[akka://marathon/user/offersWantedForReconciliation#-2099816564]
 * Actor[akka://marathon/user/rateLimiter#503420309]
 * Actor[akka://marathon/user/groupManager#-752628876]
 * Actor[akka://marathon/user/offerMatcherLaunchTokens#-562928907]
 * Actor[akka://marathon/user/killOverdueStagedTasks#-1773633501]
 * Actor[akka://marathon/user/offerMatcherManager#123957678]
 * Actor[akka://marathon/user/expungeOverdueLostTasks#-1479038444] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-9)
[2017-03-03 19:42:42,960] INFO Stopping driver (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:pool-1-thread-1)
I0303 19:42:42.960778 10617 sched.cpp:1987] Asked to stop the driver
I0303 19:42:42.961051 10679 sched.cpp:1187] Stopping framework '041eee2c-d32b-413b-931e-dc1f47a97971-0000'
[2017-03-03 19:42:42,961] ERROR Terminating after loss of leadership (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:pool-1-thread-1)
[2017-03-03 19:42:42,961] INFO ExpungeOverdueLostTasksActor has stopped (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-19)
[2017-03-03 19:42:42,964] INFO Driver future completed with result=Success(()). (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:ForkJoinPool-2-worker-37)
[2017-03-03 19:42:42,964] INFO Stopped appTaskLaunchActor for /php-test version 2017-03-03T09:45:32.125Z (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-21)
[2017-03-03 19:42:42,964] INFO Call postDriverRuns callbacks on EntityStoreCache(MarathonStore(app:)), EntityStoreCache(MarathonStore(group:)), EntityStoreCache(MarathonStore(deployment:)), EntityStoreCache(MarathonStore(framework:)), EntityStoreCache(MarathonStore(taskFailure:)), EntityStoreCache(MarathonStore(events:)) (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:ForkJoinPool-2-worker-37)
[2017-03-03 19:42:42,965] INFO Finished postDriverRuns callbacks (mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:ForkJoinPool-2-worker-37)
[2017-03-03 19:42:42,965] INFO Shutting down services (mesosphere.marathon.Main$:shutdownHook1)
[2017-03-03 19:42:42,965] INFO Shutting down actor system akka://marathon (mesosphere.marathon.core.base.ActorsModule:Thread-3)
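Here is what is going on. If your ZooKeeper cluster is unstable and a Marathon cluster has been deployed against it, this problem will show up frequently. When Marathon runs in cluster (high-availability) mode (--ha=true) and its connection to the ZooKeeper nodes suffers from latency or some other fault, Marathon can no longer determine the state of the other nodes, loses its ability to take part in the leader election, and then shuts itself down. The log above shows exactly this sequence: the ZooKeeper session times out, the connection state changes to SUSPENDED, another node becomes leader, and this instance terminates after losing leadership. When deploying ZooKeeper, pay close attention to how it works together with the Marathon cluster. And if you do not actually need Marathon's cluster mode, it is best to turn it off.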
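To confirm that a Marathon restart was caused by this sequence rather than something else, it can help to pull just the leadership-related entries out of the log. The following is a minimal sketch in Python, not part of Marathon or ZooKeeper. The function names, the short event labels, and the choice of which messages to match are all assumptions made for illustration; the matched substrings and the sample entries are copied from the log above.

```python
import re
from datetime import datetime, timedelta

# A Marathon log entry starts with a bracketed timestamp such as
# [2017-03-03 19:42:42,812]. The lookahead lets this work both on
# one-entry-per-line logs and on entries that were pasted as one long line.
ENTRY_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]\s+(\w+)\s+(.*?)"
    r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]|\Z)",
    re.DOTALL,
)

# Message substrings taken from the log above, mapped to hypothetical labels.
EVENTS = [
    ("Client session timed out", "zk_session_timeout"),
    ("State change: SUSPENDED", "zk_suspended"),
    ("State change: RECONNECTED", "zk_reconnected"),
    ("Leader defeated", "leader_defeated"),
    ("Lost leadership", "lost_leadership"),
    ("Terminating after loss of leadership", "terminating"),
]


def leadership_timeline(text):
    """Return (timestamp, label) pairs for leadership-related log entries."""
    timeline = []
    for match in ENTRY_RE.finditer(text):
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(3)
        for needle, label in EVENTS:
            if needle in message:
                timeline.append((stamp, label))
                break
    return timeline


def elapsed_ms(timeline):
    """Whole milliseconds between the first and the last matched entry."""
    if not timeline:
        return 0
    return (timeline[-1][0] - timeline[0][0]) // timedelta(milliseconds=1)


# Four entries copied verbatim from the log above.
SAMPLE = (
    "[2017-03-03 19:42:42,912] INFO State change: SUSPENDED "
    "(org.apache.curator.framework.state.ConnectionStateManager:pool-1-thread-1-EventThread) "
    "[2017-03-03 19:42:42,951] INFO State change: RECONNECTED "
    "(org.apache.curator.framework.state.ConnectionStateManager:pool-1-thread-1-EventThread) "
    "[2017-03-03 19:42:42,959] INFO Lost leadership "
    "(mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:pool-1-thread-1) "
    "[2017-03-03 19:42:42,961] ERROR Terminating after loss of leadership "
    "(mesosphere.marathon.MarathonSchedulerService$$EnhancerByGuice$$ea97d137:pool-1-thread-1)"
)

if __name__ == "__main__":
    events = leadership_timeline(SAMPLE)
    for stamp, label in events:
        print(stamp.isoformat(timespec="milliseconds"), label)
    print("elapsed ms:", elapsed_ms(events))
```

If the timeline shows a ZooKeeper session timeout or a SUSPENDED state shortly before "Lost leadership", the ZooKeeper connection is the likely cause. The native Mesos lines (those starting with I0303) use a different prefix and are simply folded into the preceding entry by this sketch.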
Keep one thing in mind: Marathon's leader election depends on ZooKeeper.