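Recently my Marathon process has been disappearing while running. The version I use is supposed to be fairly stable, so this should not happen. Here is the log from just before it died: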
[2017-12-08 14:52:40,330] INFO Client session timed out, have not heard from server in 6668ms for sessionid 0x15e230ba348134c, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn:pool-1-thread-1-SendThread(192.168.1.164:2181))
[2017-12-08 14:52:40,360] INFO Received health result for app [/tapdapi-pro] version [2017-12-07T02:17:20.059Z]: [Healthy(task [tapdapi-pro.c4a1c1d3-daf4-11e7-950f-00e081dccf47],2017-12-07T02:17:20.059Z,2017-12-08T06:52:40.360Z,true)] (mesosphere.marathon.core.health.impl.HealthCheckActor:marathon-akka.actor.default-dispatcher-7)
[2017-12-08 14:52:40,444] INFO State change: SUSPENDED (org.apache.curator.framework.state.ConnectionStateManager:pool-1-thread-1-EventThread)
[2017-12-08 14:52:40,444] ERROR ZooKeeper access failed - Committing suicide to avoid invalidating ZooKeeper state (mesosphere.marathon.core.election.impl.CuratorElectionService:qtp477319344-1575648)
[2017-12-08 14:52:40,490] ERROR error while getting current leader (mesosphere.marathon.core.election.impl.CuratorElectionService:qtp477319344-1575648)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /marathon/leader-curator
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
...
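The log shows that Marathon's connection to the ZooKeeper cluster timed out. While the client was trying to reconnect, Marathon could no longer be sure that its local state was up to date or that it was still entitled to be the leader, so it killed itself (the "Committing suicide to avoid invalidating ZooKeeper state" line). To avoid this, turn off Marathon's high-availability mode by adding the --disable_ha flag when starting Marathon.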
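
As a rough sketch, a start command with HA disabled might look like the following. Only --disable_ha comes from the analysis above. The ZooKeeper address is taken from the log, while the `marathon` launcher name on the PATH and the `/mesos` znode path are assumptions that should be adapted to the actual installation.

```shell
# Assumed values: adjust to your own cluster.
# 192.168.1.164:2181 is the ZooKeeper server that appears in the log;
# the /mesos path is a guess, /marathon matches the znode seen in the log.
MESOS_MASTER="zk://192.168.1.164:2181/mesos"
MARATHON_ZK="zk://192.168.1.164:2181/marathon"

# Start Marathon with high-availability mode switched off.
# "marathon" is assumed to be the launcher script on the PATH.
marathon --master "$MESOS_MASTER" --zk "$MARATHON_ZK" --disable_ha
```

With HA disabled there is no standby instance to take over, so this is only a fit when a single Marathon instance is being run.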