前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >kafka问题的一次不负责任排查

kafka问题的一次不负责任排查

作者头像
陈猿解码
发布2023-02-28 14:52:16
5640
发布2023-02-28 14:52:16
举报
文章被收录于专栏:陈猿解码

2022的第一篇文章,分享总结下一个kafka问题的排查过程。

【问题现象】


元旦前,某测试环境因为网络不稳定,出现了kafka与zookeeper连接断开后,没有重连。

【问题定位】


初步查看了下日志,发现kafka与zookeeper之间出现了会话超时,第一次超时有正确进行重连,之后再次出现会话超时,在这之后就没有进行重连了。

而实际查看zookeeper本身没有任何异常,无解,还是得结合源码来分析。

首先,kafka内部与zookeeper的交互是通过创建一个Zookeeper类的实例对象来进行通信的,同时通过传递一个watch对象(ZookeeperClientWatch),作为事件的回调处理。

代码语言:javascript
复制
@volatile private var zooKeeper = new ZooKeeper(connectString, sessionTimeoutMs, ZooKeeperClientWatcher)

Expire事件(会话超时会产生该事件)发生时,回调该对象的process方法进行处理

在该方法中,由单独的线程负责进行重连。

具体的重连逻辑为,判断zookeeper客户端中的状态,然后关闭客户端,并重建对象以进行重连。

注意,处理超时事件时(scheduleSessionExpiryHandler方法中)有会话超时的日志打印;同时重连时会有一个初始化新会话的日志打印。

分析到这里,一开始会认为是zookeeper客户端中的状态不对,导致没有进入if分支,继而没有进行重连。朝着这个方向,又仔细分析了zookeeper客户端中的代码,发现并不会出现该问题,因为在出现会话超时的时候,首先就将状态进行了变更,然后才回调。

到这里,感觉有点无头绪了,回想当时排查问题时,将kafka内部的线程堆栈信息都打印出来了,索性拿出来看看,结果发现,有这么一个堆栈信息:

从堆栈来看,第二次会话超时确实进入了reinitialize方法,但一直卡在foreach循环中,也就是该方法的第一行调用中。

那么在这里又是为什么会卡住呢,继续分析代码

在callBeforeInitializingSession中,调用handler的beforeInitializingSession

该handler是KafkaController通过调用registerStateChangeHandler注册进来的

也就是说,在重连之前最终会调用kafkaController中的beforeInitializingSession方法,在这个方法中,会new一个Expire事件,然后将该事件放到事件队列中,由事件线程驱动回调处理,而自身则阻塞直到事件处理完成。

结合代码,再对照上面的堆栈信息,可以看到确实是这么个流程。

那问题又来了,为什么该事件会一直没有被处理呢?事件只要处理就会回调process方法,在该方法中通过将计数器减1,阻塞的线程感知计数器变0,也就不会再阻塞,可以继续往下运行了。

再来看事件处理线程的堆栈:

什么?事件处理线程同样也是在卡主死等?

那么就只有一种可能来解释:事件处理线程在处理其他某个事件一直没有返回,因此重新初始化的时候expire事件丢到队列中,一直没有被事件处理线程拿出来处理。从而引起会话过期处理线程也卡住导致没有重连zookeeper!

顺着堆栈信息,跟踪相关代码调用链,发现kafka在连接zookeeper之后,会进行一系列的交互处理(注意之前reinitialize方法中的最后一行代码),包括向zookeeper发送请求,并同步等待直到拿到响应才继续后续的逻辑处理。

handleRequests方法中的那段代码,个人理解为:涉及两个线程的交互:

线程1调用send方法,将数据打包,最终调用zookeeper客户端进行处理(实际上是放到zookeeper客户端内部的一个outgoing队列中,再由zookeeper客户端内部的sendThread进行发送),然后同步等待(这个线程就是事件处理线程)。

线程2通过send方法中的processResult回调将响应放到responseQueue中,然后将计数器减1,当所有请求都回调完成时,计数器归0,上面的线程阻塞解除继续后续逻辑处理。

如果在回调过程中出现异常,是不是就无法调用

代码语言:javascript
复制
countDownLatch.countDown()

从而导致出现事件线程卡主,连锁引起会话超时处理线程卡主,不再重连zookeeper。

而实际日志中,也确实发现有这么一个异常,也恰好是在processResult的处理过程中。

代码语言:javascript
复制
// 第一次超时
[2021-12-27 21:22:48.416][INFO][zk-session-expiry-handler0][Logging.scala:info:66] , [ZooKeeperClient] Session expired.
// 准备重连
[2021-12-27 21:22:48.418][INFO][zk-session-expiry-handler0][Logging.scala:info:66] , [ZooKeeperClient] Initializing a new session to 192.168.224.249:2181.
[2021-12-27 21:22:48.418][INFO][zk-session-expiry-handler0][ZooKeeper.java:<init>:442] , Initiating client connection, connectString=192.168.224.249:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@97e93f1
[2021-12-27 21:22:48.421][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:22:48.421][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:22:48.426][INFO][controller-event-thread][Logging.scala:info:66] , Creating /brokers/ids/0 (is it secure? false)
[2021-12-27 21:22:49.419][INFO][executor-Rebalance][Logging.scala:info:66] , [GroupCoordinator 0]: Stabilized group syncGroup generation 50 (__consumer_offsets-20)
[2021-12-27 21:22:49.421][INFO][kafka-request-handler-2][Logging.scala:info:66] , [GroupCoordinator 0]: Assignment received from leader for group syncGroup for generation 50
[2021-12-27 21:22:51.014][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:22:52.429][INFO][controller-event-thread][Logging.scala:info:66] , Result of znode creation at /brokers/ids/0 is: OK
[2021-12-27 21:22:52.430][INFO][controller-event-thread][Logging.scala:info:66] , Registered broker 0 at path /brokers/ids/0 with addresses: ArrayBuffer(EndPoint(192.168.224.249,9092,ListenerName(PLAINTEXT),PLAINTEXT))
[2021-12-27 21:23:00.739][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 4001ms for sessionid 0x1046353e4b50005

// 会话超时 准备重连 (zookeeper客户端中的打印)
[2021-12-27 21:23:00.739][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 4001ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
// 第一次重连成功之后的请求的回调处理
[2021-12-27 21:23:00.840][ERROR][zk-session-expiry-handler0-EventThread][ClientCnxn.java:processEvent:641] , Caught unexpected throwable
java.lang.NullPointerException
  at kafka.zookeeper.ZooKeeperClient$$anon$10.processResult(ZooKeeperClient.scala:234)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:633)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[2021-12-27 21:23:02.155][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:23:02.156][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:23:02.158][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:23:06.159][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 4002ms for sessionid 0x1046353e4b50005
[2021-12-27 21:23:06.159][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 4002ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
[2021-12-27 21:23:08.382][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:23:08.383][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:23:08.384][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:23:09.788][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.219.98:58044-16010
[2021-12-27 21:23:09.788][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-1][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.219.98:58046-16010
[2021-12-27 21:23:12.385][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 4001ms for sessionid 0x1046353e4b50005
[2021-12-27 21:23:12.385][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 4001ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
[2021-12-27 21:23:14.475][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:23:14.476][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:23:14.522][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:23:53.546][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-2][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.5.215:39062-16009
[2021-12-27 21:23:53.546][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-1][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.4.40:55394-16003
[2021-12-27 21:23:53.547][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-1][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.6.180:60390-16011
[2021-12-27 21:23:53.547][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.3.19:46474-16001
[2021-12-27 21:23:53.547][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.5.95:34212-16002
[2021-12-27 21:23:53.548][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-2][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.2.189:40102-16011
[2021-12-27 21:23:55.580][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.155.109:36376-16014
[2021-12-27 21:23:58.006][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 4168ms for sessionid 0x1046353e4b50005
[2021-12-27 21:23:58.006][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 4168ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
[2021-12-27 21:23:59.416][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:23:59.416][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:23:59.418][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:24:02.900][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.219.98:58792-16013
[2021-12-27 21:24:04.153][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-1][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.219.98:58794-16013
[2021-12-27 21:24:04.153][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.219.98:59516-16017
[2021-12-27 21:24:27.679][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 4240ms for sessionid 0x1046353e4b50005
[2021-12-27 21:24:27.680][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 4240ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
[2021-12-27 21:24:29.089][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)
[2021-12-27 21:24:29.090][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session
[2021-12-27 21:24:29.091][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1303] , Session establishment complete on server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, sessionid = 0x1046353e4b50005, negotiated timeout = 6000
[2021-12-27 21:24:41.661][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1112] , Client session timed out, have not heard from server in 8564ms for sessionid 0x1046353e4b50005
[2021-12-27 21:24:41.661][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1160] , Client session timed out, have not heard from server in 8564ms for sessionid 0x1046353e4b50005, closing socket connection and attempting reconnect
[2021-12-27 21:24:41.662][INFO][executor-Heartbeat][Logging.scala:info:66] , [GroupCoordinator 0]: Member consumer-alarm-consumer-group-3-49755a10-da70-4b0d-9202-66ac210c4c4f in group alarm-consumer-group has failed, removing it from the group
[2021-12-27 21:24:41.665][INFO][executor-Heartbeat][Logging.scala:info:66] , [GroupCoordinator 0]: Preparing to rebalance group alarm-consumer-group in state PreparingRebalance with old generation 4 (__consumer_offsets-38) (reason: removing member consumer-alarm-consumer-group-3-49755a10-da70-4b0d-9202-66ac210c4c4f on heartbeat expiration)
[2021-12-27 21:24:41.666][INFO][executor-Heartbeat][Logging.scala:info:66] , [GroupCoordinator 0]: Group alarm-consumer-group with generation 5 is now empty (__consumer_offsets-38)
[2021-12-27 21:24:41.666][WARN][kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-1][Logging.scala:warn:70] , Attempting to send response via channel for which there is no open connection, connection id 192.168.224.249:9092-192.168.155.109:37028-16016
[2021-12-27 21:24:41.773][INFO][kafka-request-handler-3][Logging.scala:info:66] , [GroupCoordinator 0]: Preparing to rebalance group alarm-consumer-group in state PreparingRebalance with old generation 5 (__consumer_offsets-38) (reason: Adding new member consumer-alarm-consumer-group-3-7c5f4581-912b-4e42-9e13-91b6545c2e3c)
[2021-12-27 21:24:42.936][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:logStartConnect:1029] , 
Opening socket connection to server dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181. Will not attempt to authenticate using SASL (unknown error)

[2021-12-27 21:24:42.937][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:primeConnection:879] , 
Socket connection established to dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local/192.168.224.249:2181, initiating session

[2021-12-27 21:24:42.938][WARN][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:onConnected:1289] , 
Unable to reconnect to ZooKeeper service, session 0x1046353e4b50005 has expired

// 会话再次超时 无重连的日志打印
[2021-12-27 21:24:42.938][INFO][zk-session-expiry-handler0][Logging.scala:info:66] , [ZooKeeperClient] Session expired.
[2021-12-27 21:24:42.938][INFO][zk-session-expiry-handler0-EventThread][ClientCnxn.java:run:522] , EventThread shut down for session: 0x1046353e4b50005
[2021-12-27 21:24:42.938][INFO][zk-session-expiry-handler0-SendThread(dcp-middleware-hdp-dcpkafka-service-0.dcp-middleware-hdp-dcpkafka-service.dip-auto-test.svc.cluster.local:2181)][ClientCnxn.java:run:1158] , Unable to reconnect to ZooKeeper service, session 0x1046353e4b50005 has expired, closing socket connection

同样,controller日志最后的内容为向zookeeper发送删除请求。

代码语言:javascript
复制
2021-12-27 21:22:57.342][INFO][controller-event-thread][Logging.scala:info:66] , [Controller id=0] Ready to serve as the new controller with epoch 9
[2021-12-27 21:22:57.342][INFO][controller-event-thread][Logging.scala:info:66] , [Controller id=0] Removing partitions Set() from the list of reassigned partitions in zookeeper
[2021-12-27 21:22:57.342][INFO][controller-event-thread][Logging.scala:info:66] , [Controller id=0] No more partitions need to be reassigned. Deleting zk path /admin/reassign_partitions

种种迹象表明,如果在请求的回调响应过程中出现异常,可能会导致线程的死等。这样就可以解释为什么没有重连,堆栈的情况也就可以清楚交代了。至于为什么会出现空指针异常,这里没有去深究,后续有空再深入进行研究。

好了,小结一下,本文从源码角度,分析了kafka没有重连zookeeper的问题。问题最后是通过重启kafka进行了恢复,暂时也还未对可疑的代码尝试修改并复现验证。后续有时间再近一步跟进处理,本文如有分析不正确的地方,也欢迎交流指正~

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2022-01-06,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 陈猿解码 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档