I have a Kafka topic divided into 60 partitions. The producer distributes messages in the default round-robin fashion. So far so good. On the consuming side, however, lag keeps growing. Nothing surprising about that. What did surprise me is how that lag is sometimes distributed:
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
a_group a_topic 24 2013567 2013573 6 kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27 kafka-python-2.0.2
a_group a_topic 25 2013105 2013118 13 kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27 kafka-python-2.0.2
a_group a_topic 26 2011359 2011365 6 kafka-python-2.0.2-50435c75-9f50-4053-b6d1-5b0c0a59a927 /88.88.49.27 kafka-python-2.0.2
a_group a_topic 0 2015024 2015025 1 kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212 kafka-python-2.0.2
a_group a_topic 1 2011950 2011950 0 kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212 kafka-python-2.0.2
a_group a_topic 2 2013069 2013069 0 kafka-python-2.0.2-011c0c93-27e7-4c4f-a0c1-909213db6164 /88.88.55.212 kafka-python-2.0.2
a_group a_topic 9 2010089 2010464 375 kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213 kafka-python-2.0.2
a_group a_topic 10 2011620 2011870 250 kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213 kafka-python-2.0.2
a_group a_topic 11 2011571 2012179 608 kafka-python-2.0.2-1602a64f-c1d3-43ea-9090-bef8ac9ce411 /88.88.30.213 kafka-python-2.0.2
a_group a_topic 12 2011892 2011893 1 kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88 kafka-python-2.0.2
a_group a_topic 13 2012234 2012235 1 kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88 kafka-python-2.0.2
a_group a_topic 14 2011195 2011196 1 kafka-python-2.0.2-26775cd6-966e-40af-b38e-29cd8a5d2519 /88.88.71.88 kafka-python-2.0.2
a_group a_topic 15 2009822 2009826 4 kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166 kafka-python-2.0.2
a_group a_topic 16 2013803 2013804 1 kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166 kafka-python-2.0.2
a_group a_topic 17 2014154 2014156 2 kafka-python-2.0.2-33c2cc43-c328-41c1-b1a5-684a3a970231 /88.88.28.166 kafka-python-2.0.2
a_group a_topic 21 2016745 2016746 1 kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73 kafka-python-2.0.2
a_group a_topic 22 2013930 2013932 2 kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73 kafka-python-2.0.2
a_group a_topic 23 2009764 2009765 1 kafka-python-2.0.2-4f570aa2-b328-414f-b954-dfa94ec3a170 /88.88.69.73 kafka-python-2.0.2
a_group a_topic 30 2012859 2012859 0 kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176 kafka-python-2.0.2
a_group a_topic 31 2012474 2012474 0 kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176 kafka-python-2.0.2
a_group a_topic 32 2011670 2011672 2 kafka-python-2.0.2-596a8b98-ef88-4a9f-8509-3a6432f8b693 /88.88.55.176 kafka-python-2.0.2
a_group a_topic 18 2010723 2010723 0 kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202 kafka-python-2.0.2
a_group a_topic 19 2012150 2012150 0 kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202 kafka-python-2.0.2
a_group a_topic 20 2013224 2013225 1 kafka-python-2.0.2-41fdba78-5e87-4cfc-80d2-96c5810aedce /88.88.39.202 kafka-python-2.0.2
a_group a_topic 3 2009559 2009559 0 kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145 kafka-python-2.0.2
a_group a_topic 4 2013069 2013071 2 kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145 kafka-python-2.0.2
a_group a_topic 5 2010672 2010674 2 kafka-python-2.0.2-0d9ceb1f-0517-4896-a0c4-56d2255a2805 /88.88.91.145 kafka-python-2.0.2
a_group a_topic 33 2013696 2013696 0 kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100 kafka-python-2.0.2
a_group a_topic 34 2009381 2009381 0 kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100 kafka-python-2.0.2
a_group a_topic 35 2012901 2012902 1 kafka-python-2.0.2-62d88a9e-5f7e-4133-86f5-d6fa372d6b13 /88.88.20.100 kafka-python-2.0.2
a_group a_topic 6 2011941 2011942 1 kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113 kafka-python-2.0.2
a_group a_topic 7 2011215 2011215 0 kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113 kafka-python-2.0.2
a_group a_topic 8 2011842 2011845 3 kafka-python-2.0.2-15647249-be71-4945-9b47-ff0b508978bc /88.88.55.113 kafka-python-2.0.2
a_group a_topic 27 2012368 2012370 2 kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27 kafka-python-2.0.2
a_group a_topic 28 2012921 2012925 4 kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27 kafka-python-2.0.2
a_group a_topic 29 2008526 2008530 4 kafka-python-2.0.2-514bd2a8-1332-45f4-b240-3029fa3c471f /88.88.85.27 kafka-python-2.0.2
a_group a_topic 56 2011315 2011315 0 kafka-python-2.0.2-f45f80be-49fe-4d4a-b91e-3f4dbc91afc1 /88.88.84.132 kafka-python-2.0.2
a_group a_topic 57 2013118 2013118 0 kafka-python-2.0.2-f45f80be-49fe-4d4a-b91e-3f4dbc91afc1 /88.88.84.132 kafka-python-2.0.2
a_group a_topic 50 2010863 2010865 2 kafka-python-2.0.2-e648adf2-2674-45b2-84f6-71a508d36270 /88.88.72.52 kafka-python-2.0.2
a_group a_topic 51 2011747 2011749 2 kafka-python-2.0.2-e648adf2-2674-45b2-84f6-71a508d36270 /88.88.72.52 kafka-python-2.0.2
a_group a_topic 36 2012567 2012568 1 kafka-python-2.0.2-65437c50-b4b8-4f4c-a396-5fc38b15232e /88.88.33.201 kafka-python-2.0.2
a_group a_topic 37 2012459 2012460 1 kafka-python-2.0.2-65437c50-b4b8-4f4c-a396-5fc38b15232e /88.88.33.201 kafka-python-2.0.2
a_group a_topic 58 2014003 2014003 0 kafka-python-2.0.2-fada16f0-1ee1-4397-afbe-225a69718181 /88.88.14.72 kafka-python-2.0.2
a_group a_topic 59 2012843 2012845 2 kafka-python-2.0.2-fada16f0-1ee1-4397-afbe-225a69718181 /88.88.14.72 kafka-python-2.0.2
a_group a_topic 48 2013246 2013246 0 kafka-python-2.0.2-e56dd468-f905-49ac-84db-493e75013973 /88.88.81.234 kafka-python-2.0.2
a_group a_topic 49 2011876 2011877 1 kafka-python-2.0.2-e56dd468-f905-49ac-84db-493e75013973 /88.88.81.234 kafka-python-2.0.2
a_group a_topic 54 2015293 2015294 1 kafka-python-2.0.2-f31dd554-6b24-4623-8475-7df47728b608 /88.88.55.197 kafka-python-2.0.2
a_group a_topic 55 2013566 2013567 1 kafka-python-2.0.2-f31dd554-6b24-4623-8475-7df47728b608 /88.88.55.197 kafka-python-2.0.2
a_group a_topic 52 2012371 2012371 0 kafka-python-2.0.2-ea20755a-9d43-4234-a5e8-f25ae62f4d2c /88.88.7.18 kafka-python-2.0.2
a_group a_topic 53 2009590 2009590 0 kafka-python-2.0.2-ea20755a-9d43-4234-a5e8-f25ae62f4d2c /88.88.7.18 kafka-python-2.0.2
a_group a_topic 40 2014499 2014500 1 kafka-python-2.0.2-8dfc86d8-0d59-4ab8-a042-9217d8ed185b /88.88.91.216 kafka-python-2.0.2
a_group a_topic 41 2010624 2010625 1 kafka-python-2.0.2-8dfc86d8-0d59-4ab8-a042-9217d8ed185b /88.88.91.216 kafka-python-2.0.2
a_group a_topic 46 2013292 2013293 1 kafka-python-2.0.2-c148c72c-43b4-4547-b2c2-cd58810b94f3 /88.88.18.170 kafka-python-2.0.2
a_group a_topic 47 2009611 2009611 0 kafka-python-2.0.2-c148c72c-43b4-4547-b2c2-cd58810b94f3 /88.88.18.170 kafka-python-2.0.2
a_group a_topic 38 2012883 2012883 0 kafka-python-2.0.2-6a8e6817-edca-4c6d-b35a-50c908040c7b /88.88.29.146 kafka-python-2.0.2
a_group a_topic 39 2013983 2013983 0 kafka-python-2.0.2-6a8e6817-edca-4c6d-b35a-50c908040c7b /88.88.29.146 kafka-python-2.0.2
a_group a_topic 42 2011620 2011620 0 kafka-python-2.0.2-b1816769-1189-4f5f-a5f4-bded61383c8b /88.88.25.36 kafka-python-2.0.2
a_group a_topic 43 2011800 2011802 2 kafka-python-2.0.2-b1816769-1189-4f5f-a5f4-bded61383c8b /88.88.25.36 kafka-python-2.0.2
a_group a_topic 44 2012131 2012131 0 kafka-python-2.0.2-b9111d9a-940f-4f3c-9087-6b739abf06b4 /88.88.54.169 kafka-python-2.0.2
a_group a_topic 45 2010538 2010538 0 kafka-python-2.0.2-b9111d9a-940f-4f3c-9087-6b739abf06b4 /88.88.54.169 kafka-python-2.0.2
The example above shows that almost the entire lag sits on a single consumer. This leads to a situation where the lag is never consumed down to 0, because that one consumer cannot process that many messages.
You can also see (from the offset columns) that the messages are distributed quite evenly across the partitions.
So what could be the cause of such an unbalanced lag distribution? Any ideas? Any hints on how to debug/investigate and find the reason?
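For reference, the table above is the standard consumer-group describe output. If you want to watch per-partition lag for the group programmatically with the same kafka-python client (the one shown in the CLIENT-ID column), a minimal sketch could look like the following; the bootstrap address is a placeholder, and only the group/topic names from the output above are reused:

```python
from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient

BOOTSTRAP = "localhost:9092"  # placeholder broker address
GROUP = "a_group"
TOPIC = "a_topic"

# Committed offsets of the group (the CURRENT-OFFSET column).
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
committed = admin.list_consumer_group_offsets(GROUP)

# A group-less consumer, used only to read the log-end offsets
# (the LOG-END-OFFSET column).
consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP)
partitions = [tp for tp in committed if tp.topic == TOPIC]
end_offsets = consumer.end_offsets(partitions)

for tp in sorted(partitions, key=lambda p: p.partition):
    lag = end_offsets[tp] - committed[tp].offset
    print(f"partition={tp.partition:2d} lag={lag}")
```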
Posted on 2022-10-03 11:05:44
After a few days of investigation I found the cause of the unbalanced lag in our case. I decided to share my experience so that it may help others.
In our case the cause was related to the consumer scaling setup and the default partition_assignment_strategy on the consumer side.
We were using the lag-based KEDA scaler for Apache Kafka and had the default range-based partition_assignment_strategy. The problem was that we had configured a rather gentle lagThreshold. Because of it, scaling went up and down but never reached the maximum number of instances at any point. As a result, most of the time the partitions were assigned to consumers unevenly: depending on the number of instances, some consumers were assigned 3 partitions while others got 4. Since all partitions receive the same amount of messages, and the consumers process them in fairly similar time, the consumers with more partitions always had more messages waiting than they could work off, while the consumers with fewer partitions had enough time to process theirs. Altogether, our too-gentle lagThreshold combined with the range-based partition_assignment_strategy caused most of the lag to build up on a small number of partitions.
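To see why the split is uneven, it helps to spell out the range assignor's per-topic arithmetic (a simplified sketch of the partition counts only, not of which specific partitions each member gets): consumers are sorted by member id, and the first num_partitions % num_consumers of them receive one extra partition.

```python
def range_assignment_counts(num_partitions: int, num_consumers: int):
    """Per-consumer partition counts under Kafka's range assignor for a
    single topic: the first (num_partitions % num_consumers) consumers,
    in member-id order, get one partition more than the rest."""
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1] * extra + [base] * (num_consumers - extra)

# 60 partitions over 24 instances: 12 consumers own 3 partitions, 12 own 2,
# which is exactly the shape of the describe output above.
print(range_assignment_counts(60, 24))  # [3, 3, ..., 2, 2]

# 60 partitions over 18 instances: 6 consumers own 4 partitions, 12 own 3.
print(range_assignment_counts(60, 18))  # [4, 4, ..., 3, 3]
```

The consumers that happen to own an extra partition receive proportionally more traffic, so the lag concentrates on them.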
One thing that helped a bit in this situation was switching the consumer partition_assignment_strategy to round robin, which assigns consumers to partitions in a slightly different way. Unfortunately, this did not fully solve the problem; it only helped move the accumulated lag to different partitions and shuffle it around a little.
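For completeness, this is roughly what that switch looks like with kafka-python; the broker address is a placeholder and the message handling is illustrative only:

```python
from kafka import KafkaConsumer
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor

consumer = KafkaConsumer(
    "a_topic",
    bootstrap_servers="localhost:9092",  # placeholder broker address
    group_id="a_group",
    # Override the range-based default so partitions are spread across
    # the group members round-robin instead of in contiguous ranges.
    partition_assignment_strategy=(RoundRobinPartitionAssignor,),
)

for message in consumer:
    # Illustrative handling only: print where the message came from.
    print(message.topic, message.partition, message.offset)
```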
For us, the final solution was to set a more aggressive lagThreshold, so that the consumer deployment can scale out to the maximum of 60 instances. In our case that means one consumer instance per partition, which allows the group to consume messages evenly.
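A minimal sketch of what such a KEDA configuration could look like; the deployment name, broker address and the exact lagThreshold value are illustrative, not our production settings:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: a-topic-consumer-scaler
spec:
  scaleTargetRef:
    name: a-topic-consumer        # placeholder Deployment running the consumers
  minReplicaCount: 1
  maxReplicaCount: 60             # one consumer instance per partition
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092   # placeholder broker address
        consumerGroup: a_group
        topic: a_topic
        lagThreshold: "5"              # aggressive enough that scaling actually reaches 60
```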
https://stackoverflow.com/questions/73869810