This is a summary written in July 2020. Looking back at it in 2021, the original author of kafka exporter now spends very little time on the project and many PRs go unhandled. Fortunately the code is quite clear, so pulling an independent branch and fixing issues ourselves is not much trouble.
The project leads competing exporters in watch, star, and fork counts, and its issues and pull requests are also fairly active.
As of 2020-07-07:
kafka exporter collects metrics for Brokers, Topics, and Consumer Groups via the Kafka Protocol Specification.
Compared with the older approach of collecting via Kafka's built-in scripts, there is no per-run JVM startup overhead, so metric collection time drops from minutes to seconds, which makes monitoring large clusters much easier.
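As a rough illustration of what collecting over the protocol means, here is a minimal sketch (not the exporter's actual code) that uses the sarama client library, which kafka exporter is built on, to read partition offsets directly from the brokers; the broker address and version below are placeholders:

```go
package main

import (
	"fmt"
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_2_0_0 // match the broker version, as with --kafka.version

	// Connect once over the Kafka protocol; no JVM is started per collection.
	client, err := sarama.NewClient([]string{"broker-host:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	topics, err := client.Topics()
	if err != nil {
		log.Fatal(err)
	}
	for _, topic := range topics {
		partitions, err := client.Partitions(topic)
		if err != nil {
			log.Fatal(err)
		}
		for _, p := range partitions {
			newest, _ := client.GetOffset(topic, p, sarama.OffsetNewest)
			oldest, _ := client.GetOffset(topic, p, sarama.OffsetOldest)
			fmt.Printf("topic=%s partition=%d current=%d oldest=%d\n", topic, p, newest, oldest)
		}
	}
}
```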
KIP-575: build a Kafka-Exporter by Java
Kafka may ship an official Kafka-Exporter later on.
At the code level kafka exporter leans on a lot of open-source libraries, so it is powerful yet extremely small, only 600+ lines of code; the rough architecture is as follows:
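In rough strokes, the exporter is a single prometheus.Collector that queries Kafka on each scrape and writes out const metrics such as kafka_brokers over HTTP. The sketch below is a simplified illustration of that structure, not the project's actual source:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Description of the kafka_brokers gauge shown below.
var brokersDesc = prometheus.NewDesc(
	"kafka_brokers", "Number of Brokers in the Kafka Cluster.", nil, nil)

// exporter plays the role of the single collector: every Prometheus scrape
// triggers Collect, which would normally ask Kafka for fresh values.
type exporter struct{}

func (e *exporter) Describe(ch chan<- *prometheus.Desc) { ch <- brokersDesc }

func (e *exporter) Collect(ch chan<- prometheus.Metric) {
	// The real exporter derives this count from a sarama client; hard-coded here.
	ch <- prometheus.MustNewConstMetric(brokersDesc, prometheus.GaugeValue, 3)
}

func main() {
	prometheus.MustRegister(&exporter{})
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9308", nil)) // 9308 is kafka_exporter's default port
}
```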
The metric descriptions on GitHub are somewhat out of date, so some of the newer metrics are documented here as well.
Name | Exposed information |
---|---|
kafka_brokers | Number of Brokers in the Kafka Cluster |
# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 3
Name | Exposed information |
---|---|
kafka_topic_partitions | Number of partitions for this Topic |
kafka_topic_partition_current_offset | Current Offset of a Broker at Topic/Partition |
kafka_topic_partition_oldest_offset | Oldest Offset of a Broker at Topic/Partition |
kafka_topic_partition_in_sync_replica | Number of In-Sync Replicas for this Topic/Partition |
kafka_topic_partition_leader | Leader Broker ID of this Topic/Partition |
kafka_topic_partition_leader_is_preferred | 1 if Topic/Partition is using the Preferred Broker |
kafka_topic_partition_replicas | Number of Replicas for this Topic/Partition |
kafka_topic_partition_under_replicated_partition | 1 if Topic/Partition is under Replicated |
# HELP kafka_topic_partitions Number of partitions for this Topic
# TYPE kafka_topic_partitions gauge
kafka_topic_partitions{topic="__consumer_offsets"} 50
# HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_current_offset gauge
kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"} 0
# HELP kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_oldest_offset gauge
kafka_topic_partition_oldest_offset{partition="0",topic="__consumer_offsets"} 0
# HELP kafka_topic_partition_in_sync_replica Number of In-Sync Replicas for this Topic/Partition
# TYPE kafka_topic_partition_in_sync_replica gauge
kafka_topic_partition_in_sync_replica{partition="0",topic="__consumer_offsets"} 3
# HELP kafka_topic_partition_leader Leader Broker ID of this Topic/Partition
# TYPE kafka_topic_partition_leader gauge
kafka_topic_partition_leader{partition="0",topic="__consumer_offsets"} 0
# HELP kafka_topic_partition_leader_is_preferred 1 if Topic/Partition is using the Preferred Broker
# TYPE kafka_topic_partition_leader_is_preferred gauge
kafka_topic_partition_leader_is_preferred{partition="0",topic="__consumer_offsets"} 1
# HELP kafka_topic_partition_replicas Number of Replicas for this Topic/Partition
# TYPE kafka_topic_partition_replicas gauge
kafka_topic_partition_replicas{partition="0",topic="__consumer_offsets"} 3
# HELP kafka_topic_partition_under_replicated_partition 1 if Topic/Partition is under Replicated
# TYPE kafka_topic_partition_under_replicated_partition gauge
kafka_topic_partition_under_replicated_partition{partition="0",topic="__consumer_offsets"} 0
Name | Exposed information |
---|---|
kafka_consumergroup_current_offset | Current Offset of a ConsumerGroup at Topic/Partition |
kafka_consumergroup_lag | Current Approximate Lag of a ConsumerGroup at Topic/Partition (broker consumer) |
kafka_consumergroup_lag_zookeeper | Current Approximate Lag of a ConsumerGroup at Topic/Partition (zk consumer) |
# HELP kafka_consumergroup_current_offset Current Offset of a ConsumerGroup at Topic/Partition
# TYPE kafka_consumergroup_current_offset gauge
kafka_consumergroup_current_offset{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} -1
# HELP kafka_consumergroup_lag Current Approximate Lag of a ConsumerGroup at Topic/Partition
# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} 1
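As a minimal sketch of how the approximate lag relates to the two offset metrics above (lag ≈ kafka_topic_partition_current_offset minus kafka_consumergroup_current_offset); treating an uncommitted offset of -1 as "unknown, report 0" is an assumption here, and the exporter's exact handling of that case may differ:

```go
package main

import "fmt"

// approximateLag derives the lag for one partition from the two offsets the
// exporter already collects. The handling of a committed offset of -1
// (nothing committed yet) is an assumption, not the exporter's exact logic.
func approximateLag(partitionCurrentOffset, groupCommittedOffset int64) int64 {
	if groupCommittedOffset < 0 {
		return 0
	}
	if lag := partitionCurrentOffset - groupCommittedOffset; lag > 0 {
		return lag
	}
	return 0
}

func main() {
	fmt.Println(approximateLag(1200, 1150)) // 50 messages behind
	fmt.Println(approximateLag(0, -1))      // nothing committed yet -> 0
}
```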
Our Grafana dashboards are currently built on demand, mainly based on these two dashboard templates. Links:
The ZooKeeper addresses of our production Kafka clusters all carry a chroot, e.g. host1:2181,host2:2181/kafka1, and when trying kafka exporter out we found it does not support this form. Reading the code showed that kafka exporter was not using the zk library kazoo quite correctly: replacing the NewKazoo call with NewKazooFromConnectionString is enough to support our scenario, and this fix has been submitted to the author as a PR.
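A simplified sketch of the difference between the two kazoo calls (addresses are placeholders; this is not the exact patch in the PR):

```go
package main

import (
	"log"

	"github.com/wvanbergen/kazoo-go"
)

func main() {
	// Before (simplified): NewKazoo takes a plain list of servers, so the
	// /kafka1 chroot suffix in our addresses is not interpreted.
	// zkClient, err := kazoo.NewKazoo([]string{"host1:2181", "host2:2181"}, nil)

	// After: the connection-string form splits the /kafka1 chroot out itself.
	zkClient, err := kazoo.NewKazooFromConnectionString("host1:2181,host2:2181/kafka1", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer zkClient.Close()

	topics, err := zkClient.Topics()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("found %d topics via zookeeper", len(topics))
}
```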
A single kafka exporter instance can currently only collect from one cluster, so the natural idea was to modify the source so that one instance could collect metrics for multiple Kafka clusters. In practice, however, our Kafka clusters have an extremely large number of topics, partitions, and groups, almost on the level of LinkedIn's busiest internal clusters, so a Prometheus pull of a single cluster's metrics already takes fairly long (around 30s, longer with zk lag collection enabled) and the payload is large (around 30k lines). Aggregating several clusters' metrics in one instance would put heavy pressure on Prometheus, so we abandoned this plan.
Labeling kafka exporter instances: after weighing the trade-offs, we settled on one kafka exporter per cluster, using the kafka.labels feature to attach clusterId information so that metrics from different clusters can be told apart. Reference invocation:
./kafka_exporter --kafka.server=broker-host:9092 --kafka.labels="clusterId=203" --use.consumelag.zookeeper --zookeeper.server="zk-host:2181/kafka1" --kafka.version="2.2.0"
Each instance is packaged into a Docker image and deployed on Kubernetes.
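Assuming kafka.labels attaches the given key/value pair as a constant label on every exposed series, the scrape output would then look roughly like this (illustrative, not captured from a real cluster):
# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers{clusterId="203"} 3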
If you are interested, feel free to follow my WeChat official account~