A Feel for Kafka's Two Log Cleanup Policies

Spark学习技巧
Published 2019-08-06 16:34:01

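Kafka is a log-based stream processing platform. A topic can have multiple partitions, and the partition is the basic unit of replication. On a single node, partition data files are stored under the directories listed in log.dirs; several directories (for example, on different disks) can be given, and each partition is placed in one of them. The setting is: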

# A comma separated list of directories under which to store log files
log.dirs=/home/storm/dev/kafka-logs

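Each partition's log is further split into segments; the default segment size is 1 GB. The segment is the basic unit of log cleanup, and the segment currently being written to (the active segment) is never cleaned.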

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
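
As a rough illustration of how segments come about, the sketch below rolls a new segment whenever appending the next record would push the current one past a size limit, and names each segment file after its base offset, zero-padded to 20 digits as in the directory listings later in this article. The function names and the record sizes are assumptions made for illustration, and the model is simplified: a real broker also rolls segments for other reasons, such as time limits.

```python
def segment_file_name(base_offset: int) -> str:
    """Name of the .log file for a segment starting at base_offset."""
    return f"{base_offset:020d}.log"


def roll_segments(record_sizes, segment_bytes):
    """Return the base offsets of the segments created for the given records.

    Simplified model: a new segment is rolled when appending the next record
    would make the current segment larger than segment_bytes.
    """
    base_offsets = [0]
    current_size = 0
    for offset, size in enumerate(record_sizes):
        if current_size > 0 and current_size + size > segment_bytes:
            base_offsets.append(offset)  # this record starts a new segment
            current_size = 0
        current_size += size
    return base_offsets


if __name__ == "__main__":
    # Ten 100-byte records with a 250-byte limit (hypothetical numbers).
    for base in roll_segments([100] * 10, segment_bytes=250):
        print(segment_file_name(base))
```

Only the last segment in the returned list would be the active one; all earlier segments are closed and therefore candidates for cleanup.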

Log cleanup

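With log.cleaner.enable=true, the Kafka broker starts a number of cleaner threads that run cleanup tasks periodically. In Kafka versions after 0.9.0, log.cleaner.enable defaults to true. Two cleanup policies (log.cleanup.policy) are supported, delete and compact; the default is delete.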

The compact cleanup policy (log compaction)

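Log compaction means that, within one partition of a topic, only the most recent value for each key is retained. To delete a key, send a tombstone message: (key, null). To demonstrate the process, change the broker configuration: make the segment size much smaller and switch the cleanup policy to compact.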

# 25KB
log.segment.bytes=25600 
log.cleanup.policy=compact

Send a batch of keyed messages.
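
The producer code used for this step is not reproduced here. As a stand-in, the sketch below generates `key:value` lines in which a small set of keys repeats many times, which is what gives the cleaner something to compact. The key format, the number of keys, and the 100-character value padding are assumptions, not the author's settings.

```python
import sys


def keyed_lines(num_messages, num_keys, value_size=100, separator=":"):
    """Yield 'key<separator>value' lines, cycling through num_keys keys."""
    for i in range(num_messages):
        key = f"k{i % num_keys:03d}"            # 4-character key, e.g. k007
        value = f"v{i}".ljust(value_size, "x")  # pad the value to value_size chars
        yield f"{key}{separator}{value}"


if __name__ == "__main__":
    # 1000 messages spread over 10 keys (hypothetical numbers).
    for line in keyed_lines(num_messages=1000, num_keys=10):
        sys.stdout.write(line + "\n")
```

One way to send these lines, assuming a broker at localhost:9092 and the console producer shipped with Kafka 2.0.0, is to pipe the script's output into `./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --property parse.key=true --property key.separator=:`.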
The log directory then shows the following file layout:
➜  test-0 ls -alh
total 160K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:47 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 18:47 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm   78 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000153.index
-rw-rw-r--  1 storm storm  175 Sep 11 17:27 00000000000000000153.log
-rw-rw-r--  1 storm storm   10 Sep 11 17:27 00000000000000000153.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000153.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 18:47 00000000000000000296.index
-rw-rw-r--  1 storm storm  25K Sep 11 17:27 00000000000000000296.log
-rw-rw-r--  1 storm storm   10 Sep 11 17:27 00000000000000000296.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000296.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000522.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint

About a minute later, once the cleaner has run:

➜  test-0 ls -alh
total 164K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:48 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 18:48 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index.deleted
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   78 Sep 11 17:27 00000000000000000000.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000153.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 17:27 00000000000000000153.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000153.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 11 18:47 00000000000000000296.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 11 17:27 00000000000000000296.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000296.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000522.log.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000522.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000665.log.deleted
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex.deleted
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint

Apart from the current segment, all earlier segments have been cleaned (compacted), as the gaps in the offsets show.
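
To make the offset gaps concrete, here is a minimal in-memory model of the compaction rule described above: for each key only the record with the highest offset survives, and surviving records keep their original offsets. It is a sketch of the semantics only, not of the broker's cleaner; the `drop_tombstones` flag is a hypothetical stand-in for the later removal of tombstones, whose timing is not modelled.

```python
def compact(records, drop_tombstones=False):
    """Keep only the latest record per key, preserving original offsets.

    records: list of (offset, key, value) tuples in offset order;
    a value of None marks a tombstone.
    """
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset  # later records overwrite earlier ones

    kept = []
    for offset, key, value in records:
        if latest_offset[key] != offset:
            continue  # superseded by a newer record with the same key
        if value is None and drop_tombstones:
            continue  # the tombstone itself is eventually removed as well
        kept.append((offset, key, value))
    return kept


if __name__ == "__main__":
    log = [
        (0, "a", "1"),
        (1, "b", "1"),
        (2, "a", "2"),
        (3, "c", "1"),
        (4, "b", None),  # tombstone: delete key "b"
        (5, "a", "3"),
    ]
    for record in compact(log):
        print(record)
```

In this example offsets 0, 1 and 2 disappear while 3, 4 and 5 remain, which is the same kind of gap that the segment dumps show.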


➜  kafka_2.11-2.0.0 ./bin/kafka-run-class.sh kafka.tools.DumpLogSegments  --deep-iteration --files /home/storm/dev/kafka-logs/test-0/00000000000000000000.log
Dumping /home/storm/dev/kafka-logs/test-0/00000000000000000000.log
Starting offset: 0
offset: 521 position: 0 CreateTime: 1536658031117 isvalid: true keysize: 4 valuesize: 0 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
➜  kafka_2.11-2.0.0 ./bin/kafka-run-class.sh kafka.tools.DumpLogSegments  --deep-iteration --files /home/storm/dev/kafka-logs/test-0/00000000000000000665.log
Dumping /home/storm/dev/kafka-logs/test-0/00000000000000000665.log
Starting offset: 665
offset: 807 position: 0 CreateTime: 1536662844868 isvalid: true keysize: 4 valuesize: 100 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []

Segments marked .deleted are removed from disk after a delay (log.segment.delete.delay.ms on the broker, 60000 ms by default). Checking the directory the next day confirms that they are gone.

/home/storm/dev/kafka-logs/test-0
➜  test-0 date
Wed Sep 12 09:48:45 CST 2018
➜  test-0 ls -alh
total 72K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:48 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 19:51 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint

The delete cleanup policy (default)

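Now for the delete cleanup policy. This is the data retention behaviour seen by default: once the log exceeds a given size or age, old data is deleted. The broker settings involved are log.retention.bytes and log.retention.hours (log.retention.minutes and log.retention.ms express the same time limit in finer units). The defaults are: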

# Set this according to your own needs (-1 means no size limit)
log.retention.bytes=-1

# The default retention time is 7 days
log.retention.hours=168
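
The two limits can be pictured with a simplified retention check. The sketch below is a model under stated assumptions rather than the broker's code: the segment dictionaries and their field names are hypothetical, the last segment in the list is treated as the active one and never selected (as described earlier in this article), and segments are only ever dropped from the old end of the log.

```python
def segments_to_delete(segments, now_ms, retention_ms=None, retention_bytes=-1):
    """Return base offsets of closed segments to delete, oldest first.

    segments: list of dicts with 'base_offset', 'size' and 'last_ts_ms',
    in offset order; the last entry is treated as the active segment.
    """
    closed = segments[:-1]
    doomed = []

    # Time limit: drop leading segments whose newest record is too old.
    if retention_ms is not None and retention_ms >= 0:
        for seg in closed:
            if now_ms - seg["last_ts_ms"] > retention_ms:
                doomed.append(seg["base_offset"])
            else:
                break

    # Size limit: drop further old segments while the log still exceeds it.
    if retention_bytes >= 0:
        excess = sum(s["size"] for s in segments) - retention_bytes
        for seg in closed:
            if seg["base_offset"] in doomed:
                excess -= seg["size"]  # already removed by the time limit
            elif excess - seg["size"] >= 0:
                doomed.append(seg["base_offset"])
                excess -= seg["size"]
            else:
                break
    return doomed


if __name__ == "__main__":
    segs = [
        {"base_offset": 0, "size": 100, "last_ts_ms": 0},
        {"base_offset": 10, "size": 100, "last_ts_ms": 1000},
        {"base_offset": 20, "size": 100, "last_ts_ms": 5000},
        {"base_offset": 30, "size": 50, "last_ts_ms": 9000},  # active segment
    ]
    print(segments_to_delete(segs, now_ms=10000, retention_ms=6000))
    print(segments_to_delete(segs, now_ms=10000, retention_bytes=200))
```

With the default values shown above (no size limit), only the time limit ever selects segments in this model.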

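To make the deletion visible, the retention time is lowered to 60 minutes here. Apart from the segment currently in use, all earlier segments are then deleted: they are first renamed with a .deleted suffix and physically removed after log.segment.delete.delay.ms (60000 ms by default). In the second listing below, taken the next day, they are gone.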

# The minimum age of a log file to be eligible for deletion due to age
log.retention.minutes=60

➜  kafka-logs ls -alh test-0
total 220K
drwxrwxr-x  2 storm storm 4.0K Sep 13 11:12 .
drwxrwxr-x 53 storm storm 4.0K Sep 13 11:12 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index.deleted
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 12 10:50 00000000000000000808.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000000808.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:50 00000000000000001034.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:50 00000000000000001034.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000001034.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:50 00000000000000001177.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:50 00000000000000001177.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000001177.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 12 10:51 00000000000000001320.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 12 10:50 00000000000000001320.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001320.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:51 00000000000000001546.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:51 00000000000000001546.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001546.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:51 00000000000000001689.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:51 00000000000000001689.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001689.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 13 11:12 00000000000000001832.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 12 10:51 00000000000000001832.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 13 11:12 00000000000000001832.timeindex.deleted
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.index
-rw-rw-r--  1 storm storm    0 Sep 13 11:12 00000000000000002058.log
-rw-rw-r--  1 storm storm   10 Sep 13 11:08 00000000000000002058.snapshot
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.timeindex
-rw-rw-r--  1 storm storm   11 Sep 13 11:12 leader-epoch-checkpoint

➜  kafka-logs date
Fri Sep 14 09:19:41 CST 2018
➜  kafka-logs ls -alh test-0
total 16K
drwxrwxr-x  2 storm storm 4.0K Sep 13 11:13 .
drwxrwxr-x 53 storm storm 4.0K Sep 14 09:19 ..
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.index
-rw-rw-r--  1 storm storm    0 Sep 13 11:12 00000000000000002058.log
-rw-rw-r--  1 storm storm   10 Sep 13 11:08 00000000000000002058.snapshot
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.timeindex
-rw-rw-r--  1 storm storm   11 Sep 13 11:12 leader-epoch-checkpoint

Originally published: 2019-08-03

Shared from the WeChat public account 浪尖聊大数据.
