Every message carries a Key. Compaction merges messages that have the same Key and keeps only the latest one; messages whose payload is null are also removed.
(Figure: 110.png)
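As a rough illustration of these semantics, here is a minimal sketch (not Kafka code; the offsets, keys and values are made up):
// Minimal sketch of the compaction semantics: keep only the highest-offset message per key,
// then drop entries whose value is null (tombstones).
val log = Seq(                       // (offset, key, value)
  (0L, "k1", Some("v1")),
  (1L, "k2", Some("v2")),
  (2L, "k1", Some("v3")),            // newer value for k1
  (3L, "k2", None)                   // tombstone for k2
)
val compacted = log
  .groupBy { case (_, key, _) => key }
  .map { case (_, msgs) => msgs.maxBy(_._1) }        // latest message per key
  .filter { case (_, _, value) => value.isDefined }  // drop null-payload tombstones
  .toSeq.sortBy(_._1)
// compacted == Seq((2L, "k1", Some("v3")))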
The cleaning states are LogCleaningInProgress, LogCleaningAborted, and LogCleaningPaused; their meanings are obvious from the names (the source code comments describe each state). The LogCleanerManager class manages the cleaning state of every log and the transitions between those states, through the following methods:
def abortCleaning(topicAndPartition: TopicAndPartition)
def abortAndPauseCleaning(topicAndPartition: TopicAndPartition)
def resumeCleaning(topicAndPartition: TopicAndPartition)
def checkCleaningAborted(topicAndPartition: TopicAndPartition)
All of these operate on the log of a given TopicAndPartition. The LogToClean class represents a Log that is to be cleaned:
private case class LogToClean(topicPartition: TopicAndPartition, log: Log, firstDirtyOffset: Long) extends Ordered[LogToClean] {
  val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size).sum
  val dirtyBytes = log.logSegments(firstDirtyOffset, math.max(firstDirtyOffset, log.activeSegment.baseOffset)).map(_.size).sum
  val cleanableRatio = dirtyBytes / totalBytes.toDouble
  def totalBytes = cleanBytes + dirtyBytes
  override def compare(that: LogToClean): Int = math.signum(this.cleanableRatio - that.cleanableRatio).toInt
}
firstDirtyOffset is the starting point of this cleaning round: the portion before this offset is what gets cleaned, i.e., merged by key with the messages that come after it. cleanableRatio (dirtyBytes / totalBytes.toDouble) is the fraction of the log that needs cleaning; the larger this value, the more likely the log is to be picked for cleaning. After a round finishes, the offset up to which the log has been cleaned is written to the cleaner-offset-checkpoint file, which serves as the reference for computing firstDirtyOffset in the next round:
def updateCheckpoints(dataDir: File, update: Option[(TopicAndPartition,Long)]) {
  inLock(lock) {
    val checkpoint = checkpoints(dataDir)
    val existing = checkpoint.read().filterKeys(logs.keys) ++ update
    checkpoint.write(existing)
  }
}
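For context, a hedged usage sketch of how the checkpoint gets updated once a partition has been cleaned up to some end offset (cleanerManager, dataDir, topicAndPartition and endOffset are assumed identifiers, not taken from the post):
// Hedged sketch: record that this partition is clean up to endOffset, so the next round
// can use it as firstDirtyOffset. All identifiers here are assumed to be in scope.
cleanerManager.updateCheckpoints(dataDir, update = Some(topicAndPartition -> endOffset))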
def grabFilthiestLog(): Option[LogToClean] = {
  inLock(lock) {
    val lastClean = allCleanerCheckpoints()
    val dirtyLogs = logs.filter {
      case (topicAndPartition, log) => log.config.compact // skip any logs marked for delete rather than dedupe
    }.filterNot {
      case (topicAndPartition, log) => inProgress.contains(topicAndPartition) // skip any logs already in-progress
    }.map {
      case (topicAndPartition, log) => // create a LogToClean instance for each
        // if the log segments are abnormally truncated and hence the checkpointed offset
        // is no longer valid, reset to the log starting offset and log the error event
        val logStartOffset = log.logSegments.head.baseOffset
        val firstDirtyOffset = {
          val offset = lastClean.getOrElse(topicAndPartition, logStartOffset)
          if (offset < logStartOffset) {
            error("Resetting first dirty offset to log start offset %d since the checkpointed offset %d is invalid."
              .format(logStartOffset, offset))
            logStartOffset
          } else {
            offset
          }
        }
        LogToClean(topicAndPartition, log, firstDirtyOffset)
    }.filter(ltc => ltc.totalBytes > 0) // skip any empty logs

    this.dirtiestLogCleanableRatio = if (!dirtyLogs.isEmpty) dirtyLogs.max.cleanableRatio else 0
    // and must meet the minimum threshold for dirty byte ratio
    val cleanableLogs = dirtyLogs.filter(ltc => ltc.cleanableRatio > ltc.log.config.minCleanableRatio)
    if (cleanableLogs.isEmpty) {
      None
    } else {
      val filthiest = cleanableLogs.max
      inProgress.put(filthiest.topicPartition, LogCleaningInProgress)
      Some(filthiest)
    }
  }
}
The code looks long but is actually quite simple: build a list of LogToClean objects for every log that is configured for compaction, is not already being cleaned, and is non-empty; filter that list down to the entries whose cleanableRatio is greater than the minimum cleanable ratio configured for the log; then take the LogToClean with the largest cleanableRatio, which is the log that currently needs cleaning the most.
(Figure: 111.png)
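A minimal sketch of that selection logic, using made-up byte counts instead of real Log objects (CandidateLog and the numbers below are illustrative, not Kafka code):
// Illustrative only: selection boils down to filtering by the minimum cleanable ratio,
// then taking the candidate with the highest dirty/total ratio.
case class CandidateLog(topicPartition: String, cleanBytes: Long, dirtyBytes: Long) {
  def totalBytes: Long       = cleanBytes + dirtyBytes
  def cleanableRatio: Double = dirtyBytes / totalBytes.toDouble
}

val candidates = Seq(
  CandidateLog("topicA-0", cleanBytes = 800, dirtyBytes = 200),  // ratio 0.2
  CandidateLog("topicB-1", cleanBytes = 300, dirtyBytes = 700)   // ratio 0.7
)
val minCleanableRatio = 0.5  // plays the role of log.config.minCleanableRatio
val filthiest = candidates
  .filter(_.cleanableRatio > minCleanableRatio)
  .sortBy(_.cleanableRatio)
  .lastOption
// filthiest == Some(CandidateLog("topicB-1", 300, 700))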
The CleanerPoint in the figure is the firstDirtyOffset described above. Keys in the Log Tail are merged with those in the Log Head; in fact, since the OffsetMap is built over the Log Head, the range whose keys take part in the merge also extends up to the last offset reached while building the OffsetMap. The figure below shows the whole compaction and merge process; Kafka's code is essentially this process translated into code.
(Figure: 112.png)
The first step is to build the OffsetMap covering all the log segments in the Log Head: its keys are the hashes of message.key and its values are the offsets of the corresponding messages:
private[log] def buildOffsetMap(log: Log, start: Long, end: Long, map: OffsetMap): Long = {
  map.clear()
  val dirty = log.logSegments(start, end).toSeq
  info("Building offset map for log %s for %d segments in offset range [%d, %d).".format(log.name, dirty.size, start, end))

  // Add all the dirty segments. We must take at least map.slots * load_factor,
  // but we may be able to fit more (if there is lots of duplication in the dirty section of the log)
  var offset = dirty.head.baseOffset
  require(offset == start, "Last clean offset is %d but segment base offset is %d for log %s.".format(start, offset, log.name))
  val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt
  var full = false
  for (segment <- dirty if !full) {
    checkDone(log.topicAndPartition)
    val segmentSize = segment.nextOffset() - segment.baseOffset
    require(segmentSize <= maxDesiredMapSize, "%d messages in segment %s/%s but offset map can fit only %d. You can increase log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads".format(segmentSize, log.name, segment.log.file.getName, maxDesiredMapSize))
    if (map.size + segmentSize <= maxDesiredMapSize)
      offset = buildOffsetMapForSegment(log.topicAndPartition, segment, map)
    else
      full = true
  }
  info("Offset map for log %s complete.".format(log.name))
  offset
}
Each message's key is hashed and put into the OffsetMap, so the map's keys are hashes of message.key. There is a pitfall here: what happens if a hash collision occurs? Also note that the map's capacity is bounded by val maxDesiredMapSize = (map.slots * this.dupBufferLoadFactor).toInt, so a pass stops (full = true) as soon as the next segment would no longer fit.
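On the collision question: the actual implementation, SkimpyOffsetMap, stores a fixed-size digest of each key (MD5 by default) rather than the key itself, so two distinct keys that produced the same digest would overwrite each other; Kafka tolerates this because the probability is vanishingly small. A simplified, non-Kafka sketch of the map's behaviour:
import scala.collection.mutable

// Simplified sketch of the OffsetMap idea: remember the highest offset seen for each key
// digest; a later put for the same digest overwrites the earlier offset.
class ToyOffsetMap {
  private val slots = mutable.Map[Int, Long]()                                 // digest -> latest offset
  private def digest(key: Array[Byte]): Int = java.util.Arrays.hashCode(key)   // stand-in for the real MD5 digest
  def put(key: Array[Byte], offset: Long): Unit = slots(digest(key)) = offset
  def get(key: Array[Byte]): Long = slots.getOrElse(digest(key), -1L)          // -1 means "key not seen"
}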
After compaction the LogSegments are bound to shrink, so they have to be regrouped in preparation for rewriting the Log and Index files. The grouping is driven by segment size and index size: the total log size of each group must not exceed the configured segment size, the total index size must not exceed the configured maximum index size, and the offset span within a group must not exceed Int.MaxValue (so that relative offsets still fit in an Int); a small sketch of the packing rule follows the method below.
private[log] def groupSegmentsBySize(segments: Iterable[LogSegment], maxSize: Int, maxIndexSize: Int): List[Seq[LogSegment]] = {
  var grouped = List[List[LogSegment]]()
  var segs = segments.toList
  while (!segs.isEmpty) {
    var group = List(segs.head)
    var logSize = segs.head.size
    var indexSize = segs.head.index.sizeInBytes
    segs = segs.tail
    while (!segs.isEmpty &&
           logSize + segs.head.size <= maxSize &&
           indexSize + segs.head.index.sizeInBytes <= maxIndexSize &&
           segs.head.index.lastOffset - group.last.index.baseOffset <= Int.MaxValue) {
      group = segs.head :: group
      logSize += segs.head.size
      indexSize += segs.head.index.sizeInBytes
      segs = segs.tail
    }
    grouped ::= group.reverse
  }
  grouped.reverse
}
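As promised, a rough sketch of the packing rule (not Kafka code; it ignores index size and offset span and only checks the accumulated log size against maxSize):
// Pack consecutive segment sizes into groups whose total stays under maxSize.
def groupBySize(sizes: List[Int], maxSize: Int): List[List[Int]] = {
  sizes.foldLeft(List.empty[List[Int]]) {
    case (Nil, s)                                   => List(List(s))
    case (cur :: done, s) if cur.sum + s <= maxSize => (cur :+ s) :: done
    case (acc, s)                                   => List(s) :: acc
  }.reverse
}

// groupBySize(List(400, 500, 300, 900), maxSize = 1000) == List(List(400, 500), List(300), List(900))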
Next, each LogSegment to be cleaned is scanned, and the messages that should be kept are filtered out according to the rules below and rewritten into a new Log file. A message is retained only if: it has a key; its offset is not lower than the offset recorded for that key in the OffsetMap (that is, it is the latest copy of the key, or the key is absent from the map); and, when its value is null (a delete tombstone), the segment's last-modified time is still newer than the delete horizon, otherwise the tombstone is discarded. A worked example follows the method below.
private def shouldRetainMessage(source: kafka.log.LogSegment,
                                map: kafka.log.OffsetMap,
                                retainDeletes: Boolean,
                                entry: kafka.message.MessageAndOffset): Boolean = {
  val key = entry.message.key
  if (key != null) {
    val foundOffset = map.get(key)
    /* two cases in which we can get rid of a message:
     *   1) if there exists a message with the same key but higher offset
     *   2) if the message is a delete "tombstone" marker and enough time has passed
     */
    val redundant = foundOffset >= 0 && entry.offset < foundOffset
    val obsoleteDelete = !retainDeletes && entry.message.isNull
    !redundant && !obsoleteDelete
  } else {
    stats.invalidMessage()
    false
  }
}
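As promised, a standalone sketch of the same decision, with plain values in place of Kafka's OffsetMap and MessageAndOffset (the function and its arguments are illustrative, not Kafka APIs):
// Illustrative only: mirrors the redundant/obsoleteDelete checks above.
def retain(latestOffsetForKey: Long, offset: Long, valueIsNull: Boolean, retainDeletes: Boolean): Boolean = {
  val redundant      = latestOffsetForKey >= 0 && offset < latestOffsetForKey
  val obsoleteDelete = !retainDeletes && valueIsNull
  !redundant && !obsoleteDelete
}

retain(latestOffsetForKey = 7, offset = 5, valueIsNull = false, retainDeletes = true)  // false: a newer copy of the key exists
retain(latestOffsetForKey = 7, offset = 7, valueIsNull = false, retainDeletes = true)  // true:  latest copy of its key
retain(latestOffsetForKey = 7, offset = 9, valueIsNull = true,  retainDeletes = false) // false: tombstone past the delete horizon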
Finally, after the messages have been merged and de-duplicated, the Log and Index files are rewritten; this is done by cleanSegments:
private[log] def cleanSegments(log: Log,
                               segments: Seq[LogSegment],
                               map: OffsetMap,
                               deleteHorizonMs: Long) {
  // create a new segment with the suffix .cleaned appended to both the log and index name
  val logFile = new File(segments.head.log.file.getPath + Log.CleanedFileSuffix)
  logFile.delete()
  val indexFile = new File(segments.head.index.file.getPath + Log.CleanedFileSuffix)
  indexFile.delete()
  val messages = new FileMessageSet(logFile, fileAlreadyExists = false, initFileSize = log.initFileSize(), preallocate = log.config.preallocate)
  val index = new OffsetIndex(indexFile, segments.head.baseOffset, segments.head.index.maxIndexSize)
  val cleaned = new LogSegment(messages, index, segments.head.baseOffset, segments.head.indexIntervalBytes, log.config.randomSegmentJitter, time)

  try {
    // clean segments into the new destination segment
    for (old <- segments) {
      val retainDeletes = old.lastModified > deleteHorizonMs
      info("Cleaning segment %s in log %s (last modified %s) into %s, %s deletes."
        .format(old.baseOffset, log.name, new Date(old.lastModified), cleaned.baseOffset, if (retainDeletes) "retaining" else "discarding"))
      cleanInto(log.topicAndPartition, old, cleaned, map, retainDeletes)
    }

    // trim excess index
    index.trimToValidSize()

    // flush new segment to disk before swap
    cleaned.flush()

    // update the modification date to retain the last modified date of the original files
    val modified = segments.last.lastModified
    cleaned.lastModified = modified

    // swap in new segment
    info("Swapping in cleaned segment %d for segment(s) %s in log %s.".format(cleaned.baseOffset, segments.map(_.baseOffset).mkString(","), log.name))
    log.replaceSegments(cleaned, segments)
  } catch {
    case e: LogCleaningAbortedException =>
      cleaned.delete()
      throw e
  }
}
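Note how the rewrite stays safe: the new data is first written to files carrying the .cleaned suffix, the index is trimmed and the segment flushed to disk, and only then is it swapped in through log.replaceSegments; if cleaning is aborted midway, the .cleaned segment is simply deleted and the original segments are left untouched.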