Golang源码探索----GC的实现原理(3)

李海彬

发布于 2020-12-03 11:21:49

9890

发布于 2020-12-03 11:21:49

根对象

在GC的标记阶段首先需要标记的就是"根对象", 从根对象开始可到达的所有对象都会被认为是存活的. 根对象包含了全局变量, 各个G的栈上的变量等, GC会先扫描根对象然后再扫描根对象可到达的所有对象. 扫描根对象包含了一系列的工作, 它们定义在[https://github.com/golang/go/blob/go1.9.2/src/runtime/mgcmark.go#L54]函数:

Fixed Roots: 特殊的扫描工作
- fixedRootFinalizers: 扫描析构器队列
- fixedRootFreeGStacks: 释放已中止的G的栈
Flush Cache Roots: 释放mcache中的所有span, 要求STW
Data Roots: 扫描可读写的全局变量
BSS Roots: 扫描只读的全局变量
Span Roots: 扫描各个span中特殊对象(析构器列表)
Stack Roots: 扫描各个G的栈

标记阶段(Mark)会做其中的"Fixed Roots", "Data Roots", "BSS Roots", "Span Roots", "Stack Roots". 完成标记阶段(Mark Termination)会做其中的"Fixed Roots", "Flush Cache Roots".

标记队列

GC的标记阶段会使用"标记队列"来确定所有可从根对象到达的对象都已标记, 上面提到的"灰色"的对象就是在标记队列中的对象. 举例来说, 如果当前有[A, B, C]这三个根对象, 那么扫描根对象时就会把它们放到标记队列:

work queue: [A, B, C]

后台标记任务从标记队列中取出A, 如果A引用了D, 则把D放入标记队列:

work queue: [B, C, D]

后台标记任务从标记队列取出B, 如果B也引用了D, 这时因为D在gcmarkBits中对应的bit已经是1所以会跳过:

work queue: [C, D]

如果并行运行的go代码分配了一个对象E, 对象E会被立刻标记, 但不会进入标记队列(因为确定E没有引用其他对象). 然后并行运行的go代码把对象F设置给对象E的成员, 写屏障会标记对象F然后把对象F加到运行队列:

work queue: [C, D, F]

后台标记任务从标记队列取出C, 如果C没有引用其他对象, 则不需要处理:

work queue: [D, F]

后台标记任务从标记队列取出D, 如果D引用了X, 则把X放入标记队列:

work queue: [F, X]

后台标记任务从标记队列取出F, 如果F没有引用其他对象, 则不需要处理. 后台标记任务从标记队列取出X, 如果X没有引用其他对象, 则不需要处理. 最后标记队列为空, 标记完成, 存活的对象有[A, B, C, D, E, F, X].

实际的状况会比上面介绍的状况稍微复杂一点. 标记队列会分为全局标记队列和各个P的本地标记队列, 这点和协程中的运行队列相似. 并且标记队列为空以后, 还需要停止整个世界并禁止写屏障, 然后再次检查是否为空.

源代码分析

go触发gc会从gcStart函数开始:

// gcStart transitions the GC from _GCoff to _GCmark (if
// !mode.stwMark) or _GCmarktermination (if mode.stwMark) by
// performing sweep termination and GC initialization.
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(mode gcMode, trigger gcTrigger) {
    // 判断当前G是否可抢占, 不可抢占时不触发GC
    // Since this is called from malloc and malloc is called in
    // the guts of a number of libraries that might be holding
    // locks, don't attempt to start GC in non-preemptible or
    // potentially unstable situations.
    mp := acquirem()
    if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
        releasem(mp)
        return
    }
    releasem(mp)
    mp = nil
    // 并行清扫上一轮GC未清扫的span
    // Pick up the remaining unswept/not being swept spans concurrently
    //
    // This shouldn't happen if we're being invoked in background
    // mode since proportional sweep should have just finished
    // sweeping everything, but rounding errors, etc, may leave a
    // few spans unswept. In forced mode, this is necessary since
    // GC can be forced at any point in the sweeping cycle.
    //
    // We check the transition condition continuously here in case
    // this G gets delayed in to the next GC cycle.
    for trigger.test() && gosweepone() != ^uintptr(0) {
        sweep.nbgsweep++
    }
    // 上锁, 然后重新检查gcTrigger的条件是否成立, 不成立时不触发GC
    // Perform GC initialization and the sweep termination
    // transition.
    semacquire(&work.startSema)
    // Re-check transition condition under transition lock.
    if !trigger.test() {
        semrelease(&work.startSema)
        return
    }
    // 记录是否强制触发, gcTriggerCycle是runtime.GC用的
    // For stats, check if this GC was forced by the user.
    work.userForced = trigger.kind == gcTriggerAlways || trigger.kind == gcTriggerCycle
    // 判断是否指定了禁止并行GC的参数
    // In gcstoptheworld debug mode, upgrade the mode accordingly.
    // We do this after re-checking the transition condition so
    // that multiple goroutines that detect the heap trigger don't
    // start multiple STW GCs.
    if mode == gcBackgroundMode {
        if debug.gcstoptheworld == 1 {
            mode = gcForceMode
        } else if debug.gcstoptheworld == 2 {
            mode = gcForceBlockMode
        }
    }
    // Ok, we're doing it!  Stop everybody else
    semacquire(&worldsema)
    // 跟踪处理
    if trace.enabled {
        traceGCStart()
    }
    // 启动后台扫描任务(G)
    if mode == gcBackgroundMode {
        gcBgMarkStartWorkers()
    }
    // 重置标记相关的状态
    gcResetMarkState()
    // 重置参数
    work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
    work.heap0 = atomic.Load64(&memstats.heap_live)
    work.pauseNS = 0
    work.mode = mode
    // 记录开始时间
    now := nanotime()
    work.tSweepTerm = now
    work.pauseStart = now
    // 停止所有运行中的G, 并禁止它们运行
    systemstack(stopTheWorldWithSema)
    // !!!!!!!!!!!!!!!!
    // 世界已停止(STW)...
    // !!!!!!!!!!!!!!!!
    // 清扫上一轮GC未清扫的span, 确保上一轮GC已完成
    // Finish sweep before we start concurrent scan.
    systemstack(func() {
        finishsweep_m()
    })
    // 清扫sched.sudogcache和sched.deferpool
    // clearpools before we start the GC. If we wait they memory will not be
    // reclaimed until the next GC cycle.
    clearpools()
    // 增加GC计数
    work.cycles++
    // 判断是否并行GC模式
    if mode == gcBackgroundMode { // Do as much work concurrently as possible
        // 标记新一轮GC已开始
        gcController.startCycle()
        work.heapGoal = memstats.next_gc
        // 设置全局变量中的GC状态为_GCmark
        // 然后启用写屏障
        // Enter concurrent mark phase and enable
        // write barriers.
        //
        // Because the world is stopped, all Ps will
        // observe that write barriers are enabled by
        // the time we start the world and begin
        // scanning.
        //
        // Write barriers must be enabled before assists are
        // enabled because they must be enabled before
        // any non-leaf heap objects are marked. Since
        // allocations are blocked until assists can
        // happen, we want enable assists as early as
        // possible.
        setGCPhase(_GCmark)
        // 重置后台标记任务的计数
        gcBgMarkPrepare() // Must happen before assist enable.
        // 计算扫描根对象的任务数量
        gcMarkRootPrepare()
        // 标记所有tiny alloc等待合并的对象
        // Mark all active tinyalloc blocks. Since we're
        // allocating from these, they need to be black like
        // other allocations. The alternative is to blacken
        // the tiny block on every allocation from it, which
        // would slow down the tiny allocator.
        gcMarkTinyAllocs()
        // 启用辅助GC
        // At this point all Ps have enabled the write
        // barrier, thus maintaining the no white to
        // black invariant. Enable mutator assists to
        // put back-pressure on fast allocating
        // mutators.
        atomic.Store(&gcBlackenEnabled, 1)
        // 记录标记开始的时间
        // Assists and workers can start the moment we start
        // the world.
        gcController.markStartTime = now
        // 重新启动世界
        // 前面创建的后台标记任务会开始工作, 所有后台标记任务都完成工作后, 进入完成标记阶段
        // Concurrent mark.
        systemstack(startTheWorldWithSema)
        // !!!!!!!!!!!!!!!
        // 世界已重新启动...
        // !!!!!!!!!!!!!!!
        // 记录停止了多久, 和标记阶段开始的时间
        now = nanotime()
        work.pauseNS += now - work.pauseStart
        work.tMark = now
    } else {
        // 不是并行GC模式
        // 记录完成标记阶段开始的时间
        t := nanotime()
        work.tMark, work.tMarkTerm = t, t
        work.heapGoal = work.heap0
        // 跳过标记阶段, 执行完成标记阶段
        // 所有标记工作都会在世界已停止的状态执行
        // (标记阶段会设置work.markrootDone=true, 如果跳过则它的值是false, 完成标记阶段会执行所有工作)
        // 完成标记阶段会重新启动世界
        // Perform mark termination. This will restart the world.
        gcMarkTermination(memstats.triggerRatio)
    }
    semrelease(&work.startSema)
}

接下来一个个分析gcStart调用的函数, 建议配合上面的"回收对象的流程"中的图理解.

函数gcBgMarkStartWorkers用于启动后台标记任务, 先分别对每个P启动一个:

// gcBgMarkStartWorkers prepares background mark worker goroutines.
// These goroutines will not run until the mark phase, but they must
// be started while the work is not stopped and from a regular G
// stack. The caller must hold worldsema.
func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range &allp {
        if p == nil || p.status == _Pdead {
            break
        }
        // 如果已启动则不重复启动
        if p.gcBgMarkWorker == 0 {
            go gcBgMarkWorker(p)
            // 启动后等待该任务通知信号量bgMarkReady再继续
            notetsleepg(&work.bgMarkReady, -1)
            noteclear(&work.bgMarkReady)
        }
    }
}

这里虽然为每个P启动了一个后台标记任务, 但是可以同时工作的只有25%, 这个逻辑在协程M获取G时调用的findRunnableGCWorker中:

// findRunnableGCWorker returns the background mark worker for _p_ if it
// should be run. This must only be called when gcBlackenEnabled != 0.
func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
    if gcBlackenEnabled == 0 {
        throw("gcControllerState.findRunnable: blackening not enabled")
    }
    if _p_.gcBgMarkWorker == 0 {
        // The mark worker associated with this P is blocked
        // performing a mark transition. We can't run it
        // because it may be on some other run or wait queue.
        return nil
    }
    if !gcMarkWorkAvailable(_p_) {
        // No work to be done right now. This can happen at
        // the end of the mark phase when there are still
        // assists tapering off. Don't bother running a worker
        // now because it'll just return immediately.
        return nil
    }
    // 原子减少对应的值, 如果减少后大于等于0则返回true, 否则返回false
    decIfPositive := func(ptr *int64) bool {
        if *ptr > 0 {
            if atomic.Xaddint64(ptr, -1) >= 0 {
                return true
            }
            // We lost a race
            atomic.Xaddint64(ptr, +1)
        }
        return false
    }
    // 减少dedicatedMarkWorkersNeeded, 成功时后台标记任务的模式是Dedicated
    // dedicatedMarkWorkersNeeded是当前P的数量的25%去除小数点
    // 详见startCycle函数
    if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
        // This P is now dedicated to marking until the end of
        // the concurrent mark phase.
        _p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
    } else {
        // 减少fractionalMarkWorkersNeeded, 成功是后台标记任务的模式是Fractional
        // 上面的计算如果小数点后有数值(不能够整除)则fractionalMarkWorkersNeeded为1, 否则为0
        // 详见startCycle函数
        // 举例来说, 4个P时会执行1个Dedicated模式的任务, 5个P时会执行1个Dedicated模式和1个Fractional模式的任务
        if !decIfPositive(&c.fractionalMarkWorkersNeeded) {
            // No more workers are need right now.
            return nil
        }
        // 按Dedicated模式的任务的执行时间判断cpu占用率是否超过预算值, 超过时不启动
        // This P has picked the token for the fractional worker.
        // Is the GC currently under or at the utilization goal?
        // If so, do more work.
        //
        // We used to check whether doing one time slice of work
        // would remain under the utilization goal, but that has the
        // effect of delaying work until the mutator has run for
        // enough time slices to pay for the work. During those time
        // slices, write barriers are enabled, so the mutator is running slower.
        // Now instead we do the work whenever we're under or at the
        // utilization work and pay for it by letting the mutator run later.
        // This doesn't change the overall utilization averages, but it
        // front loads the GC work so that the GC finishes earlier and
        // write barriers can be turned off sooner, effectively giving
        // the mutator a faster machine.
        //
        // The old, slower behavior can be restored by setting
        //  gcForcePreemptNS = forcePreemptNS.
        const gcForcePreemptNS = 0
        // TODO(austin): We could fast path this and basically
        // eliminate contention on c.fractionalMarkWorkersNeeded by
        // precomputing the minimum time at which it's worth
        // next scheduling the fractional worker. Then Ps
        // don't have to fight in the window where we've
        // passed that deadline and no one has started the
        // worker yet.
        //
        // TODO(austin): Shorter preemption interval for mark
        // worker to improve fairness and give this
        // finer-grained control over schedule?
        now := nanotime() - gcController.markStartTime
        then := now + gcForcePreemptNS
        timeUsed := c.fractionalMarkTime + gcForcePreemptNS
        if then > 0 && float64(timeUsed)/float64(then) > c.fractionalUtilizationGoal {
            // Nope, we'd overshoot the utilization goal
            atomic.Xaddint64(&c.fractionalMarkWorkersNeeded, +1)
            return nil
        }
        _p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
    }
    // 安排后台标记任务执行
    // Run the background mark worker
    gp := _p_.gcBgMarkWorker.ptr()
    casgstatus(gp, _Gwaiting, _Grunnable)
    if trace.enabled {
        traceGoUnpark(gp, 0)
    }
    return gp
}

gcResetMarkState函数会重置标记相关的状态:

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
func gcResetMarkState() {
    // This may be called during a concurrent phase, so make sure
    // allgs doesn't change.
    lock(&allglock)
    for _, gp := range allgs {
        gp.gcscandone = false  // set to true in gcphasework
        gp.gcscanvalid = false // stack has not been scanned
        gp.gcAssistBytes = 0
    }
    unlock(&allglock)
    work.bytesMarked = 0
    work.initialHeapLive = atomic.Load64(&memstats.heap_live)
    work.markrootDone = false
}

stopTheWorldWithSema函数会停止整个世界, 这个函数必须在g0中运行:

// stopTheWorldWithSema is the core implementation of stopTheWorld.
// The caller is responsible for acquiring worldsema and disabling
// preemption first and then should stopTheWorldWithSema on the system
// stack:
//
//  semacquire(&worldsema, 0)
//  m.preemptoff = "reason"
//  systemstack(stopTheWorldWithSema)
//
// When finished, the caller must either call startTheWorld or undo
// these three operations separately:
//
//  m.preemptoff = ""
//  systemstack(startTheWorldWithSema)
//  semrelease(&worldsema)
//
// It is allowed to acquire worldsema once and then execute multiple
// startTheWorldWithSema/stopTheWorldWithSema pairs.
// Other P's are able to execute between successive calls to
// startTheWorldWithSema and stopTheWorldWithSema.
// Holding worldsema causes any other goroutines invoking
// stopTheWorld to block.
func stopTheWorldWithSema() {
    _g_ := getg()
    // If we hold a lock, then we won't be able to stop another M
    // that is blocked trying to acquire the lock.
    if _g_.m.locks > 0 {
        throw("stopTheWorld: holding locks")
    }
    lock(&sched.lock)
    // 需要停止的P数量
    sched.stopwait = gomaxprocs
    // 设置gc等待标记, 调度时看见此标记会进入等待
    atomic.Store(&sched.gcwaiting, 1)
    // 抢占所有运行中的G
    preemptall()
    // 停止当前的P
    // stop current P
    _g_.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
    // 减少需要停止的P数量(当前的P算一个)
    sched.stopwait--
    // 抢占所有在Psyscall状态的P, 防止它们重新参与调度
    // try to retake all P's in Psyscall status
    for i := 0; i < int(gomaxprocs); i++ {
        p := allp[i]
        s := p.status
        if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
            if trace.enabled {
                traceGoSysBlock(p)
                traceProcStop(p)
            }
            p.syscalltick++
            sched.stopwait--
        }
    }
    // 防止所有空闲的P重新参与调度
    // stop idle P's
    for {
        p := pidleget()
        if p == nil {
            break
        }
        p.status = _Pgcstop
        sched.stopwait--
    }
    wait := sched.stopwait > 0
    unlock(&sched.lock)
    // 如果仍有需要停止的P, 则等待它们停止
    // wait for remaining P's to stop voluntarily
    if wait {
        for {
            // 循环等待 + 抢占所有运行中的G
            // wait for 100us, then try to re-preempt in case of any races
            if notetsleep(&sched.stopnote, 100*1000) {
                noteclear(&sched.stopnote)
                break
            }
            preemptall()
        }
    }
    // 逻辑正确性检查
    // sanity checks
    bad := ""
    if sched.stopwait != 0 {
        bad = "stopTheWorld: not stopped (stopwait != 0)"
    } else {
        for i := 0; i < int(gomaxprocs); i++ {
            p := allp[i]
            if p.status != _Pgcstop {
                bad = "stopTheWorld: not stopped (status != _Pgcstop)"
            }
        }
    }
    if atomic.Load(&freezing) != 0 {
        // Some other thread is panicking. This can cause the
        // sanity checks above to fail if the panic happens in
        // the signal handler on a stopped thread. Either way,
        // we should halt this thread.
        lock(&deadlock)
        lock(&deadlock)
    }
    if bad != "" {
        throw(bad)
    }
    // 到这里所有运行中的G都会变为待运行, 并且所有的P都不能被M获取
    // 也就是说所有的go代码(除了当前的)都会停止运行, 并且不能运行新的go代码
}

finishsweep_m函数会清扫上一轮GC未清扫的span, 确保上一轮GC已完成:

// finishsweep_m ensures that all spans are swept.
//
// The world must be stopped. This ensures there are no sweeps in
// progress.
//
//go:nowritebarrier
func finishsweep_m() {
    // sweepone会取出一个未sweep的span然后执行sweep
    // 详细将在下面sweep阶段时分析
    // Sweeping must be complete before marking commences, so
    // sweep any unswept spans. If this is a concurrent GC, there
    // shouldn't be any spans left to sweep, so this should finish
    // instantly. If GC was forced before the concurrent sweep
    // finished, there may be spans to sweep.
    for sweepone() != ^uintptr(0) {
        sweep.npausesweep++
    }
    // 所有span都sweep完成后, 启动一个新的markbit时代
    // 这个函数是实现span的gcmarkBits和allocBits的分配和复用的关键, 流程如下
    // - span分配gcmarkBits和allocBits
    // - span完成sweep
    //   - 原allocBits不再被使用
    //   - gcmarkBits变为allocBits
    //   - 分配新的gcmarkBits
    // - 开启新的markbit时代
    // - span完成sweep, 同上
    // - 开启新的markbit时代
    //   - 2个时代之前的bitmap将不再被使用, 可以复用这些bitmap
    nextMarkBitArenaEpoch()
}

clearpools函数会清理sched.sudogcache和sched.deferpool, 让它们的内存可以被回收:

func clearpools() {
    // clear sync.Pools
    if poolcleanup != nil {
        poolcleanup()
    }
    // Clear central sudog cache.
    // Leave per-P caches alone, they have strictly bounded size.
    // Disconnect cached list before dropping it on the floor,
    // so that a dangling ref to one entry does not pin all of them.
    lock(&sched.sudoglock)
    var sg, sgnext *sudog
    for sg = sched.sudogcache; sg != nil; sg = sgnext {
        sgnext = sg.next
        sg.next = nil
    }
    sched.sudogcache = nil
    unlock(&sched.sudoglock)
    // Clear central defer pools.
    // Leave per-P pools alone, they have strictly bounded size.
    lock(&sched.deferlock)
    for i := range sched.deferpool {
        // disconnect cached list before dropping it on the floor,
        // so that a dangling ref to one entry does not pin all of them.
        var d, dlink *_defer
        for d = sched.deferpool[i]; d != nil; d = dlink {
            dlink = d.link
            d.link = nil
        }
        sched.deferpool[i] = nil
    }
    unlock(&sched.deferlock)
}

startCycle标记开始了新一轮的GC:

// startCycle resets the GC controller's state and computes estimates
// for a new GC cycle. The caller must hold worldsema.
func (c *gcControllerState) startCycle() {
    c.scanWork = 0
    c.bgScanCredit = 0
    c.assistTime = 0
    c.dedicatedMarkTime = 0
    c.fractionalMarkTime = 0
    c.idleMarkTime = 0
    // 伪装heap_marked的值如果gc_trigger的值很小, 防止后面对triggerRatio做出错误的调整
    // If this is the first GC cycle or we're operating on a very
    // small heap, fake heap_marked so it looks like gc_trigger is
    // the appropriate growth from heap_marked, even though the
    // real heap_marked may not have a meaningful value (on the
    // first cycle) or may be much smaller (resulting in a large
    // error response).
    if memstats.gc_trigger <= heapminimum {
        memstats.heap_marked = uint64(float64(memstats.gc_trigger) / (1 + memstats.triggerRatio))
    }
    // 重新计算next_gc, 注意next_gc的计算跟gc_trigger不一样
    // Re-compute the heap goal for this cycle in case something
    // changed. This is the same calculation we use elsewhere.
    memstats.next_gc = memstats.heap_marked + memstats.heap_marked*uint64(gcpercent)/100
    if gcpercent < 0 {
        memstats.next_gc = ^uint64(0)
    }
    // 确保next_gc和heap_live之间最少有1MB
    // Ensure that the heap goal is at least a little larger than
    // the current live heap size. This may not be the case if GC
    // start is delayed or if the allocation that pushed heap_live
    // over gc_trigger is large or if the trigger is really close to
    // GOGC. Assist is proportional to this distance, so enforce a
    // minimum distance, even if it means going over the GOGC goal
    // by a tiny bit.
    if memstats.next_gc < memstats.heap_live+1024*1024 {
        memstats.next_gc = memstats.heap_live + 1024*1024
    }
    // 计算可以同时执行的后台标记任务的数量
    // dedicatedMarkWorkersNeeded等于P的数量的25%去除小数点
    // 如果可以整除则fractionalMarkWorkersNeeded等于0否则等于1
    // totalUtilizationGoal是GC所占的P的目标值(例如P一共有5个时目标是1.25个P)
    // fractionalUtilizationGoal是Fractiona模式的任务所占的P的目标值(例如P一共有5个时目标是0.25个P)
    // Compute the total mark utilization goal and divide it among
    // dedicated and fractional workers.
    totalUtilizationGoal := float64(gomaxprocs) * gcGoalUtilization
    c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal)
    c.fractionalUtilizationGoal = totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)
    if c.fractionalUtilizationGoal > 0 {
        c.fractionalMarkWorkersNeeded = 1
    } else {
        c.fractionalMarkWorkersNeeded = 0
    }
    // 重置P中的辅助GC所用的时间统计
    // Clear per-P state
    for _, p := range &allp {
        if p == nil {
            break
        }
        p.gcAssistTime = 0
    }
    // 计算辅助GC的参数
    // 参考上面对计算assistWorkPerByte的公式的分析
    // Compute initial values for controls that are updated
    // throughout the cycle.
    c.revise()
    if debug.gcpacertrace > 0 {
        print("pacer: assist ratio=", c.assistWorkPerByte,
            " (scan ", memstats.heap_scan>>20, " MB in ",
            work.initialHeapLive>>20, "->",
            memstats.next_gc>>20, " MB)",
            " workers=", c.dedicatedMarkWorkersNeeded,
            "+", c.fractionalMarkWorkersNeeded, "\n")
    }
}

setGCPhase函数会修改表示当前GC阶段的全局变量和是否开启写屏障的全局变量:

//go:nosplit
func setGCPhase(x uint32) {
    atomic.Store(&gcphase, x)
    writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
    writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

gcBgMarkPrepare函数会重置后台标记任务的计数:

// gcBgMarkPrepare sets up state for background marking.
// Mutator assists must not yet be enabled.
func gcBgMarkPrepare() {
    // Background marking will stop when the work queues are empty
    // and there are no more workers (note that, since this is
    // concurrent, this may be a transient state, but mark
    // termination will clean it up). Between background workers
    // and assists, we don't really know how many workers there
    // will be, so we pretend to have an arbitrarily large number
    // of workers, almost all of which are "waiting". While a
    // worker is working it decrements nwait. If nproc == nwait,
    // there are no workers.
    work.nproc = ^uint32(0)
    work.nwait = ^uint32(0)
}

gcMarkRootPrepare函数会计算扫描根对象的任务数量:

// gcMarkRootPrepare queues root scanning jobs (stacks, globals, and
// some miscellany) and initializes scanning-related state.
//
// The caller must have call gcCopySpans().
//
// The world must be stopped.
//
//go:nowritebarrier
func gcMarkRootPrepare() {
    // 释放mcache中的所有span的任务, 只在完成标记阶段(mark termination)中执行
    if gcphase == _GCmarktermination {
        work.nFlushCacheRoots = int(gomaxprocs)
    } else {
        work.nFlushCacheRoots = 0
    }
    // 计算block数量的函数, rootBlockBytes是256KB
    // Compute how many data and BSS root blocks there are.
    nBlocks := func(bytes uintptr) int {
        return int((bytes + rootBlockBytes - 1) / rootBlockBytes)
    }
    work.nDataRoots = 0
    work.nBSSRoots = 0
    // data和bss每一轮GC只扫描一次
    // 并行GC中会在后台标记任务中扫描, 完成标记阶段(mark termination)中不扫描
    // 非并行GC会在完成标记阶段(mark termination)中扫描
    // Only scan globals once per cycle; preferably concurrently.
    if !work.markrootDone {
        // 计算扫描可读写的全局变量的任务数量
        for _, datap := range activeModules() {
            nDataRoots := nBlocks(datap.edata - datap.data)
            if nDataRoots > work.nDataRoots {
                work.nDataRoots = nDataRoots
            }
        }
        // 计算扫描只读的全局变量的任务数量
        for _, datap := range activeModules() {
            nBSSRoots := nBlocks(datap.ebss - datap.bss)
            if nBSSRoots > work.nBSSRoots {
                work.nBSSRoots = nBSSRoots
            }
        }
    }
    // span中的finalizer和各个G的栈每一轮GC只扫描一次
    // 同上
    if !work.markrootDone {
        // 计算扫描span中的finalizer的任务数量
        // On the first markroot, we need to scan span roots.
        // In concurrent GC, this happens during concurrent
        // mark and we depend on addfinalizer to ensure the
        // above invariants for objects that get finalizers
        // after concurrent mark. In STW GC, this will happen
        // during mark termination.
        //
        // We're only interested in scanning the in-use spans,
        // which will all be swept at this point. More spans
        // may be added to this list during concurrent GC, but
        // we only care about spans that were allocated before
        // this mark phase.
        work.nSpanRoots = mheap_.sweepSpans[mheap_.sweepgen/2%2].numBlocks()
        // 计算扫描各个G的栈的任务数量
        // On the first markroot, we need to scan all Gs. Gs
        // may be created after this point, but it's okay that
        // we ignore them because they begin life without any
        // roots, so there's nothing to scan, and any roots
        // they create during the concurrent phase will be
        // scanned during mark termination. During mark
        // termination, allglen isn't changing, so we'll scan
        // all Gs.
        work.nStackRoots = int(atomic.Loaduintptr(&allglen))
    } else {
        // We've already scanned span roots and kept the scan
        // up-to-date during concurrent mark.
        work.nSpanRoots = 0
        // The hybrid barrier ensures that stacks can't
        // contain pointers to unmarked objects, so on the
        // second markroot, there's no need to scan stacks.
        work.nStackRoots = 0
        if debug.gcrescanstacks > 0 {
            // Scan stacks anyway for debugging.
            work.nStackRoots = int(atomic.Loaduintptr(&allglen))
        }
    }
    // 计算总任务数量
    // 后台标记任务会对markrootNext进行原子递增, 来决定做哪个任务
    // 这种用数值来实现锁自由队列的办法挺聪明的, 尽管google工程师觉得不好(看后面markroot函数的分析)
    work.markrootNext = 0
    work.markrootJobs = uint32(fixedRootCount + work.nFlushCacheRoots + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
}

版权申明：内容来源网络，版权归原创者所有。除非无法确认，我们都会标明作者及出处，如有侵权烦请告知，我们会立即删除并表示歉意。谢谢。