
Page compaction code analysis, part 1

DragonKingZhu
Published 2020-04-30 18:18:16

Important data structures

/*
 * Determines how hard direct compaction should try to succeed.
 * Lower value means higher priority, analogically to reclaim priority.
 */
enum compact_priority {
    COMPACT_PRIO_SYNC_FULL,
    MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
    COMPACT_PRIO_SYNC_LIGHT,
    MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    COMPACT_PRIO_ASYNC,
    INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};

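This enum defines the priority levels of compaction.

  • COMPACT_PRIO_SYNC_FULL: fully synchronous mode. Blocking is allowed, and dirty pages may be written back to the storage device and waited on until the write completes.
  • COMPACT_PRIO_SYNC_LIGHT: light synchronous mode. Most blocking is allowed, but writing dirty pages back to the storage device is not, because the wait would be too long.
  • COMPACT_PRIO_ASYNC: asynchronous mode. Blocking is not allowed.
  • Priority order: COMPACT_PRIO_SYNC_FULL > COMPACT_PRIO_SYNC_LIGHT > COMPACT_PRIO_ASYNC (a lower enum value means a higher priority).
  • Cost of compaction: COMPACT_PRIO_SYNC_FULL > COMPACT_PRIO_SYNC_LIGHT > COMPACT_PRIO_ASYNC
  • Fully synchronous mode has the highest success rate.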
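
Because a lower value means a higher priority, a caller that retries compaction can escalate by decrementing the priority until it reaches the minimum allowed for the request. The following is a minimal user-space sketch of that idea, not kernel code: `escalate_priority` and its `costly_order` parameter are hypothetical names introduced for illustration, and the enum is copied from the listing above so the sketch compiles on its own.

```c
#include <stdbool.h>

/* Copied from the kernel enum shown above. */
enum compact_priority {
    COMPACT_PRIO_SYNC_FULL,
    MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
    COMPACT_PRIO_SYNC_LIGHT,
    MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
    COMPACT_PRIO_ASYNC,
    INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
};

/*
 * Hypothetical helper: move one step toward a more aggressive mode.
 * Returns false when *prio is already at the most aggressive level
 * permitted for this kind of request (costly orders stop at SYNC_LIGHT).
 */
static bool escalate_priority(enum compact_priority *prio, bool costly_order)
{
    enum compact_priority floor = costly_order ?
        MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY;

    if (*prio <= floor)
        return false;

    *prio = (enum compact_priority)(*prio - 1);
    return true;
}
```

Starting from INIT_COMPACT_PRIORITY, a non-costly request can escalate twice (ASYNC, then SYNC_LIGHT, then SYNC_FULL), while a costly request can escalate only once.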

Next, look at the result codes that report whether compaction succeeded.

/* Return values for compact_zone() and try_to_compact_pages() */
/* When adding new states, please adjust include/trace/events/compaction.h */
enum compact_result {
    /* For more detailed tracepoint output - internal to compaction */
    COMPACT_NOT_SUITABLE_ZONE,
    /*
     * compaction didn't start as it was not possible or direct reclaim
     * was more suitable
     */
    COMPACT_SKIPPED,
    /* compaction didn't start as it was deferred due to past failures */
    COMPACT_DEFERRED,
 
    /* compaction not active last round */
    COMPACT_INACTIVE = COMPACT_DEFERRED,
 
    /* For more detailed tracepoint output - internal to compaction */
    COMPACT_NO_SUITABLE_PAGE,
    /* compaction should continue to another pageblock */
    COMPACT_CONTINUE,
 
    /*
     * The full zone was compacted scanned but wasn't successfull to compact
     * suitable pages.
     */
    COMPACT_COMPLETE,
    /*
     * direct compaction has scanned part of the zone but wasn't successfull
     * to compact suitable pages.
     */
    COMPACT_PARTIAL_SKIPPED,
 
    /* compaction terminated prematurely due to lock contentions */
    COMPACT_CONTENDED,
 
    /*
     * direct compaction terminated after concluding that the allocation
     * should now succeed
     */
    COMPACT_SUCCESS,
};
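  • COMPACT_SKIPPED: compaction did not start for this zone, either because it was not possible or because direct reclaim was more suitable.
  • COMPACT_DEFERRED: compaction did not start for this zone because it failed there recently.
  • COMPACT_CONTINUE: compaction should continue with another pageblock.
  • COMPACT_COMPLETE: the whole zone was scanned, but no suitable page could be produced.
  • COMPACT_PARTIAL_SKIPPED: part of the zone was scanned, but no suitable page could be produced.
  • COMPACT_SUCCESS: compaction succeeded; it stopped after concluding that the allocation should now succeed.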

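Fragmentation index

A memory allocation can fail for two reasons:

  • There is not enough memory.
  • Memory is too fragmented.

The fragmentation index exists to tell which of the two caused the failure. It ranges from 0 to 1000.

  • An index close to 0 means the allocation failed because of lack of memory.
  • An index close to 1000 means the allocation failed because of fragmentation.

The kernel also provides a tunable threshold that the fragmentation index is compared against: int sysctl_extfrag_threshold = 500; the default value is 500.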

root:/ # cat /proc/sys/vm/extfrag_threshold
500

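The default is 500. If the threshold is set too high, nearly every allocation failure is attributed to lack of memory, so compaction is skipped. If it is set too low, page compaction runs too often and the system load goes up.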
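
To make the 0 to 1000 scale concrete, here is a user-space sketch of how such an index can be computed from a summary of a zone's free lists. It follows the formula used by the mainline kernel's `__fragmentation_index()` as far as I know it; `struct free_summary` and `frag_index` are hypothetical names for this illustration, and the formula should be checked against the kernel version being studied.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's per-zone free-list summary. */
struct free_summary {
    uint64_t free_pages;           /* total free pages in the zone */
    uint64_t free_blocks_total;    /* free blocks of any order */
    uint64_t free_blocks_suitable; /* free blocks of at least the requested order */
};

/*
 * Returns -1000 if a suitable block exists (the request would not fail
 * because of fragmentation), 0 if the zone has no free blocks at all,
 * otherwise a value that tends toward 0 for lack of memory and toward
 * 1000 for fragmentation.
 */
static int frag_index(unsigned int order, const struct free_summary *s)
{
    uint64_t requested = (uint64_t)1 << order;
    uint64_t scaled;

    if (s->free_blocks_total == 0)
        return 0;
    if (s->free_blocks_suitable != 0)
        return -1000;

    /* 1000 * (1 + free_pages / requested) / free_blocks_total, in integers */
    scaled = (1000 + s->free_pages * 1000 / requested) / s->free_blocks_total;
    return (int)(1000 - (int64_t)scaled);
}
```

For an order-4 request (16 pages), 64 free pages scattered over 64 single-page blocks give an index of 922 (fragmentation), while 12 free pages in 2 blocks give 125 (lack of memory).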

Deciding whether a zone is suitable for page compaction

enum compact_result compaction_suitable(struct zone *zone, int order,
                    unsigned int alloc_flags,
                    int classzone_idx)
{
    enum compact_result ret;
    int fragindex;
 
    ret = __compaction_suitable(zone, order, alloc_flags, classzone_idx,
                    zone_page_state(zone, NR_FREE_PAGES));
    /*
     * fragmentation index determines if allocation failures are due to
     * low memory or external fragmentation
     *
     * index of -1000 would imply allocations might succeed depending on
     * watermarks, but we already failed the high-order watermark check
     * index towards 0 implies failure is due to lack of memory
     * index towards 1000 implies failure is due to fragmentation
     *
     * Only compact if a failure would be due to fragmentation. Also
     * ignore fragindex for non-costly orders where the alternative to
     * a successful reclaim/compaction is OOM. Fragindex and the
     * vm.extfrag_threshold sysctl is meant as a heuristic to prevent
     * excessive compaction for costly orders, but it should not be at the
     * expense of system stability.
     */
    if (ret == COMPACT_CONTINUE && (order > PAGE_ALLOC_COSTLY_ORDER)) {
        fragindex = fragmentation_index(zone, order);
        if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
            ret = COMPACT_NOT_SUITABLE_ZONE;
    }
 
    trace_mm_compaction_suitable(zone, order, ret);
    if (ret == COMPACT_NOT_SUITABLE_ZONE)
        ret = COMPACT_SKIPPED;
 
    return ret;
}
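  • __compaction_suitable performs the basic check of whether this zone is suitable for page compaction.
  • If it returns COMPACT_CONTINUE and the order is costly (order > PAGE_ALLOC_COSTLY_ORDER), the fragmentation index is computed. If the index lies in [0, sysctl_extfrag_threshold], which is [0, 500] by default, the zone is not suitable for page compaction.
  • In that case COMPACT_NOT_SUITABLE_ZONE is recorded by the tracepoint and then converted to COMPACT_SKIPPED, so the caller skips this zone.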
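
The fragmentation-index test in compaction_suitable() reduces to a small predicate. The sketch below restates it in isolation; `skip_for_low_fragindex` is a hypothetical name, and the value 3 for PAGE_ALLOC_COSTLY_ORDER is the one used by mainline kernels.

```c
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3   /* value used by mainline kernels */

/*
 * Hypothetical restatement of the check: a zone is skipped only for
 * costly orders whose fragmentation index falls in [0, threshold].
 * A negative index (for example -1000) never causes a skip.
 */
static bool skip_for_low_fragindex(int order, int fragindex, int threshold)
{
    if (order <= PAGE_ALLOC_COSTLY_ORDER)
        return false;

    return fragindex >= 0 && fragindex <= threshold;
}
```

With the default threshold of 500, an order-4 request with index 125 is skipped, the same request with index 922 is compacted, and an order-2 request is never skipped by this test.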
static enum compact_result __compaction_suitable(struct zone *zone, int order,
                    unsigned int alloc_flags,
                    int classzone_idx,
                    unsigned long wmark_target)
{
    unsigned long watermark;
 
    if (is_via_compact_memory(order))
        return COMPACT_CONTINUE;
 
    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
    /*
     * If watermarks for high-order allocation are already met, there
     * should be no need for compaction at all.
     */
    if (zone_watermark_ok(zone, order, watermark, classzone_idx,
                                alloc_flags))
        return COMPACT_SUCCESS;
 
    watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
                low_wmark_pages(zone) : min_wmark_pages(zone);
    watermark += compact_gap(order);
    if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
                        ALLOC_CMA, wmark_target))
        return COMPACT_SKIPPED;
 
    return COMPACT_CONTINUE;
}
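  • If compaction was triggered by writing to /proc/sys/vm/compact_memory, order is -1 and compaction simply continues (COMPACT_CONTINUE).
  • Otherwise, the watermark selected by alloc_flags is read for the current zone.
  • If zone_watermark_ok reports that the zone already meets that watermark for the requested order, the function returns COMPACT_SUCCESS, meaning the zone does not need compaction.
  • Depending on whether the order is costly, the low watermark (costly order) or the min watermark (otherwise) is taken, and compact_gap(order) is added to it. In mainline kernels compact_gap(order) is 2 << order pages, that is, twice the size of the request, not 2*order.
  • If the zone's order-0 free pages do not exceed this watermark plus gap, the function returns COMPACT_SKIPPED, because there would not be enough free pages to serve as migration targets. Otherwise it returns COMPACT_CONTINUE and the zone should be compacted.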
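
The decision sequence in the list above can be worked through with plain numbers. This is a simplified user-space model, not the kernel function: `struct zone_model`, `suitable_model`, and `compact_gap_model` are hypothetical names, the high-order watermark check is reduced to a boolean, and lowmem reserves and CMA pages are ignored.

```c
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3   /* value used by mainline kernels */

/* Hypothetical, flattened view of the zone state the check needs. */
struct zone_model {
    unsigned long free_pages;   /* order-0 free pages in the zone */
    unsigned long min_wmark;    /* min watermark, in pages */
    unsigned long low_wmark;    /* low watermark, in pages */
    bool high_order_ok;         /* stands in for zone_watermark_ok() at the requested order */
};

enum suit_result { SUIT_SUCCESS, SUIT_SKIPPED, SUIT_CONTINUE };

/* Twice the request size: room for the migrated pages plus the allocation. */
static unsigned long compact_gap_model(unsigned int order)
{
    return 2UL << order;
}

static enum suit_result suitable_model(const struct zone_model *z, int order)
{
    unsigned long watermark;

    if (order == -1)                /* triggered via compact_memory */
        return SUIT_CONTINUE;
    if (z->high_order_ok)           /* request can already be satisfied */
        return SUIT_SUCCESS;

    watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ? z->low_wmark : z->min_wmark;
    watermark += compact_gap_model((unsigned int)order);

    if (z->free_pages <= watermark) /* too few free pages to migrate into */
        return SUIT_SKIPPED;
    return SUIT_CONTINUE;
}
```

For a zone with 100 free pages, min = 50 and low = 80, an order-4 request needs more than 80 + 32 = 112 free pages and is skipped, while an order-3 request needs more than 50 + 16 = 66 and continues.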

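Deferring compaction

Why does compaction need to be deferred?

If the previous compaction failed and the next attempt comes shortly afterwards, it is likely to fail again and only add load to the system. The next attempts are therefore deferred. struct zone defines several fields for this purpose.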

struct zone {
#ifdef CONFIG_COMPACTION
    /*
     * On compaction failure, 1<<compact_defer_shift compactions
     * are skipped before trying again. The number attempted since
     * last failure is tracked with compact_considered.
     */
    unsigned int        compact_considered;
    unsigned int        compact_defer_shift;
    int         compact_order_failed;
#endif
    /* ... other fields omitted ... */
};
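  • compact_considered counts how many compaction attempts have been considered since the last failure.
  • compact_defer_shift is the base-2 logarithm of the number of attempts to skip: 1 << compact_defer_shift.
  • compact_order_failed records the lowest allocation order for which compaction failed.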

When compaction fails, the following function is called.

/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_DEFER_SHIFT 6
 
 
void defer_compaction(struct zone *zone, int order)
{
    zone->compact_considered = 0;
    zone->compact_defer_shift++;
 
    if (order < zone->compact_order_failed)
        zone->compact_order_failed = order;
 
    if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
        zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}
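  • compact_considered is reset to 0 and compact_defer_shift is incremented by 1. If compact_defer_shift exceeds 6, it is clamped to 6.
  • The deferral window is therefore capped at 1 << 6 = 64 attempts; further failures cannot extend it.
  • If the requested order is lower than compact_order_failed, compact_order_failed is set to order.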

When compaction succeeds, compaction_defer_reset is called.

void compaction_defer_reset(struct zone *zone, int order, bool alloc_success)
{
    if (alloc_success) {
        zone->compact_considered = 0;
        zone->compact_defer_shift = 0;
    }
    if (order >= zone->compact_order_failed)
        zone->compact_order_failed = order + 1;
}
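  • If the allocation succeeded, compact_considered and compact_defer_shift are both reset to 0.
  • In addition, if order is greater than or equal to compact_order_failed, compact_order_failed is set to order + 1.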
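
The listings above update the deferral fields but do not show the check that reads them. The user-space sketch below models that check together with the failure path, so the exponential back-off can be observed. `compaction_deferred_model` follows my understanding of the mainline `compaction_deferred()` and is an assumption to verify against the kernel source; `struct zone_defer` and the `_model` function names are hypothetical.

```c
#include <stdbool.h>

#define COMPACT_MAX_DEFER_SHIFT 6

/* Only the three deferral fields of struct zone. */
struct zone_defer {
    unsigned int compact_considered;
    unsigned int compact_defer_shift;
    int compact_order_failed;
};

/* Failure path, as in defer_compaction() above. */
static void defer_compaction_model(struct zone_defer *z, int order)
{
    z->compact_considered = 0;
    z->compact_defer_shift++;

    if (order < z->compact_order_failed)
        z->compact_order_failed = order;

    if (z->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
        z->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if this attempt should be skipped (assumed logic). */
static bool compaction_deferred_model(struct zone_defer *z, int order)
{
    unsigned long defer_limit = 1UL << z->compact_defer_shift;

    /* Orders below the lowest failed order are never deferred. */
    if (order < z->compact_order_failed)
        return false;

    /* Count this attempt, saturating at the limit. */
    if (++z->compact_considered > defer_limit)
        z->compact_considered = defer_limit;

    return z->compact_considered < defer_limit;
}
```

In this model, one failure causes 1 attempt to be skipped before compaction is tried again, a second consecutive failure causes 3, and the window stops growing once compact_defer_shift reaches 6.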

How to decide whether the current compaction run is finished

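  • When the migration scanner and the free scanner meet, the current compaction run is considered finished.
  • When the two scanners have not met, but the zone's free lists can already supply a large enough free page of the requested migratetype, or of a fallback migratetype, the current compaction run is also considered finished.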
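static enum compact_result __compact_finished(struct compact_control *cc)
{
    unsigned int order;
    const int migratetype = cc->migratetype;

    /* The migration scanner and the free scanner have met */
    if (compact_scanners_met(cc)) {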
        /* Let the next compaction start anew. */
        reset_cached_positions(cc->zone);
 
        /*
         * Mark that the PG_migrate_skip information should be cleared
         * by kswapd when it goes to sleep. kcompactd does not set the
         * flag itself as the decision to be clear should be directly
         * based on an allocation request.
         */
        if (cc->direct_compaction)
            cc->zone->compact_blockskip_flush = true;
 
        if (cc->whole_zone)
            return COMPACT_COMPLETE;
        else
            return COMPACT_PARTIAL_SKIPPED;
    }
 
 
    for (order = cc->order; order < MAX_ORDER; order++) {
        struct free_area *area = &cc->zone->free_area[order];
        bool can_steal;
 
        /*
         * Job done if page is free of the right migratetype:
         * a free page is available, so return COMPACT_SUCCESS.
         */
        if (!list_empty(&area->free_list[migratetype]))
            return COMPACT_SUCCESS;
 
#ifdef CONFIG_CMA
        /* MIGRATE_MOVABLE can fallback on MIGRATE_CMA */
        if (migratetype == MIGRATE_MOVABLE &&
            !list_empty(&area->free_list[MIGRATE_CMA]))
            return COMPACT_SUCCESS;
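#endif

        /* A fallback migratetype can satisfy the request */
        if (find_suitable_fallback(area, order, migratetype,
                        true, &can_steal, cc->order) != -1) {

            /* movable pages are OK in any pageblock */
            if (migratetype == MIGRATE_MOVABLE)
                return COMPACT_SUCCESS;
            /* ... handling of other migratetypes is not shown in this excerpt ... */
        }
    }
    /* ... the remainder of the function is not shown in this excerpt ... */
}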
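
The core of the second condition, scanning the free lists from the requested order upward for a block of the right migratetype, can be modelled in a few lines. This is a simplified user-space sketch under stated assumptions: `struct free_area_model`, `suitable_page_free`, and the constants `MODEL_MAX_ORDER` and `MODEL_NR_TYPES` are hypothetical, free lists are reduced to counters, and the CMA and fallback-migratetype branches are left out.

```c
#include <stdbool.h>

#define MODEL_MAX_ORDER 11  /* assumed number of orders, for illustration */
#define MODEL_NR_TYPES  3   /* assumed number of migratetypes, for illustration */

/* Each free list is modelled as a count of free blocks. */
struct free_area_model {
    unsigned long nr_free[MODEL_NR_TYPES];
};

/*
 * Returns true if some order >= req_order has a free block of the
 * requested migratetype, which is when the loop above would return
 * COMPACT_SUCCESS without consulting fallbacks.
 */
static bool suitable_page_free(const struct free_area_model *areas,
                               unsigned int req_order, int migratetype)
{
    unsigned int order;

    for (order = req_order; order < MODEL_MAX_ORDER; order++) {
        if (areas[order].nr_free[migratetype] > 0)
            return true;
    }
    return false;
}
```

A free block below the requested order does not count, however many of them exist; a single free block at or above the requested order ends the run.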
Originally published on 2020-04-23 on the author's personal site/blog.
