
How container_cpu_load_average_10s is computed

Original article by cdh
Last modified: 2024-04-23 20:32:10
Column: 笔记

A business team reported that a dashboard built on the container_cpu_load_average_10s metric kept showing values fluctuating between 0 and 1 for a pod with no business traffic, and asked why. The dashboard expression is:

max by (pod) (container_cpu_load_average_10s{container!="",container!~"sandbox|logrotate|sidecar",pod=~"$pod", container=~"$container"}) / 1000 / max by (pod) (kube_pod_container_resource_limits_cpu_cores{container!="",container!~"sandbox|logrotate|sidecar",pod=~"$pod", container=~"$container"})

The result of this expression depends mainly on the container_cpu_load_average_10s value seen at each scrape, so the next step is to look at how that value is calculated.

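cadvisor's updateLoad function shows that loadAvg is computed with the following formula:

cd.loadAvg = cd.loadAvg*cd.loadDecay + float64(newLoad)*(1.0-cd.loadDecay)

In other words, the previous smoothed value cd.loadAvg is multiplied by the decay factor cd.loadDecay, and the newly sampled newLoad multiplied by (1.0-cd.loadDecay) is added to it to give the current cd.loadAvg. Multiplying cd.loadAvg by 1000 then gives the value of container_cpu_load_average_10s.

So the key questions are how cd.loadDecay and the newLoad argument of updateLoad are obtained.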

// Calculate new smoothed load average using the new sample of runnable threads.
// The decay used ensures that the load will stabilize on a new constant value within
// 10 seconds.
func (cd *containerData) updateLoad(newLoad uint64) {
        if cd.loadAvg < 0 {
                cd.loadAvg = float64(newLoad) // initialize to the first seen sample for faster stabilization.
        } else {
                cd.loadAvg = cd.loadAvg*cd.loadDecay + float64(newLoad)*(1.0-cd.loadDecay)
        }
}
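To see why an otherwise idle container can report a non-zero load, the smoothing in updateLoad can be replayed over a short series of samples. The following is a minimal sketch, not cadvisor code: the function name smooth and the sample series are hypothetical, and the decay factor assumes a 10-second housekeeping interval.

```go
package main

import (
	"fmt"
	"math"
)

// smooth applies the same exponential smoothing as updateLoad to a series of
// samples and returns the smoothed value after each sample.
// A negative starting value means "not initialised yet", as in updateLoad.
func smooth(samples []uint64, decay float64) []float64 {
	loadAvg := -1.0
	out := make([]float64, 0, len(samples))
	for _, s := range samples {
		if loadAvg < 0 {
			loadAvg = float64(s) // first sample is taken as-is
		} else {
			loadAvg = loadAvg*decay + float64(s)*(1.0-decay)
		}
		out = append(out, loadAvg)
	}
	return out
}

func main() {
	decay := math.Exp(-1) // assumption: 10s housekeeping interval
	// Hypothetical samples: an idle container where one runnable thread
	// happens to be caught at the third sample only.
	samples := []uint64{0, 0, 1, 0, 0}
	for i, v := range smooth(samples, decay) {
		fmt.Printf("sample %d: nr_running=%d milliLoad=%d\n", i, samples[i], int32(v*1000))
	}
}
```

With these inputs a single runnable thread lifts the milli-load to 632, and the following idle samples bring it down to 232 and then 85, so one short-lived thread is enough to keep the metric above zero for several scrapes.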

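cd.loadDecay is a fixed value set when the container data is created. With the default settings it is 0.36787944117144233; 0.6321205588285577 is 1.0-cd.loadDecay, the weight given to the new sample. It is set as follows: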

func newContainerData(containerName string, memoryCache *memory.InMemoryCache, handler container.ContainerHandler, logUsage bool, collectorManager collector.CollectorManager, maxHousekeepingInterval time.Duration, allowDynamicHousekeeping bool, clock clock.Clock) (*containerData, error) {
    ......
    cont.loadDecay = math.Exp(float64(-cont.housekeepingInterval.Seconds() / 10))
    ......
}

The code above shows that cd.loadDecay is computed as:

cont.loadDecay = math.Exp(float64(-cont.housekeepingInterval.Seconds() / 10))

housekeepingInterval.Seconds() is 10 by default here, so a small program that plugs in that number gives the value of cont.loadDecay directly.

The result is cont.loadDecay = 0.36787944117144233 and 1.0-cont.loadDecay = 0.6321205588285577.

[root@VM-10-5-centos goproject]# cat test.go
package main

import (
        "fmt"
        "math"
)

func main() {
        // A 10s housekeeping interval divided by the 10s window gives exp(-1).
        x := math.Exp(-10.0 / 10)
        fmt.Printf("loadDecay = %v, 1 - loadDecay = %v\n", x, 1.0-x)
}
[root@VM-10-5-centos goproject]# go run test.go
loadDecay = 0.36787944117144233, 1 - loadDecay = 0.6321205588285577
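Because the decay factor depends only on the housekeeping interval, it can be useful to see how it changes with that interval. The sketch below wraps the expression from newContainerData in a hypothetical helper named loadDecay; only the 10-second interval comes from this article, the other intervals are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// loadDecay mirrors the expression used in newContainerData:
// exp(-interval_in_seconds / 10).
func loadDecay(interval time.Duration) float64 {
	return math.Exp(-interval.Seconds() / 10)
}

func main() {
	// Illustrative intervals; only 10s is taken from the article.
	intervals := []time.Duration{time.Second, 10 * time.Second, 30 * time.Second}
	for _, d := range intervals {
		decay := loadDecay(d)
		fmt.Printf("interval=%v decay=%.4f weightOfNewSample=%.4f\n", d, decay, 1-decay)
	}
}
```

The longer the interval between samples, the smaller the decay factor and the more a single new sample dominates the smoothed value.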

Where does the newLoad argument of updateLoad(newLoad uint64) come from?

When updateStats calls updateLoad to refresh the load, it passes loadStats.NrRunning as newLoad:

func (cd *containerData) updateStats() error {
        stats, statsErr := cd.handler.GetStats()
        if statsErr != nil {
                // Ignore errors if the container is dead.
                if !cd.handler.Exists() {
                        return nil
                }

                // Stats may be partially populated, push those before we return an error.
                statsErr = fmt.Errorf("%v, continuing to push stats", statsErr)
        }
        if stats == nil {
                return statsErr
        }
        if cd.loadReader != nil {
                // TODO(vmarmol): Cache this path.
                path, err := cd.handler.GetCgroupPath("cpu")
                if err == nil {
                        loadStats, err := cd.loadReader.GetCpuLoad(cd.info.Name, path)
                        if err != nil {
                                return fmt.Errorf("failed to get load stat for %q - path %q, error %s", cd.info.Name, path, err)
                        }
                        stats.TaskStats = loadStats
                        cd.updateLoad(loadStats.NrRunning)
                        // convert to 'milliLoad' to avoid floats and preserve precision.
                        stats.Cpu.LoadAverage = int32(cd.loadAvg * 1000)
                }
        }
        ......
}

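cadvisor sends a request message to the kernel through the call chain below. The command of the request is CGROUPSTATS_CMD_GET, and it carries a CGROUPSTATS_CMD_ATTR_FD attribute that holds the file descriptor of the cgroup directory.

updateStats -> GetCpuLoad -> getLoadStats -> prepareCmdMessage -> conn.WriteMessage

After sending the message, cadvisor waits in conn.ReadMessage() for the kernel's reply. It then parses the reply to obtain the number of tasks in each state under the container's cgroup and stores them in LoadStats.

loadStats.NrRunning is therefore the number of threads in the TASK_RUNNING state (running or runnable) at the moment of sampling.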


// This mirrors kernel internal structure.
type LoadStats struct {
        // Number of sleeping tasks.
        NrSleeping uint64 `json:"nr_sleeping"`

        // Number of running tasks.
        NrRunning uint64 `json:"nr_running"`

        // Number of tasks in stopped state
        NrStopped uint64 `json:"nr_stopped"`

        // Number of tasks in uninterruptible state
        NrUninterruptible uint64 `json:"nr_uninterruptible"`

        // Number of tasks waiting on IO
        NrIoWait uint64 `json:"nr_io_wait"`
}

// Returns instantaneous number of running tasks in a group.
// Caller can use historical data to calculate cpu load.
// path is an absolute filesystem path for a container under the CPU cgroup hierarchy.
// NOTE: non-hierarchical load is returned. It does not include load for subcontainers.
func (r *NetlinkReader) GetCpuLoad(name string, path string) (info.LoadStats, error) {
        if len(path) == 0 {
                return info.LoadStats{}, fmt.Errorf("cgroup path can not be empty")
        }

        cfd, err := os.Open(path)
        if err != nil {
                return info.LoadStats{}, fmt.Errorf("failed to open cgroup path %s: %q", path, err)
        }
        defer cfd.Close()

        stats, err := getLoadStats(r.familyID, cfd, r.conn)
        if err != nil {
                return info.LoadStats{}, err
        }
        klog.V(4).Infof("Task stats for %q: %+v", path, stats)
        return stats, nil
}


// Get load stats for a task group.
// id: family id for taskstats.
// cfd: open file to path to the cgroup directory under cpu hierarchy.
// conn: open netlink connection used to communicate with kernel.
func getLoadStats(id uint16, cfd *os.File, conn *Connection) (info.LoadStats, error) {
        msg := prepareCmdMessage(id, cfd.Fd())
        err := conn.WriteMessage(msg.toRawMsg())
        if err != nil {
                return info.LoadStats{}, err
        }

        resp, err := conn.ReadMessage()
        if err != nil {
                return info.LoadStats{}, err
        }

        parsedmsg, err := parseLoadStatsResp(resp)
        if err != nil {
                return info.LoadStats{}, err
        }
        return parsedmsg.Stats, nil
}

// Prepares message to query task stats for a task group.
func prepareCmdMessage(id uint16, cfd uintptr) (msg netlinkMessage) {
        buf := bytes.NewBuffer([]byte{})
        addAttribute(buf, unix.CGROUPSTATS_CMD_ATTR_FD, uint32(cfd), 4)
        return prepareMessage(id, unix.CGROUPSTATS_CMD_GET, buf.Bytes())
}

// Prepares the message and generic headers and appends attributes as data.
func prepareMessage(headerType uint16, cmd uint8, attributes []byte) (msg netlinkMessage) {
        msg.Header.Type = headerType
        msg.Header.Flags = syscall.NLM_F_REQUEST
        msg.GenHeader.Command = cmd
        msg.GenHeader.Version = 0x1
        msg.Data = attributes
        return msg
}
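The attribute that prepareCmdMessage appends is a standard netlink attribute: a 4-byte header (total length, then type) followed by the payload. The following sketch shows that layout for a 32-bit file descriptor. The helper encodeU32Attr is hypothetical and is not cadvisor's addAttribute; the constant is defined locally (its value follows linux/cgroupstats.h) so that the example needs only the standard library, and little-endian byte order is assumed because netlink uses host byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Value of CGROUPSTATS_CMD_ATTR_FD in linux/cgroupstats.h, defined locally so
// the sketch does not depend on golang.org/x/sys/unix.
const cgroupstatsCmdAttrFD = 1

// encodeU32Attr is a hypothetical helper that lays out one netlink attribute
// carrying a 32-bit value. Assumes a little-endian host.
func encodeU32Attr(attrType uint16, value uint32) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint16(b[0:2], 8)        // nla_len: 4-byte header + 4-byte payload
	binary.LittleEndian.PutUint16(b[2:4], attrType) // nla_type
	binary.LittleEndian.PutUint32(b[4:8], value)    // payload: the cgroup directory fd
	return b
}

func main() {
	// 7 stands in for the descriptor returned by os.Open on the cgroup directory.
	fmt.Printf("% x\n", encodeU32Attr(cgroupstatsCmdAttrFD, 7))
}
```

This is also why the kernel handler reads the descriptor with nla_get_u32: the payload is exactly four bytes.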

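When the kernel receives the CGROUPSTATS_CMD_GET request from cadvisor, it uses the file descriptor in the CGROUPSTATS_CMD_ATTR_FD attribute to locate the container's cgroup, counts the tasks in each state under that cgroup, fills the counts into a cgroupstats structure, and returns it to cadvisor.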

static const struct genl_ops taskstats_ops[] = {
        {
                .cmd            = TASKSTATS_CMD_GET,
                .doit           = taskstats_user_cmd,
                .policy         = taskstats_cmd_get_policy,
                .flags          = GENL_ADMIN_PERM,
        },
        {
                .cmd            = CGROUPSTATS_CMD_GET,
                .doit           = cgroupstats_user_cmd,
                .policy         = cgroupstats_cmd_get_policy,
        },
};

struct cgroupstats {
        __u64   nr_sleeping;            /* Number of tasks sleeping */
        __u64   nr_running;             /* Number of tasks running */
        __u64   nr_stopped;             /* Number of tasks in stopped state */
        __u64   nr_uninterruptible;     /* Number of tasks in uninterruptible */
                                        /* state */
        __u64   nr_io_wait;             /* Number of tasks waiting on IO */
};

static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
{
        int rc = 0;
        struct sk_buff *rep_skb;
        struct cgroupstats *stats;
        struct nlattr *na;
        size_t size;
        u32 fd;
        struct fd f;

        na = info->attrs[CGROUPSTATS_CMD_ATTR_FD];
        if (!na)
                return -EINVAL;

        fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
        f = fdget(fd);
        if (!f.file)
                return 0;

        size = nla_total_size(sizeof(struct cgroupstats));

        rc = prepare_reply(info, CGROUPSTATS_CMD_NEW, &rep_skb,
                                size);
        if (rc < 0)
                goto err;

        na = nla_reserve(rep_skb, CGROUPSTATS_TYPE_CGROUP_STATS,
                                sizeof(struct cgroupstats));
        if (na == NULL) {
                nlmsg_free(rep_skb);
                rc = -EMSGSIZE;
                goto err;
        }

        stats = nla_data(na);
        memset(stats, 0, sizeof(*stats));

        rc = cgroupstats_build(stats, f.file->f_path.dentry);
        if (rc < 0) {
                nlmsg_free(rep_skb);
                goto err;
        }

        rc = send_reply(rep_skb, info);

err:
        fdput(f);
        return rc;
}

/**
 * cgroupstats_build - build and fill cgroupstats
 * @stats: cgroupstats to fill information into
 * @dentry: A dentry entry belonging to the cgroup for which stats have
 * been requested.
 *
 * Build and fill cgroupstats so that taskstats can export it to user
 * space.
 */
int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry)
{
        struct kernfs_node *kn = kernfs_node_from_dentry(dentry);
        struct cgroup *cgrp;
        struct css_task_iter it;
        struct task_struct *tsk;

        /* it should be kernfs_node belonging to cgroupfs and is a directory */
        if (dentry->d_sb->s_type != &cgroup_fs_type || !kn ||
            kernfs_type(kn) != KERNFS_DIR)
                return -EINVAL;

        mutex_lock(&cgroup_mutex);

        /*
         * We aren't being called from kernfs and there's no guarantee on
         * @kn->priv's validity.  For this and css_tryget_online_from_dir(),
         * @kn->priv is RCU safe.  Let's do the RCU dancing.
         */
        rcu_read_lock();
        cgrp = rcu_dereference(*(void __rcu __force **)&kn->priv);
        if (!cgrp || cgroup_is_dead(cgrp)) {
                rcu_read_unlock();
                mutex_unlock(&cgroup_mutex);
                return -ENOENT;
        }
        rcu_read_unlock();

        css_task_iter_start(&cgrp->self, 0, &it);
        while ((tsk = css_task_iter_next(&it))) {
                switch (tsk->state) {
                case TASK_RUNNING:
                        stats->nr_running++;
                        break;
                case TASK_INTERRUPTIBLE:
                        stats->nr_sleeping++;
                        break;
                case TASK_UNINTERRUPTIBLE:
                        stats->nr_uninterruptible++;
                        break;
                case TASK_STOPPED:
                        stats->nr_stopped++;
                        break;
                default:
                        if (delayacct_is_task_waiting_on_io(tsk))
                                stats->nr_io_wait++;
                        break;
                }
        }
        css_task_iter_end(&it);

        mutex_unlock(&cgroup_mutex);
        return 0;
}
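The counting loop in cgroupstats_build is a tally of tasks by scheduler state. As a rough userspace analogy, and not a port of the kernel code, the sketch below tallies tasks by the single-letter state codes shown in /proc/<pid>/stat (R, S, D, T). The type loadStats and the function tally are hypothetical, and the nr_io_wait counter is left out because it has no direct state letter.

```go
package main

import "fmt"

// loadStats is a local stand-in for the counters in cadvisor's LoadStats.
type loadStats struct {
	NrSleeping        uint64
	NrRunning         uint64
	NrStopped         uint64
	NrUninterruptible uint64
}

// tally counts tasks by state letter: R (running or runnable),
// S (interruptible sleep), D (uninterruptible sleep), T (stopped).
// Any other state is ignored in this sketch.
func tally(states []byte) loadStats {
	var s loadStats
	for _, st := range states {
		switch st {
		case 'R':
			s.NrRunning++
		case 'S':
			s.NrSleeping++
		case 'D':
			s.NrUninterruptible++
		case 'T':
			s.NrStopped++
		}
	}
	return s
}

func main() {
	// Hypothetical snapshot: six tasks, one of which is runnable.
	fmt.Printf("%+v\n", tally([]byte("SSRSDT")))
}
```

The count is instantaneous: a task is counted as running only if it is in that state at the moment the loop visits it, which is why a mostly idle container is usually sampled as 0 and occasionally as 1.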

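With the derivation of container_cpu_load_average_10s understood, we can verify the result in a real environment.

Deploy a script that periodically collects the raw container_cpu_load_average_10s data: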

#cat get-container_cpu_load_average_10s.sh

#!/bin/bash
while true
do
  date
  kubectl get --raw /api/v1/nodes/eklet-subnet-g2wkclr1/proxy/metrics/cadvisor | grep load | grep fb88e098-31b2-4d5c-bbcf-5257361abc1f | grep -E 'web|shell'
  sleep 0.5
done
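The lines the script prints are in the Prometheus text format, name{labels} value [timestamp]. If the raw values need further processing, they can be extracted with a small parser. The sketch below is one possible approach; the function parseSample and the sample line (including its label values) are hypothetical, and it assumes no label value contains a '}' character.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample extracts the sample value from one metric line of the form
// name{labels} value [timestamp].
// Assumption: label values do not contain '}'.
func parseSample(line string) (float64, error) {
	end := strings.LastIndex(line, "}")
	if end < 0 {
		return 0, fmt.Errorf("no label set in %q", line)
	}
	fields := strings.Fields(line[end+1:])
	if len(fields) == 0 {
		return 0, fmt.Errorf("no value in %q", line)
	}
	return strconv.ParseFloat(fields[0], 64)
}

func main() {
	// Hypothetical line; the label values are placeholders.
	line := `container_cpu_load_average_10s{container="web",pod="example"} 632 1713875530000`
	v, err := parseSample(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("load:", v/1000)
}
```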

In the sampled output shown in the original post's screenshot, container_cpu_load_average_10s reads 632.

Following the algorithm in the code: if the smoothed value at the previous sample was 0 (no running threads had been seen) and the next sample 10 seconds later finds 1 running thread, the new value is:

cd.loadAvg = cd.loadAvg*cd.loadDecay + float64(newLoad)*(1.0-cd.loadDecay) = 0*0.36787944117144233 + 1*0.6321205588285577 = 0.632

container_cpu_load_average_10s = 0.632*1000 = 632

With the pod's CPU limit set to 2 cores, the dashboard expression below yields a final load value of 0.316:

max by (pod) (container_cpu_load_average_10s{container!="",container!~"sandbox|logrotate|sidecar",pod=~"$pod", container=~"$container"}) / 1000 / max by (pod) (kube_pod_container_resource_limits_cpu_cores{container!="",container!~"sandbox|logrotate|sidecar",pod=~"$pod", container=~"$container"})

632/1000/kube_pod_container_resource_limits_cpu_cores = 632/1000/2 = 0.316
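The same arithmetic can be written as a small function, which makes it easy to check other limits. This is a sketch of the dashboard formula applied to a single series; the function name normalizedLoad is hypothetical.

```go
package main

import "fmt"

// normalizedLoad applies the dashboard formula to one series:
// milli-load / 1000 / CPU limit in cores.
func normalizedLoad(milliLoad int32, cpuLimitCores float64) float64 {
	return float64(milliLoad) / 1000 / cpuLimitCores
}

func main() {
	// Values from the example: metric value 632, CPU limit of 2 cores.
	fmt.Printf("%.3f\n", normalizedLoad(632, 2)) // prints 0.316
}
```

For a container limited to 1 core, the same sample would show up as 0.632, which is how one runnable thread in an idle pod produces values anywhere between 0 and 1 on the dashboard.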

This corresponds to the value collected by the monitoring system.

Note: the kernel source tree ships a tool, tools/accounting/getdelays.c, that can be used to read the number of running tasks in a cgroup (for details see: https://utcc.utoronto.ca/~cks/space/blog/linux/LoadAverageWhereFrom)

Reference: https://github.com/google/cadvisor/issues/2286

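Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.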
