Environment: Kubernetes 1.11+ / OpenShift 3.11
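First, an APIService (the custom metrics API) must be registered. When the HPA requests metrics, kube-aggregator (the controller for APIServices) forwards the request to the adapter. The adapter runs as a pod in the Kubernetes cluster and implements the Kubernetes resource metrics API and custom metrics API. Following the configured rules, it fetches metrics from Prometheus, processes them (for example, renames them), and returns them to the HPA through the custom metrics API. Finally, the HPA scales the Deployment/ReplicaSet up or down based on the metric values it received.
The adapter acts as an extension-apiserver (that is, a pod you run yourself) and effectively proxies kube-apiserver requests to Prometheus.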
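The APIService definition for k8s-prometheus-adapter is shown below. kube-aggregator forwards requests to the adapter through the service referenced in it. The name v1beta1.custom.metrics.k8s.io is hard-coded in k8s-prometheus-adapter, so it cannot be changed arbitrarily.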
    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.custom.metrics.k8s.io
    spec:
      service:
        name: custom-metrics-apiserver
        namespace: custom-metrics
      group: custom.metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100
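As a rough illustration of the routing described above, the sketch below models how the APIService name is composed from its version and group, and which request paths would be handed to the adapter. The function names are hypothetical, and the plain prefix match is a simplification of what kube-aggregator actually does.

```python
GROUP = "custom.metrics.k8s.io"
VERSION = "v1beta1"


def apiservice_name(group: str, version: str) -> str:
    """APIService objects are named <version>.<group>."""
    return f"{version}.{group}"


def routed_to_adapter(path: str, group: str = GROUP, version: str = VERSION) -> bool:
    """Simplified model: paths under /apis/<group>/<version> go to the adapter."""
    prefix = f"/apis/{group}/{version}"
    return path == prefix or path.startswith(prefix + "/")


if __name__ == "__main__":
    print(apiservice_name(GROUP, VERSION))
    print(routed_to_adapter(
        "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"
    ))
    print(routed_to_adapter("/apis/apps/v1/deployments"))
```

A request for a built-in API group such as apps/v1 does not match the prefix and would stay with kube-apiserver in this model.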
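The adapter image is directxman12/k8s-prometheus-adapter:latest; retag it and push it to your local image registry.
Generate the serving certificate (the script below writes cm-adapter-serving-certs.yaml) and place it under the manifests/ directory. kube-aggregator uses this certificate to authenticate the adapter when communicating with it. Note the certificate validity of 5 years (43800h) and the hostnames it is issued for.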
    #!/usr/bin/env bash
    # exit immediately when a command fails
    set -e
    # only exit with zero if all commands of the pipeline exit successfully
    set -o pipefail
    # error on unset variables
    set -u

    # Detect if we are on mac or should use GNU base64 options
    case $(uname) in
      Darwin)
        b64_opts='-b=0'
        ;;
      *)
        b64_opts='--wrap=0'
    esac

    go get -v -u github.com/cloudflare/cfssl/cmd/...

    export PURPOSE=metrics
    echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "ca-config.json"

    export SERVICE_NAME=custom-metrics-apiserver
    export ALT_NAMES='"custom-metrics-apiserver.custom-metrics","custom-metrics-apiserver.custom-metrics.svc"'
    echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
      cfssl gencert -ca=ca.crt -ca-key=ca.key -config=ca-config.json - | cfssljson -bare apiserver

    cat <<-EOF > cm-adapter-serving-certs.yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: cm-adapter-serving-certs
    data:
      serving.crt: $(base64 ${b64_opts} < apiserver.pem)
      serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
    EOF
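The hand-escaped JSON in the script is easy to get wrong, so here is a small sketch of the two JSON documents it passes to cfssl, built with a JSON library instead. The helper names are hypothetical; the field values mirror the script, and the host names follow the `<service>.<namespace>` and `<service>.<namespace>.svc` pattern it uses.

```python
import json


def build_signing_config(purpose: str = "metrics", expiry: str = "43800h") -> dict:
    """Signing profile equivalent to the ca-config.json written by the script."""
    return {"signing": {"default": {
        "expiry": expiry,
        "usages": ["signing", "key encipherment", purpose],
    }}}


def build_csr(service_name: str, namespace: str) -> dict:
    """CSR equivalent to the JSON piped into `cfssl gencert`."""
    hosts = [f"{service_name}.{namespace}", f"{service_name}.{namespace}.svc"]
    return {"CN": service_name, "hosts": hosts, "key": {"algo": "rsa", "size": 2048}}


def expiry_years(expiry: str) -> float:
    """Convert an hour-based expiry such as '43800h' to years of 365 days."""
    return int(expiry.rstrip("h")) / (24 * 365)


if __name__ == "__main__":
    print(json.dumps(build_signing_config()))
    print(json.dumps(build_csr("custom-metrics-apiserver", "custom-metrics")))
    print(expiry_years("43800h"))  # 5.0
```

If the Service name or namespace differs from the ones in the APIService definition, the hosts list has to change with it, otherwise certificate verification would fail once it is enabled.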
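When insecureSkipTLSVerify: true is set in custom-metrics-apiservice.yaml, kube-aggregator does not verify the adapter certificate generated above. To enable verification, add the OpenShift cluster's CA certificate to caBundle (a self-signed certificate that was not issued by the OpenShift cluster is treated as untrusted): base64-encode /etc/origin/master/ca.crt from an OpenShift master node and paste the result into the caBundle field.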
base64 ca.crt
Alternatively, paste the value of the clusters.cluster.certificate-authority-data field from /root/.kube/config on an OpenShift master node.
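The caBundle value is simply the base64 encoding of the PEM file on a single line. A minimal sketch of that step follows; the function name is hypothetical and the PEM content is a fake placeholder, not a real certificate.

```python
import base64


def ca_bundle_value(pem_bytes: bytes) -> str:
    """Return the single-line base64 string expected in the caBundle field."""
    return base64.b64encode(pem_bytes).decode("ascii")


if __name__ == "__main__":
    # Placeholder content; in practice read the bytes of ca.crt instead.
    fake_pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
    value = ca_bundle_value(fake_pem)
    print(value)
    # Decoding must give back exactly the original PEM bytes.
    print(base64.b64decode(value) == fake_pem)
```

Unlike a plain `base64 ca.crt` on some platforms, `b64encode` never inserts line breaks, which is what a YAML scalar field needs.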
Create the custom-metrics namespace:
    kubectl create namespace custom-metrics
Check that the Role extension-apiserver-authentication-reader exists in the kube-system namespace; if it does not, create it:
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      annotations:
        rbac.authorization.kubernetes.io/autoupdate: "true"
      labels:
        kubernetes.io/bootstrapping: rbac-defaults
      name: extension-apiserver-authentication-reader
      namespace: kube-system
    rules:
    - apiGroups:
      - ""
      resourceNames:
      - extension-apiserver-authentication
      resources:
      - configmaps
      verbs:
      - get
Set the --prometheus-url field so that it points to the correct Prometheus instance.
Deploy the manifests:
    kubectl create -f manifests/
The deployment creates a ClusterRole named custom-metrics-resource-reader, which authorizes the adapter to read resources in the Kubernetes cluster. As shown below, the readable resources are namespaces, pods, and services:
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: custom-metrics-resource-reader
    rules:
    - apiGroups:
      - ""
      resources:
      - namespaces
      - pods
      - services
      verbs:
      - get
      - list
In the custom-metrics namespace, verify that metrics can be fetched:
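    curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
The metrics returned this way lack Kubernetes resource information such as namespace and pod, so the application has to be registered through a ServiceMonitor, which adds this information to the metrics.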
Add a label to the custom-metrics namespace:
    oc label namespace custom-metrics openshift.io/cluster-monitoring=true
Create the ServiceMonitor in the openshift-monitoring namespace:
    # cat service-monitor.yaml
    kind: ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    metadata:
      name: sample-app
      labels:
        k8s-app: testsample
        app: sample-app
    spec:
      namespaceSelector:
        any: true
      selector:
        matchLabels:
          app: sample-app
      endpoints:
      - port: http
Check whether the HPA is working properly:
    oc describe hpa sample-app
Send requests to the application:
    curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
Check the resulting value; scale-up starts once it exceeds 500m:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"
    # oc get pod
    NAME                          READY   STATUS    RESTARTS   AGE
    sample-app-6d55487cdd-dc6qz   1/1     Running   0          18h
    sample-app-6d55487cdd-w6bbb   1/1     Running   0          5m
    sample-app-6d55487cdd-zbdbr   1/1     Running   0          5m
When the value returned by kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second" stays below 500m, the HPA scales down. The scale-down delay is set by --horizontal-pod-autoscaler-downscale-stabilization and defaults to 5 minutes.
The TARGETS column of oc get hpa shows the current value against the target, that is, the ratio that drives scaling:
    # oc get hpa
    NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
    sample-app   Deployment/sample-app   66m/500m   1         10        1          3h
Before deploying the adapter, configure its rules, which preprocess the metrics. The default configuration is manifests/custom-metrics-config-map.yaml. The adapter configuration has four main parts:
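Discovery: specifies which Prometheus metrics the rule should process (seriesQuery, optionally narrowed by seriesFilters).
Association: specifies how metrics map to Kubernetes resources; the available resources can be listed with the kubectl api-resources command. overrides associates a Prometheus metric label with a Kubernetes resource (a deployment in the example below). Note that the label must correspond to a real Kubernetes resource: for example, a metric's pod_name label can be mapped to the Kubernetes pod resource, but container_image cannot. A wrong mapping makes it impossible to get correct values through the custom metrics API. This also means the metric must carry the name of a real resource, which is then mapped to the Kubernetes resource.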
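    resources:
      overrides:
        microservice: {group: "apps", resource: "deployment"}
Naming: converts the Prometheus metric name into the name exposed through the custom metrics API. The metric's own name does not change: curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics still returns the old metric name. This step can be skipped if it is not needed.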
    # match turn any name <name>_total to <name>_per_second
    # e.g. http_requests_total becomes http_requests_per_second
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
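The renaming rule above amounts to a regular-expression substitution. The sketch below reproduces it with Python's re module; the function name is hypothetical, and the Go-style `${1}` reference is written as `\1`, its Python equivalent.

```python
import re
from typing import Optional

MATCHES = r"^(.*)_total$"
AS = r"\1_per_second"  # Python spelling of the "${1}_per_second" template


def api_metric_name(series_name: str) -> Optional[str]:
    """Return the name exposed by the custom metrics API, or None if the rule does not match."""
    if re.search(MATCHES, series_name) is None:
        return None
    return re.sub(MATCHES, AS, series_name)


if __name__ == "__main__":
    print(api_metric_name("http_requests_total"))  # http_requests_per_second
    print(api_metric_name("process_cpu_seconds"))  # None
```

Only the name seen through the API changes; the series stored in Prometheus keeps its original name.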
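In this example, the HPA can then fetch the metric through /apis/{APIService-name}/v1beta1/namespaces/{namespaces-name}/pods/*/http_requests_per_second.
Querying: the metricsQuery field uses a Go template to turn the URL request into a Prometheus query. It extracts the fields of the custom metrics API request and splits them into the metric name, the group-resource, and one or more objects of that group-resource, which correspond to the following fields: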
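Series: the metric name
LabelMatchers: a comma-separated list of label matchers for the objects; currently this is the label for the specific group-resource, plus the namespace label if the group-resource is namespaced
GroupBy: a comma-separated set of labels to group by; currently this is the group-resource label used in LabelMatchers
Suppose the series http_requests_total (exposed through the API as http_requests_per_second) looks like this:
    http_requests_total{pod="pod1",service="nginx1",namespace="somens"}
    http_requests_total{pod="pod2",service="nginx2",namespace="somens"}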
When kubectl get --raw "/apis/{APIService-name}/v1beta1/namespaces/somens/pods/*/http_requests_per_second" is called, the template fields in metricsQuery take the following values:
    Series: "http_requests_total"
    LabelMatchers: "pod=~\"pod1|pod2\",namespace=\"somens\""
    GroupBy: "pod"
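To make the substitution concrete, the sketch below fills the three template fields into a metricsQuery string. It uses plain string replacement rather than Go's text/template, so it only illustrates the result, not the adapter's implementation; the function names are hypothetical.

```python
from typing import List, Optional


def build_label_matchers(resource_label: str, object_names: List[str],
                         namespace: Optional[str] = None) -> str:
    """Build the LabelMatchers value: a regex matcher on the objects, plus the namespace."""
    joined = "|".join(object_names)
    matchers = [f'{resource_label}=~"{joined}"']
    if namespace is not None:
        matchers.append(f'namespace="{namespace}"')
    return ",".join(matchers)


def render_metrics_query(template: str, series: str, label_matchers: str, group_by: str) -> str:
    """Substitute the three fields into a <<...>> delimited template."""
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", label_matchers)
            .replace("<<.GroupBy>>", group_by))


if __name__ == "__main__":
    template = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"
    matchers = build_label_matchers("pod", ["pod1", "pod2"], namespace="somens")
    print(render_metrics_query(template, "http_requests_total", matchers, "pod"))
```

The rendered string is the PromQL query the adapter would send to Prometheus for this request, with one result per pod because of the `by (pod)` grouping.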
The adapter uses the rules and externalRules fields for custom metrics and external metrics respectively, as in this example:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: adapter-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        externalRules:
        - seriesQuery: '{namespace!="",pod!=""}'
          seriesFilters: []
          resources:
            overrides:
              namespace:
                resource: namespace
              pod:
                resource: pod
          metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[22m])) by (<<.GroupBy>>)
        rules:
        - seriesQuery: '{namespace!="",pod!=""}'
          seriesFilters: []
          resources:
            overrides:
              namespace:
                resource: namespace
              pod:
                resource: pod
          name:
            matches: "^(.*)_total"
            as: "${1}_per_second"
          metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
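The seriesQuery in this configuration keeps only series that carry non-empty namespace and pod labels. The sketch below imitates that selection on in-memory label sets; the sample series are made up for illustration, and a missing label is treated as empty, as PromQL does.

```python
from typing import Dict, Iterable, List


def matches_series_query(labels: Dict[str, str],
                         required: Iterable[str] = ("namespace", "pod")) -> bool:
    """Imitate '{namespace!="",pod!=""}': every required label must be non-empty."""
    return all(labels.get(name, "") != "" for name in required)


def discover(series: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the series a rule with this seriesQuery would pick up."""
    return [labels for labels in series if matches_series_query(labels)]


if __name__ == "__main__":
    sample = [
        {"__name__": "http_requests_total", "namespace": "custom-metrics", "pod": "sample-app-1"},
        {"__name__": "http_requests_total", "service": "sample-app"},  # no pod/namespace
    ]
    for labels in discover(sample):
        print(labels["__name__"], labels["namespace"], labels["pod"])
```

A series scraped without these labels is silently skipped, which is why the application had to be registered through a ServiceMonitor earlier.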
Depending on the metric type, the HPA normally pulls metrics from resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io).
The HPA supports four metric types (shown below in v2beta2 format):
resource: covers cpu and memory. The target can be an absolute value (targetAverageValue) or a percentage (targetAverageUtilization). The HPA gets resource metrics from metrics.k8s.io.
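pods: custom metrics that describe pods. The target can only be an absolute value (targetAverageValue), which is compared against the metric averaged over all relevant pods.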
    type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
The HPA gets custom metrics from custom.metrics.k8s.io.
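object: custom metrics that describe a single object other than a pod (an Ingress in the example below). The target supports Value and AverageValue: the former compares the metric directly with the target, while the latter compares the metric divided by the number of relevant pods with the target.
    type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: extensions/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 2k
external: the target supports Value and AverageValue. Because external tries to match metrics across all Kubernetes resources, this type is not recommended in practice.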
The HPA gets external metrics from external.metrics.k8s.io.
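    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector: "queue=worker_tasks"
        target:
          type: AverageValue
          averageValue: 30
Note: one unit of a target value can be divided into 1000 parts, each written with the suffix m. For example, 500m means 1/2 of a unit. See the Kubernetes Quantity type.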
The Kubernetes HPA algorithm is as follows:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
When targetAverageValue or targetAverageUtilization is used, currentMetricValue is the average of the metric over all pods targeted by the HPA.
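The formula can be tried out with a few numbers. The sketch below implements it and clamps the result to the HPA's minimum and maximum replica counts; the parameter names are hypothetical, and the tolerance and stabilization behaviour of the real controller are not modelled.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, desired_value: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to [min, max]."""
    raw = math.ceil(current_replicas * (current_value / desired_value))
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 66m against a 500m target with 1 replica: stays at 1.
    print(desired_replicas(1, 0.066, 0.5))
    # Average of 0.9 against 0.5 with 3 replicas: 3 * 1.8 = 5.4, rounded up to 6.
    print(desired_replicas(3, 0.9, 0.5))
    # A very high load is capped at max_replicas.
    print(desired_replicas(3, 5.0, 0.5))
```

The first call corresponds to the 66m/500m state shown in the oc get hpa output, where REPLICAS remains 1.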
Assume the registered APIService is custom.metrics.k8s.io/v1beta1. Once it is registered, the HorizontalPodAutoscaler controller fetches metrics from paths rooted at /apis/custom.metrics.k8s.io/v1beta1. Metric API paths are either namespaced or non-namespaced. The following requests can be used to check whether the HPA can get metrics:
namespaced paths
A metric of a specific object in a namespace:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}"
For example, the start_time_seconds metric of the pod named grafana in the monitor namespace:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds"
A metric of all pods in a namespace:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}"
For example, the start_time_seconds metric of all pods in the monitor namespace:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds"
Either form can be narrowed with a labelSelector:
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}?labelSelector={label-name}"
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}?labelSelector={label-name}"
non-namespaced paths
These are similar to the namespaced ones and mainly cover node, namespace, PersistentVolume, and so on. Some non-namespaced access patterns differ from what the custom metrics API describes.
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/metrics/{metric-name...}"
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/{metric-name...}"
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/{node-name}/{metric-name...}"
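Since these paths all follow one pattern, a small helper can assemble them before passing the result to kubectl get --raw. The sketch below is only a string builder with hypothetical function names; it does not call the API, and it percent-encodes the selector, which is an assumption about how you would want to pass it.

```python
from typing import Optional
from urllib.parse import urlencode

ROOT = "/apis/custom.metrics.k8s.io/v1beta1"


def namespaced_metric_path(namespace: str, resource: str, metric: str,
                           name: str = "*", label_selector: Optional[str] = None) -> str:
    """Path for a metric on one object, or on all objects of a type when name is '*'."""
    path = f"{ROOT}/namespaces/{namespace}/{resource}/{name}/{metric}"
    if label_selector:
        path += "?" + urlencode({"labelSelector": label_selector})
    return path


def node_metric_path(node: str, metric: str) -> str:
    """Path for a metric on a non-namespaced node object."""
    return f"{ROOT}/nodes/{node}/{metric}"


if __name__ == "__main__":
    print(namespaced_metric_path("monitor", "pods", "start_time_seconds", name="grafana"))
    print(namespaced_metric_path("monitor", "pods", "start_time_seconds"))
    print(namespaced_metric_path("custom-metrics", "pods", "http_requests_per_second",
                                 label_selector="app=sample-app"))
```

The first two lines printed match the grafana and all-pods examples given above.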
Run oc get apiservice v1beta1.custom.metrics.k8s.io -oyaml and check the status and message information.
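If the returned resource list is empty, check whether the Prometheus URL in the adapter deployment is correct, whether the adapter has the required permissions, and so on.
Make sure --metrics-relist-interval is set to a value greater than Prometheus's scrape_interval.
Make sure the seriesQuery of the rules can match data in Prometheus.
Make sure the metricsQuery of the rules produces computed data. Note that if the query computes over a time window and the window is too short, no data may be produced.
Make sure the metrics carry the pod and namespace labels; otherwise no metrics can be collected with the official default configuration.
Example adapter startup arguments:
    --secure-port=6443
    --tls-cert-file=D:\adapter\serving.crt
    --tls-private-key-file=D:\adapter\serving.key
    --logtostderr=true
    --prometheus-url=${prometheus-url}
    --metrics-relist-interval=70s
    --v=10
    --config=D:\adapter\config.yaml
    --lister-kubeconfig=D:\adapter\k8s-config.yaml
    --authorization-kubeconfig=D:\adapter\k8s-config.yaml
    --authentication-kubeconfig=D:\adapter\k8s-config.yaml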