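This document describes how to configure a self-built Prometheus to collect metrics from the control plane components (APIServer, Scheduler, and KCM) of a TKE managed cluster.
Prerequisites
The self-built Prometheus can reach the APIServer of the TKE cluster. For details, see Connecting to a Cluster.
The self-built Prometheus can be deployed either inside or outside the TKE cluster.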
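Prometheus scrape configuration
To collect metrics from the core control plane components of a TKE cluster with a self-built Prometheus, first define a scrape job for each component in the Prometheus configuration file prometheus.yaml. The file has the following structure: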
```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

# Scrape configurations for the TKE control plane components.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: kube-apiserver
  ......
- job_name: kube-controller-manager
  ......
- job_name: kube-scheduler
  ......
```
In-cluster monitoring
In-cluster monitoring means that Prometheus is deployed inside the TKE cluster it monitors.
kube-apiserver
```yaml
scrape_configs:
- job_name: kube-apiserver
  honor_timestamps: true
  params:
    component:
    - apiserver
  scrape_interval: 15s
  metrics_path: /master/metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
    separator: ;
    regex: (Helm);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
    separator: ;
    regex: (cluster-monitor);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
    separator: ;
    regex: (master-metrics-service);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [pod]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - kube-system
```
kube-scheduler
```yaml
scrape_configs:
- job_name: kube-scheduler
  honor_labels: true
  honor_timestamps: true
  params:
    component:
    - scheduler
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /master/metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
    separator: ;
    regex: (Helm);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
    separator: ;
    regex: (cluster-monitor);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
    separator: ;
    regex: (master-metrics-service);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [pod]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - kube-system
```
kube-controller-manager
```yaml
scrape_configs:
- job_name: kube-controller-manager
  honor_labels: true
  honor_timestamps: true
  params:
    component:
    - controller-manager
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /master/metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
    separator: ;
    regex: (Helm);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
    separator: ;
    regex: (cluster-monitor);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
    separator: ;
    regex: (master-metrics-service);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http-metrics
    action: replace
  metric_relabel_configs:
  - source_labels: [pod]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - kube-system
```
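All three jobs above use kubernetes_sd_configs with an empty kubeconfig_file and no api_server. In that case Prometheus discovers targets through the Kubernetes API using its own in-cluster service account. The Endpoints role also watches Services and Pods, so that account needs read access to all three resource types in kube-system. The following is a minimal RBAC sketch. It assumes Prometheus runs under a service account named prometheus in a namespace named monitoring; both names are hypothetical, so substitute your own. Only namespace-scoped permissions are granted, because the jobs restrict discovery to kube-system.

```yaml
# Hypothetical names: Role/RoleBinding "prometheus-kube-system-sd",
# ServiceAccount "prometheus" in namespace "monitoring".
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-kube-system-sd
  namespace: kube-system          # discovery is limited to kube-system
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-kube-system-sd
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-kube-system-sd
subjects:
- kind: ServiceAccount
  name: prometheus                # assumed service account of your Prometheus
  namespace: monitoring           # assumed namespace of your Prometheus
```

A Role plus RoleBinding is used here instead of a ClusterRole because the scrape jobs never look outside kube-system. Widen the scope only if you add jobs that discover other namespaces.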
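Collecting via ServiceMonitor
If your self-built Prometheus is managed by the Prometheus Operator, you can create the following ServiceMonitor instead of writing the scrape jobs by hand: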
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: master-metrics-exporter
  namespace: kube-system
spec:
  endpoints:
  - honorLabels: true
    interval: 15s
    metricRelabelings:
    - action: replace
      sourceLabels:
      - pod
      targetLabel: instance
    params:
      component:
      - apiserver
    path: /master/metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: kube-apiserver
      targetLabel: job
  - honorLabels: true
    interval: 15s
    metricRelabelings:
    - action: replace
      sourceLabels:
      - pod
      targetLabel: instance
    params:
      component:
      - controller-manager
    path: /master/metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: kube-controller-manager
      targetLabel: job
  - honorLabels: true
    interval: 15s
    metricRelabelings:
    - action: replace
      sourceLabels:
      - pod
      targetLabel: instance
    params:
      component:
      - scheduler
    path: /master/metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: kube-scheduler
      targetLabel: job
  namespaceSelector: {}
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: Helm
      label_qcloud_app: cluster-monitor
      label_qcloud_service: master-metrics-service
```
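A ServiceMonitor takes effect only when a Prometheus resource managed by the Prometheus Operator selects it. The master-metrics-exporter ServiceMonitor lives in kube-system, so the Prometheus resource must select that namespace as well. Below is a minimal sketch. It assumes Kubernetes 1.21 or later, where every namespace carries the kubernetes.io/metadata.name label. The resource name, namespace, and service account name are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: self-built                 # hypothetical name
  namespace: monitoring            # hypothetical namespace
spec:
  # Assumed service account; it still needs RBAC to read
  # services, endpoints, and pods in kube-system.
  serviceAccountName: prometheus
  # Look for ServiceMonitors in kube-system, where master-metrics-exporter is created.
  serviceMonitorNamespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: kube-system
  # An empty selector matches every ServiceMonitor in the selected namespaces.
  serviceMonitorSelector: {}
```

If your Prometheus resource already uses label-based selectors, you can instead add a matching label to the ServiceMonitor. The Operator does not create RBAC for the service account on its own.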
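Out-of-cluster monitoring
This section uses collection from outside the cluster, within the same VPC, as an example.
Prerequisites
Public network access or private network access is enabled for the cluster to be monitored.
Enabling cross-cluster metric collection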
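1. Log in to the TKE console and select Clusters in the left navigation pane.
2. On the cluster management page, click the ID of the target cluster to open its details page.
3. Select Component Management in the left navigation pane.
4. Click Update Configuration for the clustermonitor component.
5. On the Update Component Configuration page, select Enable out-of-cluster collection for control plane component monitoring, then choose a container subnet.
Note:
The container subnet cannot be changed after it is selected. To use a different container subnet, disable out-of-cluster collection for control plane component monitoring and then enable it again.
6. Click Done.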
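Scrape configuration
For details, see Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster. Compared with the in-cluster configuration, the main change is in kubernetes_sd_configs, which must now specify the cluster's APIServer address and credentials: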
```yaml
scrape_configs:
- job_name: out-tke-apiserver
  # params, metrics_path, relabel_configs, and the other job fields
  # are the same as in the in-cluster kube-apiserver job.
  kubernetes_sd_configs:
  - role: endpoints
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      names:
      - kube-system
    api_server: 'https://<KUBERNETES URL>'
    tls_config:
      ca_file: /etc/prometheus/kubernetes-ca.crt
    bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
  scrape_interval: 30s
```
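The configuration above needs a service account bearer token and a CA certificate. Since Kubernetes 1.24, token Secrets are no longer created automatically for service accounts, so one way to obtain a long-lived token is to create the Secret explicitly. The manifest below is a sketch, not the TKE-prescribed procedure. It creates a service account, grants it read access to Services, Endpoints, and Pods in kube-system (what the endpoints role of kubernetes_sd_configs watches), and requests a token Secret for it. All resource names are hypothetical, and placing them in kube-system is only for illustration.

```yaml
# Hypothetical names throughout; adjust to your own conventions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-external-sd
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-external-sd
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-external-sd
subjects:
- kind: ServiceAccount
  name: prometheus-external
  namespace: kube-system
---
# The token controller fills in token and ca.crt for this Secret.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-external-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: prometheus-external
type: kubernetes.io/service-account-token
```

After the Secret is populated, its token and ca.crt data fields hold base64-encoded values. Decode token into the bearer_token setting and save the decoded ca.crt to the path used in ca_file. The ca.crt approach assumes that the APIServer address you use presents a certificate signed by the cluster CA. If it does not, use the CA from the kubeconfig you downloaded for that access endpoint.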
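Verification
1. Log in to the console of your self-built Prometheus and switch to the Graph page.
2. Enter the expression up and check that targets for all control plane components appear. Note the job label value of each series. If a query below returns no data, compare its job selector with these values and adjust it accordingly.
3. Enter the following query to verify that the API Server is working properly: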
apiserver_request_total{job="apiserver"}

4. Enter the following query to verify that the Scheduler is working properly:
rest_client_requests_total{job="scheduler"}

5. Enter the following query to verify that the Controller Manager is working properly:
rest_client_requests_total{job="controller-manager"}
