Collecting Control Plane Metrics with a Self-Built Prometheus

Last updated: 2025-09-04 10:51:42

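This document describes how to configure a self-built Prometheus to collect metrics from the control plane components of a TKE managed cluster: APIServer, Scheduler, and KCM (kube-controller-manager).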

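Prerequisites

The self-built Prometheus can access the APIServer of the TKE cluster. For details, see Connecting to a Cluster.
The self-built Prometheus can be deployed either inside or outside the TKE cluster.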

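Prometheus Scrape Configuration

To collect metrics from the core control plane components of a TKE cluster with a self-built Prometheus, first define the scrape Jobs in the Prometheus configuration file prometheus.yaml. The configuration file has the following format: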
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

# One scrape job per control plane component.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: kube-apiserver
    ......

  - job_name: kube-controller-manager
    ......

  - job_name: kube-scheduler
    ......
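Each core component has its own Job configuration. For the specific configuration, see the metric list of the corresponding component. For how to write prometheus.yaml in community Prometheus, see Configuration.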

In-Cluster Monitoring for TKE Clusters

In-cluster monitoring means that Prometheus is deployed inside the TKE cluster that it monitors.
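
The scrape jobs in this section discover their targets through kubernetes_sd_configs with role: endpoints in the kube-system namespace, so the ServiceAccount that Prometheus runs under needs read access to those objects. The following is a minimal sketch of such a grant, not a definitive setup. The ServiceAccount name prometheus, the namespace monitoring, and the Role name are assumptions for illustration; replace them with the values from your own deployment.

```yaml
# Sketch only: grants target discovery permissions in kube-system.
# Assumed (hypothetical) names: ServiceAccount "prometheus" in namespace "monitoring".
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-control-plane-discovery   # hypothetical name
  namespace: kube-system
rules:
  # The endpoints role reads endpoints and attaches service and pod metadata.
  - apiGroups: [""]
    resources: ["endpoints", "services", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-control-plane-discovery   # hypothetical name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-control-plane-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus        # assumption: your Prometheus ServiceAccount
    namespace: monitoring   # assumption: your Prometheus namespace
```

If your Prometheus already has a cluster-wide ClusterRole covering these resources, this extra Role is likely unnecessary.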

kube-apiserver

For more information about the kube-apiserver component, see kube-apiserver Component Metrics.
scrape_configs:
  - job_name: kube-apiserver
    honor_timestamps: true
    params:
      component:
        - apiserver
    scrape_interval: 15s
    metrics_path: /master/metrics
    scheme: http
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - source_labels: [job]
        separator: ;
        regex: (.*)
        target_label: __tmp_prometheus_job_name
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
        separator: ;
        regex: (Helm);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
        separator: ;
        regex: (cluster-monitor);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
        separator: ;
        regex: (master-metrics-service);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: http-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: http-metrics
        action: replace
    metric_relabel_configs:
      - source_labels: [pod]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: $1
        action: replace
    kubernetes_sd_configs:
      - role: endpoints
        kubeconfig_file: ""
        follow_redirects: true
        enable_http2: true
        namespaces:
          own_namespace: false
          names:
            - kube-system

kube-scheduler

For more information about the kube-scheduler component, see kube-scheduler Component Metrics.
scrape_configs:
  - job_name: kube-scheduler
    honor_labels: true
    honor_timestamps: true
    params:
      component:
        - scheduler
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /master/metrics
    scheme: http
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - source_labels: [job]
        separator: ;
        regex: (.*)
        target_label: __tmp_prometheus_job_name
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
        separator: ;
        regex: (Helm);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
        separator: ;
        regex: (cluster-monitor);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
        separator: ;
        regex: (master-metrics-service);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: http-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: http-metrics
        action: replace
    metric_relabel_configs:
      - source_labels: [pod]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: $1
        action: replace
    kubernetes_sd_configs:
      - role: endpoints
        kubeconfig_file: ""
        follow_redirects: true
        enable_http2: true
        namespaces:
          own_namespace: false
          names:
            - kube-system

kube-controller-manager

For more information about the kube-controller-manager component, see kube-controller-manager Component Metrics.
scrape_configs:
  - job_name: kube-controller-manager
    honor_labels: true
    honor_timestamps: true
    params:
      component:
        - controller-manager
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /master/metrics
    scheme: http
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - source_labels: [job]
        separator: ;
        regex: (.*)
        target_label: __tmp_prometheus_job_name
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_managed_by, __meta_kubernetes_service_labelpresent_app_kubernetes_io_managed_by]
        separator: ;
        regex: (Helm);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_app, __meta_kubernetes_service_labelpresent_label_qcloud_app]
        separator: ;
        regex: (cluster-monitor);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_service_label_label_qcloud_service, __meta_kubernetes_service_labelpresent_label_qcloud_service]
        separator: ;
        regex: (master-metrics-service);true
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: http-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: http-metrics
        action: replace
    metric_relabel_configs:
      - source_labels: [pod]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: $1
        action: replace
    kubernetes_sd_configs:
      - role: endpoints
        kubeconfig_file: ""
        follow_redirects: true
        enable_http2: true
        namespaces:
          own_namespace: false
          names:
            - kube-system

Collecting via ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: master-metrics-exporter
  namespace: kube-system
spec:
  endpoints:
    - honorLabels: true
      interval: 15s
      metricRelabelings:
        - action: replace
          sourceLabels:
            - pod
          targetLabel: instance
      params:
        component:
          - apiserver
      path: /master/metrics
      port: http-metrics
      relabelings:
        - action: replace
          replacement: kube-apiserver
          targetLabel: job
    - honorLabels: true
      interval: 15s
      metricRelabelings:
        - action: replace
          sourceLabels:
            - pod
          targetLabel: instance
      params:
        component:
          - controller-manager
      path: /master/metrics
      port: http-metrics
      relabelings:
        - action: replace
          replacement: kube-controller-manager
          targetLabel: job
    - honorLabels: true
      interval: 15s
      metricRelabelings:
        - action: replace
          sourceLabels:
            - pod
          targetLabel: instance
      params:
        component:
          - scheduler
      path: /master/metrics
      port: http-metrics
      relabelings:
        - action: replace
          replacement: kube-scheduler
          targetLabel: job
  namespaceSelector: {}
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: Helm
      label_qcloud_app: cluster-monitor
      label_qcloud_service: master-metrics-service
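
A ServiceMonitor only takes effect if a Prometheus instance managed by the Prometheus Operator selects it. The ServiceMonitor above lives in kube-system, so a Prometheus resource in another namespace has to be allowed to look there. The fragment below is a sketch of the relevant selector fields only. The resource name, namespace, and ServiceAccount name are assumptions for illustration, and the remaining fields of your existing Prometheus resource are omitted.

```yaml
# Sketch only: selector fields of a Prometheus custom resource.
# The name, namespace, and serviceAccountName are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: self-built          # hypothetical name
  namespace: monitoring     # hypothetical namespace
spec:
  serviceAccountName: prometheus   # assumption: an existing ServiceAccount
  # An empty selector matches every namespace, including kube-system.
  # If this field is omitted, only the Prometheus resource's own namespace is searched.
  serviceMonitorNamespaceSelector: {}
  # An empty selector matches every ServiceMonitor in the selected namespaces.
  serviceMonitorSelector: {}
```

If your Prometheus resource already uses label-based selectors, it may be simpler to add the matching label to the ServiceMonitor instead of widening the selectors.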

Out-of-Cluster Monitoring for TKE Clusters

This section uses out-of-cluster collection within the same VPC as the example.

Prerequisites

Public network or private network access is enabled for the cluster to be scraped.

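Enabling Cross-Cluster Metric Collection

1. Log in to the TKE console and choose Cluster in the left sidebar.
2. On the cluster management page, click the ID of the target cluster to open the cluster details page.
3. Choose Component Management in the left sidebar.
4. Click Update Configuration for the clustermonitor component.
5. On the Update Component Configuration page, select the option that enables out-of-cluster collection for control plane component monitoring, and choose a container subnet.

Note:
The container subnet cannot be changed after it is selected. To change it, disable out-of-cluster collection for control plane component monitoring, then enable and configure it again.
6. Click Done.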

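Scrape Configuration

For details, see Configuration and Monitoring kubernetes with prometheus from outside of k8s cluster. Compared with the in-cluster configuration, the main change is to kubernetes_sd_configs, which is configured as follows: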
scrape_configs:
  - job_name: out-tke-apiserver
    kubernetes_sd_configs:
      - role: endpoints
        kubeconfig_file: ""
        follow_redirects: true
        enable_http2: true
        namespaces:
          names:
            - kube-system
        api_server: 'https://<KUBERNETES URL>'
        tls_config:
          ca_file: /etc/prometheus/kubernetes-ca.crt
        bearer_token: '<SERVICE ACCOUNT BEARER TOKEN>'
    scrape_interval: 30s
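
The bearer_token value in the configuration above is a ServiceAccount token from the cluster being scraped. One possible way to obtain a long-lived token is to create a dedicated ServiceAccount together with a Secret of type kubernetes.io/service-account-token, sketched below. The names prometheus-external and prometheus-external-token are assumptions for illustration. The ServiceAccount also needs get, list, and watch permission on endpoints, services, and pods in kube-system, granted through a Role and RoleBinding that are not shown here.

```yaml
# Sketch only: a ServiceAccount and a token Secret in the scraped cluster.
# All names are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-external
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-external-token
  namespace: kube-system
  annotations:
    # Binds this Secret to the ServiceAccount above; the control plane
    # then fills in the token and ca.crt keys of the Secret.
    kubernetes.io/service-account.name: prometheus-external
type: kubernetes.io/service-account-token
```

After the Secret is populated, its token and ca.crt keys hold base64-encoded values that, once decoded, correspond to the bearer_token and ca_file settings used above.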

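Verification

1. Log in to the console of your self-built Prometheus and switch to the Graph page.
2. Enter up and check that all control plane components are listed.

3. Enter the following query to verify that the API Server is healthy:
apiserver_request_total{job="apiserver"}

4. Enter the following query to verify that the Scheduler is healthy:
rest_client_requests_total{job="scheduler"}

5. Enter the following query to verify that the Controller Manager is healthy:
rest_client_requests_total{job="controller-manager"}

If a query returns no data, check that the job value in the query matches the job label shown in the result of up. The scrape jobs configured in this document are named kube-apiserver, kube-scheduler, and kube-controller-manager.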
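
The manual up check can also be turned into a continuous one. The rule file below is a minimal sketch that fires when any control plane scrape target stays down for five minutes. The group name, alert name, severity label, and the job label values are assumptions; in particular, the job matcher assumes the job names kube-apiserver, kube-scheduler, and kube-controller-manager, so adjust it to the job labels that up actually shows in your setup.

```yaml
# Sketch only: load this file through rule_files in prometheus.yaml.
groups:
  - name: tke-control-plane            # hypothetical group name
    rules:
      - alert: ControlPlaneTargetDown  # hypothetical alert name
        # up is 0 when the most recent scrape of a target failed.
        expr: up{job=~"kube-apiserver|kube-scheduler|kube-controller-manager"} == 0
        for: 5m
        labels:
          severity: critical           # assumption: your own severity scheme
        annotations:
          summary: "Control plane scrape target {{ $labels.job }} is down"
```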