How do I integrate TKE Serverless with Tencent Cloud's cloud-native Prometheus monitoring?
1. Log in to the TKE console and choose Prometheus Monitoring in the left sidebar.
2. Create a monitoring instance. For details, see Creating a Monitoring Instance.
3. After the instance is created, click its name in the Prometheus monitoring list to open the instance details page.
4. On the instance details page, choose Data Collection > Integrate Container Service.
5. Click Associate Cluster and set the following parameters:
Cluster Type: select "Serverless Cluster".
Cluster: select the clusters in the current VPC that you want to associate.
6. Click OK to associate the clusters.
7. On the Associated Clusters tab, click Data Collection Configuration to the right of the cluster ID and configure the collection rules. For details, see Data Collection Configuration.
8. On the Basic Information tab, view the Grafana information. Log in to the Grafana address with your account and password to view the monitoring data.
How do I integrate TKE Serverless with a self-built Prometheus?
Prerequisites
You have deployed Prometheus.
You have installed Prometheus Operator.
You have configured Grafana.
The following monitoring metrics need to be collected from a TKE Serverless cluster:
Metric type | Collection source | Discovery type
Kubernetes resource metrics | kube-state-metrics | Service domain name resolved through CoreDNS
Container runtime metrics | Pod metrics endpoint | k8s_sd, Pod level
Monitoring Kubernetes resource metrics
To monitor Kubernetes resource metrics, deploy the kube-state-metrics component in the TKE Serverless cluster and create a ServiceMonitor for it.
1. Deploy the kube-state-metrics component in the TKE Serverless cluster.
If Prometheus Operator is already deployed in the TKE Serverless cluster, you will notice that the kube-state-metrics and node-exporter Pods it ships are stuck in the Pending state, because those manifests are not suited to the TKE Serverless scenario. node-exporter is not needed for monitoring a TKE Serverless cluster, so its Pods can simply be deleted; kube-state-metrics, however, needs to be redeployed with the following manifests:
kube-state-metrics-ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: tke-kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
kube-state-metrics-service-ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: tke-kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tke-kube-state-metrics
subjects:
- kind: ServiceAccount
  name: tke-kube-state-metrics
  namespace: kube-system
kube-state-metrics-deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: tke-kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 1.9.7
    spec:
      containers:
      - image: ccr.ccs.tencentyun.com/tkeimages/kube-state-metrics:v1.9.7
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      serviceAccountName: tke-kube-state-metrics
kube-state-metrics-service
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: tke-kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8180
    targetPort: http-metrics
  - name: telemetry
    port: 8181
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
kube-state-metrics-serviceaccount
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: tke-kube-state-metrics
  namespace: kube-system
2. Deploy a ServiceMonitor in the TKE Serverless cluster.
A ServiceMonitor defines how a group of dynamically discovered services is monitored. Once kube-state-metrics-servicemonitor is deployed, Prometheus can collect Kubernetes resource metrics through kube-state-metrics. The manifest is as follows; also make sure your Prometheus custom resource actually selects it, as sketched after the manifest:
kube-state-metrics-servicemonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.7
  name: kube-state-metrics
  namespace: kube-system
spec:
  endpoints:
  - interval: 15s
    port: http-metrics
    scrapeTimeout: 15s
    honorLabels: true
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
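The Prometheus Operator only applies a ServiceMonitor that is matched by the serviceMonitorSelector and serviceMonitorNamespaceSelector of your Prometheus custom resource. The following is a minimal sketch, not part of the original manifests, assuming a Prometheus CR named prometheus in a monitoring namespace (both names are placeholders; adjust them to your self-built setup):
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                      # placeholder; use your existing Prometheus CR
  namespace: monitoring                 # assumption: namespace of your self-built Prometheus
spec:
  serviceMonitorNamespaceSelector: {}   # also pick up ServiceMonitors from other namespaces, e.g. kube-system
  serviceMonitorSelector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics   # matches the labels on the ServiceMonitor above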
Monitoring container runtime metrics
Pods in TKE Serverless expose monitoring data on port 9100, so you can obtain metrics by accessing podip:9100/metrics. Compared with the monitoring configuration of a standard TKE cluster, monitoring TKE Serverless requires changes to the corresponding configuration files; we recommend using the Operator's additional scrape config. Alternatively, you can monitor specific Pods by adding annotations to them.
1. Obtain metrics by configuring the Operator's additional scrape config.
To obtain metrics by accessing podip:9100/metrics, perform the following steps:
1.1 Create a prometheus-additional.yaml file.
1.2 Add scrape_configs to the file (a sketch of how to load the file through the Operator follows the configuration). The scrape_configs content is as follows:
- job_name: eks-info
  honor_timestamps: true
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_ip]
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: ${1}:9100
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod_name
    replacement: ${1}
    action: replace
  metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: node_network_receive_packets_total
    target_label: __name__
    replacement: container_network_receive_packets_total
    action: replace
  - source_labels: [__name__]
    separator: ;
    regex: node_network_receive_bytes_total
    target_label: __name__
    replacement: container_network_receive_bytes_total
    action: replace
  - source_labels: [__name__]
    separator: ;
    regex: node_network_transmit_bytes_total
    target_label: __name__
    replacement: container_network_transmit_bytes_total
    action: replace
  - source_labels: [__name__]
    separator: ;
    regex: node_network_transmit_packets_total
    target_label: __name__
    replacement: container_network_transmit_packets_total
    action: replace
  - source_labels: [pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__name__]
    separator: ;
    regex: (container_network.*|pod_.*)
    replacement: $1
    action: keep
  - separator: ;
    regex: pod_name|node|unInstanceId|workload_kind|workload_name
    replacement: $1
    action: labeldrop
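With the Prometheus Operator, prometheus-additional.yaml is not read from disk directly; its content has to be stored in a Secret that the Prometheus custom resource references through spec.additionalScrapeConfigs. Below is a minimal sketch, assuming the Secret is named additional-scrape-configs and lives in a monitoring namespace alongside your Prometheus CR (all names are placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs       # placeholder name
  namespace: monitoring                 # assumption: namespace of your self-built Prometheus
stringData:
  prometheus-additional.yaml: |
    # paste the eks-info scrape job shown above here, verbatim
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                      # placeholder; use your existing Prometheus CR
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs     # name of the Secret above
    key: prometheus-additional.yaml     # key inside the Secret
Once the Prometheus CR is updated, the Operator regenerates the Prometheus configuration with the eks-info job included.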
1.3 After the deployment is complete, connect Grafana to view the collected data.
2. Monitor specific Pods by adding annotations to them.
To monitor specific Pods by adding annotations, perform the following steps:
2.1 Modify the YAML file of the workload whose Pods you want to scrape and add the following to spec.template.metadata.annotations (a placement example follows the snippet):
prometheus.io/scrape: 'true'
prometheus.io/port: '9100'
prometheus.io/path: 'metrics'
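For reference, this is a minimal sketch of where the annotations sit in a workload manifest, using a hypothetical nginx Deployment that is not part of the original document; placed under spec.template.metadata.annotations, they are applied to every Pod created from the template:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo                      # hypothetical workload, for illustration only
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
      annotations:
        prometheus.io/scrape: 'true'    # opt this Pod in to scraping
        prometheus.io/port: '9100'      # port on which TKE Serverless exposes runtime metrics
        prometheus.io/path: 'metrics'   # metrics path, as in the annotation list above
    spec:
      containers:
      - name: nginx
        image: nginx:latest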
2.2 Configure scrape_configs. Once this is in place, Prometheus monitors every Pod whose prometheus.io/scrape annotation is set to true. Refer to the following scrape_configs:
- job_name: kubernetes-pods
  honor_timestamps: true
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace