In the previous article we covered custom monitoring. We found, however, that every new monitoring target requires manually creating a ServiceMonitor, which quickly becomes tedious. Is there a way to discover targets automatically? This article explains how to auto-discover monitoring targets.
Inspecting the generated configuration in the Prometheus console:
scrape_configs:
- job_name: serviceMonitor/monitoring/alertmanager/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_label_alertmanager]
    separator: ;
    regex: main
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_component]
    separator: ;
    regex: alert-router
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: alertmanager
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_part_of]
    separator: ;
    regex: kube-prometheus
    replacement: $1
    action: keep
Note: only part of the file is shown. Each keep rule above corresponds to one label in the Alertmanager ServiceMonitor's selector; Services whose labels do not match all of them are dropped from this job.
__meta_: Additional labels prefixed with __meta_ are available during the relabeling phase. They are set by the service discovery mechanism that provided the target and vary between mechanisms.
__: Once target relabeling is complete, labels starting with __ are removed from the label set.
__tmp: Use this label name prefix when a relabeling step only needs to store a label value temporarily (as input to a subsequent relabeling step). This prefix is guaranteed never to be used by Prometheus itself.
replacement determines what is written to target_label (regex capture groups such as ${2} may be referenced); if the regex matches nothing, target_label is not replaced. The default action is replace.
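As a minimal sketch of the __tmp convention (the label names __tmp_host and instance_host are made up for illustration), one step parks a value in a temporary label and a later step consumes it; after relabeling finishes, __tmp_host is dropped along with every other __-prefixed label:

relabel_configs:
- source_labels: [__address__]   # e.g. "10.0.0.5:9104"
  regex: ([^:]+)(?::\d+)?        # capture only the host part
  target_label: __tmp_host       # temporary storage
  action: replace
- source_labels: [__tmp_host]    # consume the temporary label
  target_label: instance_host    # this label survives relabeling
  action: replace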
This example uses kubernetes_sd_configs for auto-discovery, i.e. service discovery within Kubernetes.
For other service discovery mechanisms, see: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file
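Besides endpoints, kubernetes_sd_configs offers several roles that control which Kubernetes objects become scrape targets; a quick reference (role names per the Prometheus documentation linked above):

kubernetes_sd_configs:
- role: endpoints    # one target per endpoint address (used in this article)
# other valid roles: node, service, pod, endpointslice, ingress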
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-50.57 Ready control-plane,master 77d v1.20.5
k8s-node-50.58 Ready <none> 77d v1.20.5
k8s-node-50.59 Ready <none> 77d v1.20.5
# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 21h
alertmanager-main-1 2/2 Running 0 21h
alertmanager-main-2 2/2 Running 0 21h
blackbox-exporter-55c457d5fb-5m7ql 3/3 Running 0 21h
grafana-9df57cdc4-gpzsq 1/1 Running 0 21h
kube-state-metrics-56dbb74497-gpkn9 3/3 Running 0 21h
node-exporter-4wl6d 2/2 Running 0 21h
node-exporter-b4595 2/2 Running 0 21h
node-exporter-g4l99 2/2 Running 0 21h
prometheus-adapter-59df95d9f5-tnt4w 1/1 Running 0 21h
prometheus-adapter-59df95d9f5-xhz5v 1/1 Running 0 21h
prometheus-k8s-0 2/2 Running 1 21h
prometheus-k8s-1 2/2 Running 1 21h
prometheus-operator-c46b8b7c9-mg9cv 2/2 Running 0 21h
# cat kubesre-com.yaml
- job_name: "kubesre-com"
kubernetes_sd_configs: # 指定k8s服务发现的配置
- role: endpoints # 使用endpoints角色进行服务发现
relabel_configs: # 指标采集之前或采集过程中去重新配置
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape] # 源标签名称
action: keep # 保留具有 prometheus.io/scrape=true 这个注解的Service
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels:
[__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+) # RE2 正则规则,+是一次多多次,?是0次或1次,其中?:表示非匹配组(意思就是不获取匹配结果)
replacement: $1:$2
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_service
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod
- source_labels: [__meta_kubernetes_node_name]
action: replace
target_label: kubernetes_node
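To make the __address__ rewrite concrete, here is a worked example with hypothetical values:

# Given:  __address__ = "10.244.1.23:8080"
#         prometheus.io/port annotation = "9104"
# Joined input (default separator ";"): "10.244.1.23:8080;9104"
# ([^:]+)(?::\d+)?;(\d+) captures $1 = 10.244.1.23 and $2 = 9104
# replacement $1:$2 yields __address__ = "10.244.1.23:9104"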
# kubectl create secret generic kubesre-com-secret --from-file=kubesre-com.yaml -n monitoring
secret/kubesre-com-secret created
# kubectl get secret kubesre-com-secret -n monitoring -o yaml
apiVersion: v1
data:
  kubesre-com.yaml: LSBqb2JfbmFtZTogImt1YmVzcmUuY29tIiAKICBrdWJlcm5ldGVzX3NkX2NvbmZpZ3M6ICAgIyDmjIflrpprOHPmnI3liqHlj5HnjrDnmoTphY3nva4KICAgIC0gcm9sZTogZW5kcG9pbnRzICAgIyDkvb/nlKhlbmRwb2ludHPop5LoibLov5vooYzmnI3liqHlj5HnjrAKICByZWxhYmVsX2NvbmZpZ3M6ICMg5oyH5qCH6YeH6ZuG5LmL5YmN5oiW6YeH6ZuG6L+H56iL5Lit5Y676YeN5paw6YWN572uCiAgICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19zY3JhcGVdICAgIyDmupDmoIfnrb7lkI3np7AKICAgICAgYWN0aW9uOiBrZWVwICMg5L+d55WZ5YW35pyJIHByb21ldGhldXMuaW8vc2NyYXBlPXRydWUg6L+Z5Liq5rOo6Kej55qEU2VydmljZQogICAgICByZWdleDogdHJ1ZQogICAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fcGF0aF0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDogX19tZXRyaWNzX3BhdGhfXwogICAgICByZWdleDogKC4rKQogICAgLSBzb3VyY2VfbGFiZWxzOgogICAgICAgIFtfX2FkZHJlc3NfXywgX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fcG9ydF0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDogX19hZGRyZXNzX18KICAgICAgcmVnZXg6IChbXjpdKykoPzo6XGQrKT87KFxkKykgIyBSRTIg5q2j5YiZ6KeE5YiZ77yMK+aYr+S4gOasoeWkmuWkmuasoe+8jD/mmK8w5qyh5oiWMeasoe+8jOWFtuS4rT866KGo56S66Z2e5Yy56YWN57uEKOaEj+aAneWwseaYr+S4jeiOt+WPluWMuemFjee7k+aenCkKICAgICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19zY2hlbWVdCiAgICAgIGFjdGlvbjogcmVwbGFjZQogICAgICB0YXJnZXRfbGFiZWw6IF9fc2NoZW1lX18KICAgICAgcmVnZXg6IChodHRwcz8pCiAgICAtIGFjdGlvbjogbGFiZWxtYXAKICAgICAgcmVnZXg6IF9fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfbGFiZWxfKC4rKQogICAgICByZXBsYWNlbWVudDogJDEKICAgIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX25hbWVzcGFjZV0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19uYW1lc3BhY2UKICAgIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfbmFtZV0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19zZXJ2aWNlCiAgICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19wb2RfbmFtZV0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19wb2QKICAgIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX25vZGVfbmFtZV0KICAgICAgYWN0aW9uOiByZXBsYWNlCiAgICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19ub2RlCg==
kind: Secret
metadata:
  creationTimestamp: "2023-08-13T11:50:22Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:kubesre-com.yaml: {}
      f:type: {}
    manager: kubectl-create
    operation: Update
    time: "2023-08-13T11:50:22Z"
  name: kubesre-com-secret
  namespace: monitoring
  resourceVersion: "16541714"
  selfLink: /api/v1/namespaces/monitoring/secrets/kubesre-com-secret
  uid: 3eecedbb-5774-4434-953b-d6e89887c96f
type: Opaque
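As an optional sanity check, the stored file can be decoded to confirm it round-tripped intact:

# kubectl get secret kubesre-com-secret -n monitoring -o jsonpath='{.data.kubesre-com\.yaml}' | base64 -d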
# cat prometheus-prometheus.yaml
### content omitted
  additionalScrapeConfigs:
    name: kubesre-com-secret
    key: kubesre-com.yaml
# kubectl apply -f prometheus-prometheus.yaml
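Once the Prometheus pods have reloaded, the new job should appear on the targets page. One way to check (assuming the default prometheus-k8s Service of kube-prometheus) is:

# kubectl port-forward -n monitoring svc/prometheus-k8s 9090:9090
Then open http://localhost:9090/targets and look for the kubesre-com job.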
Note: if the target does not show up, check the logs first:
kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
In most cases this is a permissions problem. We already modified prometheus-clusterRole.yaml in the previous article; refer to that article for the exact changes.
# cat mysql-sd.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-exporter
  namespace: kube-ops
  labels:
    k8s-app: mysql-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  strategy:
    rollingUpdate:
      maxSurge: 70%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: mysql-exporter
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mysql-exporter
          image: prom/mysqld-exporter
          imagePullPolicy: IfNotPresent
          env:
            - name: DATA_SOURCE_NAME
              value: 'root:aMIZi9Ydh2GRKe@(192.168.70.204:3306)/' # user:password@(hostname:3306)/
          readinessProbe:
            httpGet:
              port: 9104
              path: /health
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 30
            failureThreshold: 10
          livenessProbe:
            httpGet:
              port: 9104
              path: /health
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 1
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/path: /metrics # metrics path, defaults to /metrics
    prometheus.io/port: "9104"   # port that exposes the metrics
    prometheus.io/scrape: "true" # enable scraping for this Service
  name: mysql-exporter
  namespace: kube-ops
  labels:
    k8s-app: mysql-exporter
spec:
  selector:
    k8s-app: mysql-exporter
  ports:
    - name: mysql-exporter
      port: 9104
      protocol: TCP
  type: ClusterIP
# kubectl apply -f mysql-sd.yaml
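Because the relabel rules above attach kubernetes_namespace and kubernetes_service labels, a quick PromQL query confirms the exporter has been discovered and is being scraped (label values follow the manifests above):

up{kubernetes_namespace="kube-ops", kubernetes_service="mysql-exporter"}

A value of 1 means the scrape succeeded.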
This is the mysql_exporter example from the previous article. There we used a ServiceMonitor; this time we use annotations on the Service so that it is picked up automatically through Kubernetes service discovery.
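For contrast, the ServiceMonitor approach for the same exporter looks roughly like this (a minimal sketch, not the exact manifest from the previous article):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-exporter
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kube-ops
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  endpoints:
  - port: mysql-exporter   # matches the Service port name above
    interval: 30s

The annotation approach needs no extra object per target, while a ServiceMonitor gives per-target control and validation through the Operator.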
This article covered how targets in Kubernetes can be discovered automatically and brought under monitoring, along with the relevant labels and relabeling actions. Next up: Prometheus alerting.