Using Prometheus Operator

Installation

In recent versions, upstream moved the resources from https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus into a standalone git repository: https://github.com/coreos/kube-prometheus.git. Clone the latest code:

git clone https://github.com/coreos/kube-prometheus.git

Go into the manifests directory, which contains all the resource manifests. One small change is needed in prometheus-serviceMonitorKubelet.yaml: change https-metrics to http-metrics.
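For example, a single substitution does it (assuming GNU sed; review the file afterwards):

sed -i 's/https-metrics/http-metrics/g' manifests/prometheus-serviceMonitorKubelet.yaml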

Create the resources:

kubectl apply -f manifests/
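Note that CRD registration can race with the resources that use the CRDs; if the first apply fails with unknown-kind errors, simply re-run it. A quick check that the stack is coming up:

kubectl get pods -n monitoring
kubectl get servicemonitors -n monitoring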
Alerting Configuration

Delete the stock alerting secret and configure your own. The YAML file is as follows:

cat > alertmanager.yaml <<EOF
global:
  resolve_timeout: 5m
  smtp_smarthost: 'xxxxxx'
  smtp_from: 'xxxxxxxx'
  smtp_auth_username: 'xxxxxxx'
  smtp_auth_password: 'xxxxxxx'
  smtp_hello: 'xxxxx'
  smtp_require_tls: false
route:
  group_by: ['job','alertname','severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5h
  receiver: default
  routes:
  - match:
      alertname: CPUThrottlingHigh
    receiver: default
  - match_re:
      severity: ^(warning|critical)$
    receiver: webhook
receivers:
- name: 'default'
  email_configs:
  - to: 'xxxxxxx'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'https://oapi.dingtalk.com/robot/send?access_token=8512095dcbf2777d5521f556668a1f0c3df62737f6e244c868197f5bexxxxx'
    send_resolved: true
EOF 
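If amtool (shipped with Alertmanager) is available locally, it can validate the file before it goes into the cluster:

amtool check-config alertmanager.yaml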

Create a script that regenerates the alerting secret:

cat > reset_alertmanager_config.sh <<EOF
kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
kubectl delete -f alertmanager-alertmanager.yaml
kubectl create -f alertmanager-alertmanager.yaml
EOF
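Running the script recreates the secret; decoding it back out is a quick sanity check that the new configuration was stored:

sh reset_alertmanager_config.sh
kubectl get secret alertmanager-main -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d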

Since no Services are generated by default for kube-controller-manager and kube-scheduler, they have to be added manually.

Monitoring kube-controller-manager

Add a Service manifest for monitoring kube-controller-manager:

cat > prometheus-kubeControllerManagerService.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252        # kube-controller-manager serves HTTP metrics on 10252 (the scheduler uses 10251)
    targetPort: 10252
    protocol: TCP
EOF
Monitoring kube-scheduler

Add a Service manifest for monitoring kube-scheduler:

cat > prometheus-kubeSchedulerService.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
EOF
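Apply both Service manifests and check that the endpoints are populated; empty endpoints usually mean the component selector does not match the labels on your static pods:

kubectl apply -f prometheus-kubeControllerManagerService.yaml
kubectl apply -f prometheus-kubeSchedulerService.yaml
kubectl get endpoints kube-controller-manager kube-scheduler -n kube-system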
Monitoring etcd

Since the certificates etcd uses live under /etc/kubernetes/pki/etcd on the node, first store the needed certificates in the cluster as a Secret (run this on a node where etcd runs).

Create the etcd secret:

kubectl -n monitoring create secret generic etcd-certs \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt
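A quick check that all three files made it into the secret:

kubectl describe secret etcd-certs -n monitoring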

Then reference the etcd-certs object in the Prometheus resource and update the Prometheus object directly:

  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs

Create the ServiceMonitor

Now that the certificates Prometheus needs to access the etcd cluster are ready, create the ServiceMonitor object (prometheus-serviceMonitorEtcd.yaml):

cat > prometheus-serviceMonitorEtcd.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
spec:
  jobLabel: etcd
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
EOF

Create the Service

The ServiceMonitor is created, but there is no associated Service object yet, so we need to create one manually (prometheus-etcdService.yaml). For etcd deployed inside the k8s cluster:

cat > prometheus-etcdService.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  selector:
    component: etcd
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
EOF

For etcd deployed outside the cluster:

cat > prometheus-etcdService.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd    # must match the ServiceMonitor's matchLabels above
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.18.12.14
    nodeName: pre-etcd-master
  ports:
  - name: port
    port: 2379
    protocol: TCP
EOF
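Either way, once the ServiceMonitor and the Service (plus, for external etcd, the Endpoints) are applied, confirm the endpoints resolve and the new target appears in Prometheus; a port-forward is a quick way to reach the UI:

kubectl apply -f prometheus-serviceMonitorEtcd.yaml
kubectl apply -f prometheus-etcdService.yaml
kubectl get endpoints etcd-k8s -n kube-system
kubectl port-forward svc/prometheus-k8s 9090 -n monitoring   # then open http://localhost:9090/targets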

Once the data is being collected, import dashboard 3070 in Grafana to get etcd monitoring charts.

Blackbox Monitoring

Monitoring k8s network performance with blackbox-exporter

Create the blackbox-exporter ConfigMap:

cat > prometheus-blackbox-exporter-cm.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus-blackbox-exporter
  name: prometheus-blackbox-exporter
  namespace: monitoring
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http      # blackbox_exporter's prober is "http"; it handles both http and https targets
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          # Best to pin the expected status codes; it makes Grafana charts unambiguous.
          valid_status_codes: [200]
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:      # HTTP POST probe module
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 10s
      icmp:
        prober: icmp
        timeout: 10s
        icmp:
          preferred_ip_protocol: "ip4"
EOF

Create the blackbox-exporter Deployment:

cat > prometheus-blackbox-exporter-deploy.yaml <<EOF
apiVersion: apps/v1   # extensions/v1beta1 was removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: prometheus-blackbox-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: prometheus-blackbox-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus-blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
      - name: prometheus-blackbox-exporter
        image: prom/blackbox-exporter:v0.16.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: blackbox-port
          containerPort: 9115
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 60Mi
            cpu: 200m
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=debug
        - --web.listen-address=:9115
      volumes:
      - name: config
        configMap:
          name: prometheus-blackbox-exporter
      nodeSelector:
        node-role.kubernetes.io/master: "true"
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
EOF

Create the blackbox-exporter Service:

cat > prometheus-blackbox-exporter-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-blackbox-exporter
  name: prometheus-blackbox-exporter
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
spec:
  #type: NodePort
  selector:
    app: prometheus-blackbox-exporter
  ports:
  - name: blackbox-port
    port: 9115
    targetPort: 9115
    #nodePort: 30009
    protocol: TCP
EOF

Apply the resources:

kubectl apply -f prometheus-blackbox-exporter-cm.yaml
kubectl apply -f prometheus-blackbox-exporter-deploy.yaml
kubectl apply -f prometheus-blackbox-exporter-svc.yaml
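Before wiring up ServiceMonitors, it is worth hitting the exporter's /probe endpoint directly to confirm the modules work, for example from any pod inside the cluster (the target URL here is just a placeholder):

curl "http://prometheus-blackbox-exporter.monitoring:9115/probe?module=http_2xx&target=https://www.example.com"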

Create the ServiceMonitor that drives the network probes:

cat > prometheus-serviceMonitorBlackboxExporter.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus-blackbox-exporter
  name: prometheus-blackbox-exporter
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: blackbox-port
  jobLabel: app
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: prometheus-blackbox-exporter
EOF

Note that this second heredoc overwrites the same file with an extended variant that probes targets through the exporter's /probe endpoint:

cat > prometheus-serviceMonitorBlackboxExporter.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus-blackbox-exporter
  name: prometheus-blackbox-exporter
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    path: /probe
    params:
      module: [http_2xx]
    scheme: http
    # service discovery is generated by the Operator from the
    # selector/namespaceSelector below; kubernetes_sd_configs is not a ServiceMonitor field
    relabelings:
    # - action: keep
    #   sourceLabels:
    #   - __meta_kubernetes_service_annotation_prometheus_io_scrape
    #   - __meta_kubernetes_service_annotation_prometheus_io_http_probe
    #   regex: true;
    - sourceLabels: 
      - __address__
      targetLabel: __param_target
    - targetLabel: __address__
      replacement: prometheus-blackbox-exporter:9115
    - sourceLabels: 
      - __param_target
      targetLabel: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - sourceLabels: 
      - __meta_kubernetes_namespace
      targetLabel: kubernetes_namespace
    - sourceLabels: 
      - __meta_kubernetes_service_name
      targetLabel: kubernetes_name
    port: blackbox-port
  jobLabel: app
  namespaceSelector:
    #any: true
    matchNames:
    - monitoring
    - kube-system
  selector:
    matchLabels:
      app: prometheus-blackbox-exporter
EOF
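Apply the ServiceMonitor and the Operator will regenerate the Prometheus configuration on its next reconciliation:

kubectl apply -f prometheus-serviceMonitorBlackboxExporter.yaml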
Configuring Cluster Federation

The idea is to use this Prometheus Operator instance as the central node and federate the data from the member clusters' Prometheus instances. The configuration YAML is as follows:

cat > prometheus-additional.yaml <<EOF
- job_name: 'dist-sz-pre-monitoring/kube-state-metrics'
  scrape_interval: 15s

  honor_labels: true
  metrics_path: '/federate'

  params:
    'match[]':
    - '{job="kube-state-metrics"}'
    - '{job="kubelet"}'

  static_configs:
  - targets:
    - '39.108.192.100:32501'
    labels:
      group: 'sz-pre'
EOF
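The member cluster's /federate endpoint can be tested by hand before wiring it in, using the same match[] selectors:

curl -sG 'http://39.108.192.100:32501/federate' --data-urlencode 'match[]={job="kube-state-metrics"}' | head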

Since Prometheus provides a service-discovery mechanism, the federation job only needs to go into the additional configuration. For in-cluster Services to be discovered automatically, add the annotation prometheus.io/scrape=true to the Service. Save the file above as prometheus-additional.yaml, then create the corresponding Secret from it:

cat > reset_additional_secret.sh <<EOF
kubectl delete secret additional-configs -n monitoring
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
EOF

Modify prometheus-prometheus.yaml and add the additional scrape configuration:

cat > prometheus-prometheus.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
EOF

Re-apply the configuration files to refresh the additional scrape configs:

sh reset_additional_secret.sh
kubectl delete -f prometheus-prometheus.yaml
kubectl apply -f prometheus-prometheus.yaml
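After the Prometheus pods come back, confirm the additional scrape config was loaded; the federation job should be listed among the active targets:

kubectl port-forward svc/prometheus-k8s 9090 -n monitoring &
curl -s http://localhost:9090/api/v1/targets | grep -o '"job":"[^"]*"' | sort -u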

Plain Prometheus scrape configs can also go into the additional file, letting prometheus-operator carry any configuration a hand-written Prometheus would, for example blackbox monitoring. The configuration is as follows:

- job_name: 'pre-monitoring/kube-state-metrics'
  scrape_interval: 15s

  honor_labels: true
  metrics_path: '/federate'

  params:
    'match[]':
    - '{job="kube-state-metrics"}'
    - '{job="kubelet"}'
    - '{job="apiserver"}'
    - '{job="node-exporter"}'
    - '{job="kube-scheduler"}'
    - '{job="kube-controller-manager"}'
    - '{job="etcd-k8s"}'
    - '{job="traefik-ingress"}'
    - '{job="kubernetes-services"}'
    - '{job="kubernetes-Business-redis"}'
    - '{job="kubernetes-Business-mongodb"}'
    - '{job="kubernetes-Business-psql"}'
    - '{job="kubernetes-Business-amqp"}'
    - '{job="kubernetes-Physical-node"}'

  static_configs:
  - targets:
    - '39.108.192.100:32501'
    labels:
      group: 'dist-sz-pre'

- job_name: 'kubernetes-Business-services'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  #- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
  #  action: keep
  #  regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_service
  - source_labels: [instance]
    regex: ".*22.*"
    action: drop
  - source_labels: [__meta_kubernetes_namespace]
    regex: "dadi-saas-member-pre"
    action: keep

- job_name: 'kubernetes-services'
  metrics_path: /probe
  params:
    module: [http_2xx]  # use the http module defined above
  kubernetes_sd_configs:
  - role: service  # Service-based service discovery
  relabel_configs:
  # only discover Services whose annotations set prometheus.io/http_probe=true
  #- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_http_probe]
  #  action: keep
  #  regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_service_port_name]
    target_label: kubernetes_port_name
  - source_labels: [__meta_kubernetes_namespace]
    regex: "dadi-saas-member-pre"
    action: drop

- job_name: "kubernetes-service-dns"
  metrics_path: /probe # 不是 metrics,是 probe
  params:
    module: [dns] # 使用 DNS 模块
  static_configs:
  - targets:
    - kube-dns.kube-system:53  # 不要省略端口号
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115  # exporter address; keep consistent with the Service defined above

- job_name: 'kubernetes-ingresses'
  metrics_path: /probe
  params:
    module: [http_2xx]  # use the http module defined above
  kubernetes_sd_configs:
  - role: ingress  # Ingress-based service discovery
  relabel_configs:
  # only discover Ingresses whose annotations set prometheus.io/http_probe=true
  #- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_http_probe]
  #  action: keep
  #  regex: true
  - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
    regex: (.+);(.+);(.+)
    replacement: ${1}://${2}${3}
    target_label: __param_target
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_ingress_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_ingress_name]
    target_label: kubernetes_name
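With probing in place, a natural follow-up is alerting on failed probes. Below is a minimal sketch of a PrometheusRule, assuming the labels match the ruleSelector configured in the Prometheus resource above (prometheus: k8s, role: alert-rules); the heredoc delimiter is quoted so the shell leaves the {{ }} template untouched:

cat > prometheus-blackboxRules.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blackbox-alert-rules
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: blackbox.rules
    rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Blackbox probe failed: {{ $labels.instance }}"
EOF
kubectl apply -f prometheus-blackboxRules.yaml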