5. Prometheus Monitoring Primer: Enterprise Monitoring in Practice (Collection and Display)


[TOC]

0x00 Introduction and Environment Preparation

Description: This chapter explains and reproduces typical enterprise Prometheus scenarios, using a docker-compose manifest to quickly build the prometheus_server, prometheus_pushgateway, prometheus_alertmanager and grafana environment.

Main goals (features):

  • 0) Monitor and visualize Windows hosts
  • 1) Monitor and visualize MySQL and Redis databases
  • 2) Monitor and visualize an external Kubernetes cluster

Host overview:

# Kubernetes cluster 0: weiyigeek-lb-vip.k8s (production)
192.168.12.107 - master
192.168.12.108 - master
192.168.12.109 - master
192.168.12.223 - work
192.168.12.224 - work
192.168.12.225 - work

# Kubernetes cluster 1: k8s-test.weiyigeek (test environment, single master node)
192.168.12.111

# Kubernetes cluster 2: 192.168.12.226 (dev environment, single master node)
192.168.12.226

Environment notes. Description: Docker is installed on all of the hosts above, and docker-compose is installed on the 192.168.12.107 host; the configuration below is added step by step.

# Components deployed per host as the base environment (for node_exporter and cAdvisor installation, see "1.Prometheus(普罗米修斯)容器集群监控入门.md"; not repeated here)
192.168.12.107
- prometheus_server: 30090
- prometheus_pushgateway: 30091
- prometheus_alertmanager: 30093
- grafana: 30000

192.168.12.108~109
192.168.12.223~225
- node_exporter: 9100

192.168.12.111
- cAdvisor: 9100

# Not configured for now; used later when monitoring a third-party k8s cluster with Prometheus
192.168.12.226
- kubernetes Api Server: 6443

Directory layout at a glance:

$ tree -L 5
.
├── docker-compose.yml # docker-compose manifest
├── grafana            # Grafana UI: persisted data (plugins, dashboards, etc.)
│   └── data
└── prometheus         # Prometheus configuration and data persistence directory
    ├── conf
    │   ├── alertmanager.yaml  # alert sender configuration
    │   ├── conf.d
    │   │   ├── discovery      # service auto-discovery configuration files
    │   │   │   └── k8s_nodes.yaml
    │   │   ├── rules          # alerting rules
    │   │   │   └── alert.rules
    │   │   └── auth            # k8s and related authentication files
    │   │       ├── k8s_client.crt
    │   │       ├── k8s_client.key
    │   │       └── k8s_token
    │   └── prometheus.yml
    └── data

Quick environment setup

0. Quickly generate the directory structure

mkdir -vp /nfsdisk-31/monitor/prometheus/conf/conf.d/{discovery,rules,auth}
mkdir -vp /nfsdisk-31/monitor/prometheus/data
mkdir -vp /nfsdisk-31/monitor/grafana/data

1. prometheus.yaml main configuration file:

tee prometheus.yaml <<'EOF'
global:
  scrape_interval: 2m
  scrape_timeout: 10s
  evaluation_interval: 1m
  external_labels:
    monitor: 'prom-demo'
scrape_configs:
  - job_name: 'prom-Server'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cAdvisor'
    static_configs:
      - targets: ['192.168.12.111:9100']
  - job_name: 'prom-Host'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/k8s_nodes.yaml
      refresh_interval: 1m

rule_files:
  - /etc/prometheus/conf.d/rules/*.rules

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - '192.168.12.107:30093'
EOF
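
Before starting the stack it helps to validate this file. A minimal sketch using promtool, which ships inside the prom/prometheus image (the mount paths mirror the docker-compose manifest below):

docker run --rm \
  -v /nfsdisk-31/monitor/prometheus/conf/prometheus.yaml:/etc/prometheus/prometheus.yaml \
  -v /nfsdisk-31/monitor/prometheus/conf/conf.d:/etc/prometheus/conf.d \
  --entrypoint /bin/promtool prom/prometheus:v2.26.0 check config /etc/prometheus/prometheus.yaml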

2. alert.rules configuration file:

tee alert.rules <<'EOF'
groups:
- name: node-normal
  rules:
  - alert: service_down
    expr: up == 0
    for: 2m
    labels:
      severity: 1
      team: node
    annotations:
      summary: "主机 {{ $labels.instance }} 监控服务已停止运行超过 15s!"
  - alert: high_load
    expr: node_load1 > 0.7
    for: 5m
    labels:
      severity: 1
      team: node
    annotations:
      summary: "主机 {{ $labels.instance }} 高负载大于0.7以上运行超过 5m!"
EOF
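
As with the main configuration, the rule file can be checked with promtool before loading (a sketch, assuming alert.rules sits in the current directory):

docker run --rm -v $(pwd)/alert.rules:/alert.rules \
  --entrypoint /bin/promtool prom/prometheus:v2.26.0 check rules /alert.rules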

3. k8s_nodes.yaml file_sd_configs auto-discovery file.

tee k8s_nodes.yaml <<'EOF'
- targets: [ '192.168.12.107:9100','192.168.12.108:9100','192.168.12.109:9100' ]
  labels: {'env': 'prod','cluster': 'weiyigeek-lb-vip.k8s','nodeType': 'master'}
- targets: [ '192.168.12.223:9100','192.168.12.224:9100','192.168.12.225:9100' ]
  labels: {'env': 'prod','cluster': 'weiyigeek-lb-vip.k8s','nodeType': 'work'}
EOF

4. alertmanager.yaml email alerting configuration

tee alertmanager.yaml <<'EOF'
global:
  resolve_timeout: 5m
  smtp_from: 'monitor@weiyigeek.top'
  smtp_smarthost: 'smtp.exmail.qq.com:465'
  smtp_auth_username: 'monitor@weiyigeek.top'
  smtp_auth_password: 'xxxxxxxxxxx'
  smtp_require_tls: false
  # smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 10m
  receiver: 'default-email'
receivers:
- name: 'default-email'
  email_configs:
  - to: 'master@weiyigeek.top'
    send_resolved: true
# inhibit_rules:
#   - source_match:
#       severity: 'critical'
#     target_match:
#       severity: 'warning'
#     equal: ['alertname', 'instance']
EOF
# Tips: the amtool utility can verify that the yml file is valid: `./amtool check-config alertmanager.yml`

5. docker-compose.yml manifest:

# Desc:  prometheus / pushgateway / alertmanager / grafana stack
# author: WeiyiGeek
# email: master@weiyigeek.top

# Create a bridge network named monitor
$ docker network create monitor --driver bridge

tee docker-compose.yml <<'EOF'
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:v2.26.0
    container_name: prometheus_server
    environment:
      TZ: Asia/Shanghai
    volumes:
      - /nfsdisk-31/monitor/prometheus/conf/prometheus.yaml:/etc/prometheus/prometheus.yaml
      - /nfsdisk-31/monitor/prometheus/conf/conf.d:/etc/prometheus/conf.d
      - /nfsdisk-31/monitor/prometheus/data:/prometheus/data
      - /etc/localtime:/etc/localtime
    command:
      - '--config.file=/etc/prometheus/prometheus.yaml'
      - '--storage.tsdb.path=/prometheus/data'
      - '--web.enable-admin-api'
      - '--web.enable-lifecycle'
    ports:
      - '30090:9090'
    restart: always
    networks:
      - monitor

  pushgateway:
    image: prom/pushgateway
    container_name: prometheus_pushgateway
    environment:
      TZ: Asia/Shanghai
    volumes:
      - /etc/localtime:/etc/localtime
    ports:
      - '30091:9091'
    restart: always
    networks:
      - monitor

  alertmanager:
    image: prom/alertmanager:v0.21.0
    container_name: prometheus_alertmanager
    environment:
      TZ: Asia/Shanghai
    volumes:
      - /nfsdisk-31/monitor/prometheus/conf/alertmanager.yaml:/etc/alertmanager.yaml
      - /etc/localtime:/etc/localtime
      # - /nfsdisk-31/monitor/prometheus/alertmanager:/alertmanager
    command:
      - '--config.file=/etc/alertmanager.yaml'
      - '--storage.path=/alertmanager'
    ports:
      - '30093:9093'
    restart: always
    networks:
      - monitor

  grafana:
    image: grafana/grafana:7.5.5
    container_name: grafana
    user: "472"
    environment:
      - TZ=Asia/Shanghai
      - GF_SECURITY_ADMIN_PASSWORD=weiyigeek
    volumes:
      - /nfsdisk-31/monitor/grafana/data:/var/lib/grafana
      - /etc/localtime:/etc/localtime
    ports:
      - '30000:3000'
    restart: always
    networks:
      - monitor
    dns:
      - 223.6.6.6
      - 192.168.12.254

networks:
  monitor:
    external: true
EOF

# Validate the configuration
docker-compose config 

# Create and start the containers in the background
docker-compose up -d

6. Environment verification: browse the Prometheus server at http://192.168.12.107:30090/service-discovery to inspect service discovery and the monitored targets.
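
The same verification can be scripted against the Prometheus HTTP API; a quick sketch (jq is assumed to be installed and is only used for readability):

docker-compose ps
curl -s http://192.168.12.107:30090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'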

[Figure: base environment verification]

0x01 Monitoring and Visualizing Windows Hosts

Description: To monitor Windows machines with Prometheus we install an exporter on the Windows system, just as we run the node_exporter binary on Linux. The steps to install windows_exporter on Windows are as follows:

Step 1. Download the windows_exporter executable from its releases page; either the exe or the msi installation can be used;

# exe and msi downloads
windows_exporter-0.16.0-amd64.exe
windows_exporter-0.16.0-amd64.msi

# msi - installation example:
msiexec /i <path-to-msi-file> ENABLED_COLLECTORS=os,iis LISTEN_PORT=5000
# service collector with a custom query
msiexec /i <path-to-msi-file> ENABLED_COLLECTORS=os,service --% EXTRA_FLAGS="--collector.service.services-where ""Name LIKE 'sql%'"""
# On some older versions of Windows, parameter values may need to be wrapped in double quotes for the install command to parse correctly:
msiexec /i C:\Users\Administrator\Downloads\windows_exporter.msi ENABLED_COLLECTORS="ad,iis,logon,memory,process,tcp,thermalzone" TEXTFILE_DIR="C:\custom_metrics\"
# exe - examples
# enable only the service collector, with a custom query
.\windows_exporter.exe --collectors.enabled "service" --collector.service.services-where "Name='windows_exporter'"
# enable only the process collector, with a custom query
.\windows_exporter.exe --collectors.enabled "process" --collector.process.whitelist="firefox.+"
# use [defaults] with --collectors.enabled; it expands to all default collectors plus the extras listed
.\windows_exporter.exe --collectors.enabled "[defaults],process,container"

Step 2. Run it with the config.yml configuration file; it then listens on port 9182 and metrics can be viewed at http://127.0.0.1:9182/metrics.

.\windows_exporter-0.16.0-amd64.exe --config.file=config.yml

# config.yml
# collectors enabled by default, plus extra ones added here
collectors:
  enabled: cpu,cs,logical_disk,net,os,system,service,logon,process,tcp	
collector:
  service:
    services-where: Name='windows_exporter'
log:
  level: debug
scrape:
  timeout-margin: 0.5
telemetry:
  addr: ":9182"
  path: /metrics
  max-requests: 5

# Firewall rule (restrict the remote address allowed to connect, and open the local port)
New-NetFirewallRule -Name prom-windows_exporter -Direction Inbound -DisplayName 'windows_exporter' -RemoteAddress 192.168.12.107 -LocalPort 9182 -Protocol 'TCP'
  # Name                  : prom-windows_exporter
  # DisplayName           : windows_exporter
  # Description           :
  # DisplayGroup          :
  # Group                 :
  # Enabled               : True
  # Profile               : Any
  # Platform              : {}
  # Direction             : Inbound
  # Action                : Allow
  # EdgeTraversalPolicy   : Block
  # LooseSourceMapping    : False
  # LocalOnlyMapping      : False
  # Owner                 :
  # PrimaryStatus         : OK
  # Status                : The rule was parsed successfully from the store. (65536)
  # EnforcementStatus     : NotApplicable
  # PolicyStoreSource     : PersistentStore
  # PolicyStoreSourceType : Local
# New-NetFirewallRule -Name powershell-remote-udp -Direction Inbound -DisplayName 'PowerShell remote UDP' -LocalPort 9182 -Protocol 'UDP'  # HTTP runs over TCP, so no UDP firewall rule is needed
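
A quick way to confirm both the exporter and the firewall rule is to pull a few metrics from the Prometheus server side (assuming the Windows host is 192.168.12.240, as used in the next step):

curl -s http://192.168.12.240:9182/metrics | head -n 10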
[Figure: windows_exporter metrics]

Step 3. Add it to the prometheus.yaml main configuration file and reload; the machine is then discovered, as shown below.

# prometheus.yaml
scrape_configs:
  - job_name: 'windows-exporter'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/win_nodes.yaml
      refresh_interval: 1m


# vi win_nodes.yaml
- targets: [ '192.168.12.240:9182' ]
  labels: {'env': 'temp','osType': 'windows','nodeType': 'master'}

# PromQL
windows_os_info or windows_exporter_build_info{instance='192.168.12.240:9182'} or windows_logical_disk_free_bytes{volume="C:"} / (1024^3) or windows_net_current_bandwidth
[Figure: windows_exporter PromQL results]

[Figure: Grafana windows_exporter dashboard]

  • Step 5. Open the Grafana Dashboard to verify that the collected Windows data is displayed.

0x02 Monitoring and Visualizing MySQL and Redis Databases

Description: To monitor the MySQL and Redis databases we use mysqld_exporter (https://github.com/prometheus/mysqld_exporter) and redis_exporter (https://github.com/oliver006/redis_exporter/).

Step 1. Prepare test MySQL and Redis databases, then run Docker containers to collect their monitoring metrics;

# - mysqld-exporter : export DATA_SOURCE_NAME='user:password@(hostname:3306)/'
# (1) Create a monitoring user on the target MySQL database
CREATE USER 'exporter'@'%' IDENTIFIED BY 'XXXXXXXX' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
# (2) Run the prom/mysqld-exporter container
docker run -d -p 9104:9104 --name mysqld-exporter -e DATA_SOURCE_NAME="exporter:XXXXXXXX@(192.168.12.185:3306)/" prom/mysqld-exporter

# - redis_exporter:  Supports Redis 2.x, 3.x, 4.x, 5.x, and 6.x
# Redis instance addresses can be tcp addresses: redis://localhost:6379, redis.example.com:6379 or e.g. unix sockets: unix:///tmp/redis.sock.
docker run -d  --name redis_exporter --network host -e REDIS_ADDR="redis://192.168.12.185:6379"  -e REDIS_PASSWORD="weiyigeek.top" oliver006/redis_exporter # -p 9121:9121

# Check the deployed exporters
$ docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS              PORTS                    NAMES
c3a7a5663143        oliver006/redis_exporter   "/redis_exporter"        9 minutes ago       Up 9 minutes                                 redis_exporter
0a3d557bf36b        prom/mysqld-exporter       "/bin/mysqld_exporter"   16 minutes ago      Up 16 minutes       0.0.0.0:9104->9104/tcp   mysqld-exporter

Step 2. Access the metrics URLs of mysqld-exporter and redis_exporter respectively

$ curl -s http://192.168.12.111:9104/metrics | tail -n -5
  # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
  # TYPE promhttp_metric_handler_requests_total counter
  promhttp_metric_handler_requests_total{code="200"} 2
  promhttp_metric_handler_requests_total{code="500"} 2
  promhttp_metric_handler_requests_total{code="503"} 0

$ curl -s http://192.168.12.111:9121/metrics | tail -n -5
  # TYPE redis_up gauge
  redis_up 1
  # HELP redis_uptime_in_seconds uptime_in_seconds metric
  # TYPE redis_uptime_in_seconds gauge
  redis_uptime_in_seconds 1.281979e+06

Step 3. Update and extend the prometheus.yaml main configuration file;

scrape_configs:
  - job_name: 'mysql_discovery'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/mysql_discovery.yaml
      refresh_interval: 1m
  - job_name: 'redis_discovery'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/redis_discovery.yaml
      refresh_interval: 1m

# vi mysql_discovery.yaml
- targets: [ '192.168.12.111:9104' ]
  labels: {'env': 'test','osType': 'container','nodeType': 'database'}

# vi redis_discovery.yaml
- targets: [ '192.168.12.111:9121' ]
  labels: {'env': 'test','osType': 'container','nodeType': 'database'}

Step 4. Hot-reload the prometheus.yaml configuration (or restart the Prometheus container), then verify the monitor targets with the PromQL expression: redis_instance_info or mysql_version_info
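
Because the server was started with --web.enable-lifecycle (see the docker-compose manifest above), the hot reload is a single HTTP call:

curl -X POST http://192.168.12.107:30090/-/reload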

[Figure: redis & mysql exporter targets]

[Figure: Redis dashboard]

[Figure: MySQL dashboard]


0x03 Monitoring and Visualizing the Jenkins CI/CD Service

Goal: monitor the Jenkins CI server with Prometheus and display the monitoring data in Grafana.

  • Step 1. Install the Prometheus metrics plugin (Prometheus metrics: Expose Jenkins metrics in prometheus format)
  • Step 2. Under Manage Jenkins -> Configure System, configure the Prometheus plugin: mainly set the Path and the URL namespace, then apply and save.
[Figure: Prometheus metrics plugin configuration]

  • Step 3. Verify the plugin is working by visiting http://yourjenkinserver:port/prometheus
[Figure: Jenkins /prometheus endpoint]

Step 4. On our Prometheus server, add this endpoint to prometheus.yml.

- job_name: 'jenkins'
  metrics_path: '/prometheus/'
  scheme: 'http'
  bearer_token: 'bearer_token'
  static_configs:
    - targets: ['192.168.12.107:30001']

Step 5. Reload the prometheus.yml configuration (this only works when Prometheus was started with --web.enable-lifecycle), then verify the metric: devops_jenkins_executors_available

# Apply the modified configuration (no more container restarts)
curl -X POST http://192.168.12.107:30090/-/reload

# View the current configuration
curl http://192.168.12.107:30090/api/v1/status/config
[Figure: jenkins-prometheus targets]

[Figure: a Jenkins performance and health overview]


0x04 Monitoring and Visualizing an External Kubernetes Cluster

Description: For learning and testing, Prometheus is usually installed inside the k8s cluster to scrape metrics, but in real environments most enterprises deploy Prometheus outside the cluster to monitor a given cluster; with multiple clusters, separate Prometheus instances monitor different clusters and federation aggregates them. Given our lab environment, this chapter configures Prometheus to monitor an external Kubernetes cluster (inside a cluster, some of the same approaches apply).

Q: How does Prometheus collect data from a Kubernetes cluster?

A: If you are familiar with how an in-cluster Prometheus auto-discovers Kubernetes targets, the principle for an external cluster is the same; only the way of reaching the APIServer changes from the inCluster mode to the KubeConfig mode. In inCluster mode the Pod automatically gets the cluster access token and ca.crt injected, which is very convenient; outside the cluster we must supply these two files by hand for auto-discovery to work.

Q: How does Prometheus collect metrics of various dimensions through exporters?

A: Prometheus uses kubernetes_sd_configs to look up and pull metrics from the Kubernetes REST API, staying in sync with the cluster state, auto-discovering targets via the endpoints, service, node, pod and ingress roles:

  • endpoints: discovers the endpoints behind each service
  • node: discovers one target per cluster node, addressed by default at the Kubelet HTTP port, e.g. "https://192.168.3.217:10250/metrics"
  • service: discovers a target for each port of every service
  • pod: discovers all containers and their ports
  • ingress: discovers each path of every ingress

Q: Along which dimensions can metrics be collected, and with what?

| Dimension | Tool | Monitoring URL (__metrics_path__) | Notes |
|:--:|:--:|:--|:--|
| Node performance | node-exporter | /api/v1/nodes/<node_name>:9100/proxy/metrics | node status |
| Pod performance | kubelet cadvisor | /api/v1/nodes/<node_name>:10250/proxy/metrics and /api/v1/nodes/<node_name>:10250/proxy/metrics/cadvisor | container status |
| K8S resources | kube-state-metrics | __scheme__://__address____metrics_path__ | Deploy/ds etc. |

Tips: the dynamically discovered kube-state-metrics monitoring URL is completed from labels; the label values can be concatenated into the final monitoring URL via Prometheus relabel_config. Since deploying Prometheus outside the cluster differs from deploying it inside, the external Prometheus reaches the monitoring URL through the APIServer proxy URL to pull metrics.

Q: How is the APIServer proxy URL constructed? Description: In a k8s cluster, nodes, pods and services have their own private IPs that are unreachable from outside the cluster, but K8S offers several access paths: 1. access services via public IPs; 2. access nodes, pods and services via the proxy; 3. access indirectly through a node or pod inside the cluster.

For example, the kubectl cluster-info command shows the proxy URL for the kube-system namespace

$ kubectl cluster-info
Kubernetes master is running at https://k8s-dev.weiyigeek:6443
KubeDNS is running at https://k8s-dev.weiyigeek:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

So the default construction rules take the following form.

set other_apiserver_address = k8s-dev.weiyigeek:6443
# access a node
https://${other_apiserver_address}/api/v1/nodes/node_name:[port_name]/proxy/metrics
# access a service
https://${other_apiserver_address}/api/v1/namespaces/service_namespace/services/http:service_name[:port_name]/proxy/metrics
# access a pod
https://${other_apiserver_address}/api/v1/namespaces/pod_namespace/pods/http:pod_name[:port_name]/proxy/metrics

Once we know how to construct the proxy URL, the external Prometheus can build it itself via relabel_config.
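
Such a proxy URL can also be sanity-checked by hand with curl; a sketch in which <namespace> and <pod_name> are placeholders and k8s_token is the ServiceAccount token extracted later in this section:

TOKEN=$(cat conf.d/auth/k8s_token)
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://192.168.12.226:6443/api/v1/namespaces/<namespace>/pods/http:<pod_name>:8080/proxy/metrics" | head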

1. Service auto-discovery with the Endpoints role

Description: Here we deploy the kube-state-metrics service for k8s cluster monitoring. It listens to the Kubernetes API server and generates metrics about the associated objects; it does not focus on the health of individual Kubernetes components, but on the state of internal objects such as deployments, nodes and pods.

Steps:

Step 1. First check the kube-state-metrics compatibility matrix against our Kubernetes cluster version (see the reference link); at most 5 kube-state-metrics releases and 5 Kubernetes versions are recorded below.

| kube-state-metrics | Kubernetes 1.17 | Kubernetes 1.18 | Kubernetes 1.19 | Kubernetes 1.20 | Kubernetes 1.21 |
|:--:|:--:|:--:|:--:|:--:|:--:|
| v1.8.0 | - | - | - | - | - |
| v1.9.8 | - | - | - | - | - |
| v2.0.0 | -/✓ | -/✓ | ✓ | ✓ | -/✓ |
| master | -/✓ | -/✓ | ✓ | ✓ | ✓ |

Step 2. Create the k8s cluster ApiServer access account and bind the cluster role permissions. Description: accessing the K8S apiserver requires prior authorization; an in-cluster Prometheus can use the default in-cluster configuration, while access from outside the cluster needs token + client cert authentication, so RBAC authorization must be set up first.

For this test, since we need to access different namespaces, it is easiest to bind cluster-admin for now; in production always apply the principle of least privilege to keep things safe (demonstrated later).

# 1. Create the namespace
kubectl create ns monitor
  # namespace/monitor created

# 2. Create the serviceaccount
kubectl create sa prometheus --namespace monitor
  # serviceaccount/prometheus created

# 3. Bind the cluster-admin role to the prometheus service account
$ kubectl create clusterrolebinding prometheus --clusterrole cluster-admin --serviceaccount=monitor:prometheus
  # clusterrolebinding.rbac.authorization.k8s.io/prometheus created

# 4. Look up the token of the created account
$ kubectl get sa
  # NAME         SECRETS   AGE
  # default      1         18d
  # prometheus   1         24s

$ kubectl get sa prometheus -n monitor -o yaml
  # apiVersion: v1
  # kind: ServiceAccount
  # metadata:
  #   creationTimestamp: "2021-05-10T06:10:28Z"
  #   name: prometheus
  #   namespace: default
  #   resourceVersion: "3596438"
  #   selfLink: /api/v1/namespaces/default/serviceaccounts/prometheus
  #   uid: af6a884d-2670-4f46-836e-d8ccf9fd0c38
  # secrets:
  # - name: prometheus-token-ft8bd  # the secret holding the token

# 5. Or in one line
kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'|base64 -d > k8s_token

# 6. Additionally fetch the cluster client crt and key
# client-certificate-data
~$ grep 'client-certificate-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > k8s_client.crt
# client-key-data
~$ grep 'client-key-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > k8s_client.key
scp -P20211 weiyigeek@weiyigeek-226:~/.kube/k8s_client.crt ./conf.d/auth/
scp -P20211 weiyigeek@weiyigeek-226:~/.kube/k8s_client.key ./conf.d/auth/
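
A quick check that the extracted token actually works against the external APIServer (a sketch; adjust the server address to your cluster):

kubectl --server=https://192.168.12.226:6443 \
  --token="$(cat k8s_token)" \
  --insecure-skip-tls-verify get nodes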

Step 3. Deploy using the official manifest (see the reference link), adjusted here to use the prometheus service account created above.

tee kube-state-metrics.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.0.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.0.0
    spec:
      containers:
      - image: bitnami/kube-state-metrics:2.0.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: prometheus
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.0.0
  name: kube-state-metrics
  namespace: kube-system
  annotations:
    # Note: this annotation is required so that prometheus can auto-discover the service.
    prometheus.io/scrape: 'true'
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
EOF

# - Create the monitor namespace
kubectl create ns monitor

sed -i "s#kube-system#monitor#g" kube-state-metrics.yaml

# - Deploy kube-state-metrics
kubectl apply -f kube-state-metrics.yaml
  # deployment.apps/kube-state-metrics created
  # service/kube-state-metrics created

Step 4. Verify the deployment and fetch the authentication token

$ kubectl get pod,svc,ep -n monitor --show-labels
  # NAME                                      READY   STATUS    RESTARTS   AGE     LABELS
  # pod/kube-state-metrics-777789bc9d-9n6jf   1/1     Running   0          3m30s   app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,pod-template-hash=777789bc9d

  # NAME                         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE     LABELS
  # service/kube-state-metrics   ClusterIP   None         <none>        8080/TCP,8081/TCP   3m30s   app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,prometheus.io/scrape=true

  # NAME                           ENDPOINTS                                 AGE     LABELS
  # endpoints/kube-state-metrics   172.16.182.199:8081,172.16.182.199:8080   3m30s   app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/version=2.0.0,prometheus.io/scrape=true,service.kubernetes.io/headless=

# - ca.crt file and token (the manifest above runs under the prometheus service account created earlier)
# kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "ca.crt:" | head -n 1 | awk '{print $2}' | base64 -d > k8s_ca.crt
kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'| base64 -d > k8s_token

Step 5. Copy the generated k8s_ca.crt and k8s_token files into the directory referenced by the prometheus main configuration;

ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_ca.crt dest=/tmp"
ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_token dest=/tmp"
  # weiyigeek-226 | CHANGED => {
  #   "changed": true,
  #   "checksum": "22c40a4f83ad82343affbab3f8a732c14accbdcd",
  #   "dest": "/tmp/k8s_token/weiyigeek-226/home/weiyigeek/prometheus/k8s_kube-state-metrics_token",
  #   "md5sum": "c9d780a62db497bbfd995b548887e4ed",
  #   "remote_checksum": "22c40a4f83ad82343affbab3f8a732c14accbdcd",
  #   "remote_md5sum": null
  # }

Step 6. Add a kubernetes_sd_configs block to the prometheus.yml main configuration, using the endpoints role for auto-discovery.

- job_name: 'k8s-endpoint-discover'
  scheme: https
  # the decoded apiserver token, stored as a file
  tls_config:
    ca_file: /etc/prometheus/conf.d/auth/k8s_ca.crt
    insecure_skip_verify: true
  bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
  # k8s auto-discovery settings
  kubernetes_sd_configs:
  # discover at the endpoint level
  - role: endpoints
    api_server: 'https://192.168.12.226:6443'
    tls_config:
      ca_file: /etc/prometheus/conf.d/auth/k8s_ca.crt
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    # keep only targets whose label matches the regex; drop the rest
    action: keep
    regex: '^(kube-state-metrics)$'
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    # keep only targets whose label matches the regex; drop the rest
    action: keep
    regex: true
  - source_labels: [__address__]
    action: replace
    target_label: instance
  - target_label: __address__
    # replace the default __address__ with the replacement value
    replacement: 192.168.12.226:6443
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
    # regex match
    regex: ([^;]+);([^;]+);([^;]+)
    # replace the default __metrics_path__ with the replacement value
    target_label: __metrics_path__
    # the hand-built apiserver proxy url
    replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    # rename label __meta_kubernetes_namespace to kubernetes_namespace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    # rename label __meta_kubernetes_service_name to service_name
    target_label: service_name

Tips: how relabel_configs constructs the URL the (endpoints) role uses to reach the API Server:

| Label | Default | Constructed |
|:--|:--|:--|
| __scheme__ | https | https |
| __address__ | 172.16.182.200:8081 | 192.168.12.226:6443 |
| __metrics_path__ | /metrics | /api/v1/namespaces/kube-system/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics |
| URL | https://10.244.2.10:8081/metrics | https://192.168.12.226:6443/api/v1/namespaces/kube-system/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics |

Step 7. After editing the main configuration, restart the Prometheus Server container and check its status; as the figure shows, monitoring succeeded.

# k8s-endpoint-discover (2/2 up)
# - Discovered Labels
__meta_kubernetes_endpoint_address_target_kind="Pod"
__meta_kubernetes_endpoint_address_target_name="kube-state-metrics-6477678b78-6qkjg"
__meta_kubernetes_endpoint_node_name="weiyigeek-226"
__meta_kubernetes_endpoint_port_name="telemetry"
__meta_kubernetes_endpoint_port_protocol="TCP"
__meta_kubernetes_endpoint_ready="true"
__meta_kubernetes_endpoints_label_app_kubernetes_io_name="kube-state-metrics"
__meta_kubernetes_endpoints_label_app_kubernetes_io_version="2.0.0"
__meta_kubernetes_endpoints_labelpresent_app_kubernetes_io_name="true"
__meta_kubernetes_endpoints_labelpresent_app_kubernetes_io_version="true"
__meta_kubernetes_endpoints_labelpresent_service_kubernetes_io_headless="true"
__meta_kubernetes_endpoints_name="kube-state-metrics"
__meta_kubernetes_namespace="monitor"
__meta_kubernetes_pod_annotation_cni_projectcalico_org_podIP="172.16.182.200/32"
__meta_kubernetes_pod_annotation_cni_projectcalico_org_podIPs="172.16.182.200/32"
__meta_kubernetes_pod_annotationpresent_cni_projectcalico_org_podIP="true"
__meta_kubernetes_pod_annotationpresent_cni_projectcalico_org_podIPs="true"
__meta_kubernetes_pod_container_name="kube-state-metrics"
__meta_kubernetes_pod_container_port_name="telemetry"
__meta_kubernetes_pod_container_port_number="8081"
__meta_kubernetes_pod_container_port_protocol="TCP"
__meta_kubernetes_pod_controller_kind="ReplicaSet"
__meta_kubernetes_pod_controller_name="kube-state-metrics-6477678b78"
__meta_kubernetes_pod_host_ip="192.168.12.226"
__meta_kubernetes_pod_ip="172.16.182.200"
__meta_kubernetes_pod_label_app_kubernetes_io_name="kube-state-metrics"
__meta_kubernetes_pod_label_app_kubernetes_io_version="2.0.0"
__meta_kubernetes_pod_label_pod_template_hash="6477678b78"
__meta_kubernetes_pod_labelpresent_app_kubernetes_io_name="true"
__meta_kubernetes_pod_labelpresent_app_kubernetes_io_version="true"
__meta_kubernetes_pod_labelpresent_pod_template_hash="true"
__meta_kubernetes_pod_name="kube-state-metrics-6477678b78-6qkjg"
__meta_kubernetes_pod_node_name="weiyigeek-226"
__meta_kubernetes_pod_phase="Running"
__meta_kubernetes_pod_ready="true"
__meta_kubernetes_pod_uid="70037554-7c4c-4372-9128-e9689b7cff10"
__meta_kubernetes_service_annotation_kubectl_kubernetes_io_last_applied_configuration="{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true"},"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"2.0.0"},"name":"kube-state-metrics","namespace":"monitor"},"spec":{"clusterIP":"None","ports":[{"name":"http-metrics","port":8080,"targetPort":"http-metrics"},{"name":"telemetry","port":8081,"targetPort":"telemetry"}],"selector":{"app.kubernetes.io/name":"kube-state-metrics"}}} "
__meta_kubernetes_service_annotation_prometheus_io_scrape="true"
__meta_kubernetes_service_annotationpresent_kubectl_kubernetes_io_last_applied_configuration="true"
__meta_kubernetes_service_annotationpresent_prometheus_io_scrape="true"
__meta_kubernetes_service_label_app_kubernetes_io_name="kube-state-metrics"
__meta_kubernetes_service_label_app_kubernetes_io_version="2.0.0"
__meta_kubernetes_service_labelpresent_app_kubernetes_io_name="true"
__meta_kubernetes_service_labelpresent_app_kubernetes_io_version="true"
__meta_kubernetes_service_name="kube-state-metrics"
__metrics_path__="/metrics"
__scheme__="https"
job="k8s-endpoint-discover"
# Target Labels
app_kubernetes_io_name="kube-state-metrics"
app_kubernetes_io_version="2.0.0"
instance="172.16.182.200:8081"
job="k8s-endpoint-discover"
kubernetes_namespace="monitor"
service_name="kube-state-metrics"

# PromQL expression
up{job="k8s-endpoint-discover"} or go_info{job="k8s-endpoint-discover"}
  # up{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8080", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics"} 1
  # up{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8081", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics"} 1
  # go_info{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", instance="172.16.182.199:8081", job="k8s-endpoint-discover", kubernetes_namespace="monitor", service_name="kube-state-metrics", version="go1.16.3"} 1
[Figure: k8s-endpoint-discover targets]

Supplement: metrics-server vs. kube-state-metrics

| Category | metrics-server | kube-state-metrics |
|:--|:--|:--|
| Overview | Metrics Server exposes core Kubernetes metrics through the Metrics API | kube-state-metrics generates metrics from Kubernetes API objects without modification, so its metrics are as stable as the Kubernetes API objects themselves |
| What it monitors | system metrics of Nodes and Pods such as CPU, memory and network | the state of internal objects such as Node, Deployment, Pod, Service and Namespace |
| Project | https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server (gone); https://github.com/kubernetes-sigs/metrics-server/ (recommended) | https://github.com/kubernetes/kube-state-metrics |
| Service port | 443 | 8080 |

Example: node information collected by kube-state-metrics. To verify that metrics are being collected, request the kube-state-metrics pod IP on port 8080; output like the following means everything is working

$ kube_node_info{job="k8s-endpoint-discover"}
# kube_node_info{app_kubernetes_io_name="kube-state-metrics", app_kubernetes_io_version="2.0.0", container_runtime_version="docker://19.3.15", instance="172.16.182.200:8080", internal_ip="192.168.12.226", job="k8s-endpoint-discover", kernel_version="5.4.0-73-generic", kubelet_version="v1.19.10", kubeproxy_version="v1.19.10", kubernetes_namespace="monitor", node="weiyigeek-226", os_image="Ubuntu 20.04.2 LTS", pod_cidr="172.16.0.0/24", service_name="kube-state-metrics"}
[Figure: kube-state-metrics metrics page]

2. Service auto-discovery with the Node role

Description: node-exporter collects OS-level metrics from the cluster nodes, such as CPU, memory, disk and network traffic. node-exporter could of course be deployed independently on each node server, but manually adding every node to the monitoring configuration each time is very inconvenient.

Steps:

Step 1. Deploy node-exporter as a DaemonSet; combined with Prometheus dynamic discovery this is far more convenient.

tee node-exporter.yaml <<'EOF'
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: node-exporter
  namespace: monitor
  annotations:
    prometheus.io/scrape: 'true'
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v1.1.2
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: node-exporter
      hostNetwork: true
      hostPID: true
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
---
kind: Service
apiVersion: v1
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
EOF

~$ kubectl apply -f node-exporter.yaml
  # daemonset.apps/node-exporter created

~$ kubectl get pod -n monitor
  # NAME                  READY   STATUS    RESTARTS   AGE
  # node-exporter-p5tbp   1/1     Running   0          20s

Step 2. Create the SA account and grant it RBAC permissions (principle of least privilege)

$ kubectl create sa prometheus -n monitor
  # serviceaccount/prometheus created

# ClusterRole RBAC permission declaration
tee prometheus-clusterRole.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: monitor
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
    - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
EOF

# Create the cluster role
$ kubectl create -f prometheus-clusterRole.yaml
# clusterrole.rbac.authorization.k8s.io/prometheus created
# Bind the cluster role
$ kubectl create clusterrolebinding prometheus --clusterrole prometheus --serviceaccount=monitor:prometheus
# Or do it in one step (the YAML equivalent of the binding above)
# apiVersion: rbac.authorization.k8s.io/v1beta1
# kind: ClusterRoleBinding
# metadata:
#   name: prometheus
# roleRef:
#   apiGroup: rbac.authorization.k8s.io
#   kind: ClusterRole
#   name: prometheus
# subjects:
# - kind: ServiceAccount
#   name: prometheus
#   namespace: monitor


# Fetch the authentication token
kubectl get secret -n monitor $(kubectl get sa prometheus -n monitor -o yaml | tail -n 1 | cut -d " " -f 3) -o yaml | grep "token:" | head -n 1 | awk '{print $2}'| base64 -d > k8s_prometheuser_token

# Download k8s_prometheuser_token to the prometheus server
ansible weiyigeek-226 -m fetch -a "src=/home/weiyigeek/prometheus/k8s_prometheuser_token dest=/tmp"
# weiyigeek-226 | CHANGED => {
#   "changed": true,
#   "checksum": "d4a16cebda1b6037dcb68004d0ff4cdf4079bbc5",
#   "dest": "/tmp/weiyigeek-226/home/weiyigeek/prometheus/k8s_prometheuser_token",
#   "md5sum": "bdcd6c4a77ab6ee2afa5ac6f78ddb94a",
#   "remote_checksum": "d4a16cebda1b6037dcb68004d0ff4cdf4079bbc5",
#   "remote_md5sum": null
# }
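
To confirm the minimal role really grants what node discovery needs (a sketch; the --subresource flag requires a reasonably recent kubectl):

kubectl --server=https://192.168.12.226:6443 \
  --token="$(cat k8s_prometheuser_token)" \
  --insecure-skip-tls-verify auth can-i get nodes --subresource=proxy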

Step 3. Add a kubernetes_sd_configs block using node-level auto-discovery to the Prometheus.yaml main configuration;

- job_name: 'k8s-nodes-discover'
  scheme: https
  # the decoded apiserver token, stored as a file
  tls_config:
    # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
    insecure_skip_verify: true
  bearer_token_file: /etc/prometheus/conf.d/auth/k8s_prometheuser_token
  # k8s auto-discovery settings
  kubernetes_sd_configs:
  # discover at the node level
  - role: node
    api_server: 'https://192.168.12.226:6443'
    tls_config:
      # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/conf.d/auth/k8s_prometheuser_token
  relabel_configs:
  #- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    # keep only targets whose label matches the regex; drop the rest
    #action: keep
    #regex: true
  - target_label: __address__
    # replace the default __address__ with the replacement value
    replacement: 192.168.12.226:6443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    # replace the default __metrics_path__ with the replacement value (scraping node_exporter here rather than the default kubelet)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}:9100/proxy/metrics
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    # rename label __meta_kubernetes_service_name to service_name
    target_label: service_name
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    # rename label __meta_kubernetes_namespace to kubernetes_namespace
    target_label: kubernetes_namespace

Tips: how relabel_configs constructs the URL the (node) role uses to reach the API Server:

| Label | Default | Constructed |
|:--|:--|:--|
| __scheme__ | https | https |
| __address__ | 192.168.3.217:10250 | 192.168.3.217:6443 |
| __metrics_path__ (node_exporter) | /metrics | /api/v1/nodes/uvmsvr-3-217:9100/proxy/metrics |
| URL | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:9100/proxy/metrics |
| __metrics_path__ (kubelet) | /metrics | /api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics |
| URL | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics |
| __metrics_path__ (cadvisor) | /metrics | /api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics/cadvisor |
| URL | https://192.168.3.217:10250/metrics | https://192.168.3.217:6443/api/v1/nodes/uvmsvr-3-217:10250/proxy/metrics/cadvisor |

Step 4. Restart the service, then check the target status and whether discovery succeeded.

# (1) Status of the job
k8s-nodes-discover (1/1 up)
Endpoint	State	Labels	Last Scrape	Scrape Duration	Error
https://192.168.12.226:6443/api/v1/nodes/weiyigeek-226:9100/proxy/metrics	UP	instance="weiyigeek-226" job="k8s-nodes-discover"

# (2) PromQL query
up{job="k8s-nodes-discover"} or go_info{job="k8s-nodes-discover"}
# up{instance="weiyigeek-226", job="k8s-nodes-discover"}	1
# go_info{instance="weiyigeek-226", job="k8s-nodes-discover", version="go1.15.8"} 1
[Figure: k8s-nodes-discover via port 9100]

  • Step 5. We can also replace __metrics_path__ with /api/v1/nodes/${1}:10250/proxy/metrics, which pulls the monitoring metrics through the kubelet instead.

[Figure: k8s-nodes-discover via port 10250]

3. Putting it together: the cAdvisor + kube-state-metrics + Grafana combo

Description: Grafana reads monitoring metrics from the Prometheus data source and renders them graphically. Its official site provides many templates covering different monitoring dimensions; pick a template you like and import it directly by Dashboard ID.
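
Dashboard JSON can also be fetched from grafana.com by ID for offline import; a sketch using dashboard ID 8919 (a popular node-exporter dashboard, named only as an example; verify the ID and endpoint before use):

curl -s https://grafana.com/api/dashboards/8919/revisions/latest/download -o node-exporter-dashboard.json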

For example, different Dashboard panels can be chosen for different scenarios.

Practice goal: use cAdvisor to collect Pod/container information, kube-state-metrics to collect cluster information, and Grafana to display the data collected by Prometheus.

Steps:

Step 1. On top of the earlier base environment, the modified Prometheus.yaml main configuration reads:

tee prometheus.yaml <<'EOF'
global:
  scrape_interval: 2m
  scrape_timeout: 10s
  evaluation_interval: 1m
  external_labels:
    monitor: 'prom-demo'

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - '192.168.12.107:30093'

rule_files:
  - /etc/prometheus/conf.d/rules/*.rules

scrape_configs:
  - job_name: 'prom-Server'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cAdvisor'
    static_configs:
      - targets: ['192.168.12.111:9100']
  - job_name: 'linux_exporter'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/k8s_nodes.yaml
      refresh_interval: 1m
  - job_name: 'windows-exporter'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/win_nodes.yaml
      refresh_interval: 1m
  - job_name: 'mysql_discovery'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/mysql_discovery.yaml
  - job_name: 'redis_discovery'
    file_sd_configs:
    - files:
      - /etc/prometheus/conf.d/discovery/redis_discovery.yaml
  - job_name: 'k8s-endpoint-discover'
    scheme: https
    #the decoded apiserver token, stored as a file
    tls_config: 
      #ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
    # k8s auto-discovery settings
    kubernetes_sd_configs:
    # discover at the endpoint level
    - role: endpoints
      api_server: 'https://192.168.12.226:6443'
      tls_config:
        # ca_file: /etc/prometheus/conf.d/auth/k8s_kube-state-metrics_ca.crt
        insecure_skip_verify: true
      bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      # keep only targets whose label matches the regex; drop the rest
      action: keep
      regex: '^(kube-state-metrics)$'
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      #keep only targets whose label matches the regex; drop the rest
      action: keep
      regex: true
    - source_labels: [__address__]
      action: replace
      target_label: instance
    - target_label: __address__
      # replace the default __address__ with the replacement value
      replacement: 192.168.12.226:6443
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
      # regex match
      regex: ([^;]+);([^;]+);([^;]+)
      # replace the default __metrics_path__ with the replacement value
      target_label: __metrics_path__
      # the hand-built apiserver proxy url
      replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      # rename label __meta_kubernetes_namespace to kubernetes_namespace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      # rename label __meta_kubernetes_service_name to service_name
      target_label: service_name
    
  - job_name: 'k8s-cadvisor'
    scheme: https
    # the decoded apiserver token, stored as a file
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:
    - role: node
      api_server: 'https://192.168.12.226:6443'
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /etc/prometheus/conf.d/auth/k8s_token
    relabel_configs:
    - source_labels: [__address__]
      action: replace
      target_label: instance
    - target_label: __address__
      # replace the default __address__ with the replacement value
      replacement: 192.168.12.226:6443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      # replace the default __metrics_path__ with the replacement value (collecting via the default kubelet)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}:10250/proxy/metrics/cadvisor
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    metric_relabel_configs:
    - source_labels: [instance]
      separator: ;
      regex: (.+)
      target_label: node
      replacement: $1
      action: replace
    - source_labels: [pod_name]
      separator: ;
      regex: (.+)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [container_name]
      separator: ;
      regex: (.+)
      target_label: container
      replacement: $1
      action: replace
    - source_labels: [origin_prometheus]
      separator: ;
      regex: (.+)
      target_label: node
      replacement: $1
      action: replace
EOF

Step 2. Key configuration note: because our Prometheus here is deployed outside the k8s cluster, the __metrics_path__ string must be rebuilt so scrapes are proxied through the APIServer.

# - k8s-cAdvisor
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  # replace the default __metrics_path__ with the replacement value (collecting via the default kubelet)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}:10250/proxy/metrics/cadvisor

# - kube-state-metrics
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __meta_kubernetes_pod_container_port_number]
  # regex match
  regex: ([^;]+);([^;]+);([^;]+)
  # replace the default __metrics_path__ with the replacement value
  target_label: __metrics_path__
  # the hand-built apiserver proxy url
  replacement: /api/v1/namespaces/${1}/pods/http:${2}:${3}/proxy/metrics

Step 3. Restart our Prometheus service and verify service discovery and the targets.

# - k8s-cadvisor (1/1 up) : https://192.168.12.226:6443/api/v1/nodes/weiyigeek-226:10250/proxy/metrics/cadvisor

# - k8s-endpoint-discover (2/2 up)
# https://192.168.12.226:6443/api/v1/namespaces/monitor/pods/http:kube-state-metrics-6477678b78-6qkjg:8080/proxy/metrics
# https://192.168.12.226:6443/api/v1/namespaces/monitor/pods/http:kube-state-metrics-6477678b78-6qkjg:8081/proxy/metrics
[Figure: k8s-cadvisor target]

[Figure: cAdvisor dashboard]

  • Step 5. At this point, this exercise is complete.

Tips: with Dashboard templates we have to pick and combine them ourselves, which is flexible but not very standardized. Grafana's dedicated Kubernetes monitoring plugin grafana-kubernetes-app includes 4 dashboards (cluster, node, pod/container and deployment), but its author no longer maintains it, so KubeGraf is more commonly used instead. That plugin can visualize and analyze the performance of a Kubernetes cluster, presenting the key metrics and characteristics of the cluster's main services through intuitive graphs, and it can also inspect application lifecycles and error logs.
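
For example, the KubeGraf plugin can be installed into the Grafana container with grafana-cli followed by a restart (a sketch; the plugin ID devopsprodigy-kubegraf-app is assumed from the Grafana plugin catalog):

docker exec grafana grafana-cli plugins install devopsprodigy-kubegraf-app
docker restart grafana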
