
Monitoring Kubernetes with Prometheus Operator

Author: 菲宇
Published: 2019-06-12 15:03:19

I. Prometheus Overview

Prometheus is an open-source systems monitoring and alerting toolkit.

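Main features:

  • A multi-dimensional data model (time series are identified by a metric name and key/value pairs)
  • A flexible query language
  • No dependency on distributed storage
  • Data is collected with a pull model over HTTP
  • Time series can be pushed through a push gateway
  • Targets are found through service discovery or static configuration
  • Support for multiple kinds of graphs and dashboards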
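To make the data model in the list above concrete, here is a minimal Python sketch of how one sample could be rendered in the text format that a scrape target serves over HTTP. The function names and the sample metric are illustrative assumptions, not part of any Prometheus client library.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(metric, labels, value):
    """Render one sample as a line of the text exposition format.

    A time series is identified by its metric name plus its label set,
    so the same metric name with different labels is a different series.
    """
    if not labels:
        return f"{metric} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{metric}{{{pairs}}} {value}"


# Hypothetical metric and labels, chosen only for illustration.
line = format_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027)
print(line)
```

Sorting the labels is a choice made here for stable output; the order of labels does not change the identity of a series.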

The Prometheus architecture is as follows:

Prometheus.png

Prometheus components include the Prometheus server, the push gateway, Alertmanager, a web UI, and others.

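The Prometheus server periodically pulls data from its targets and persists it to disk. Rules can be configured on the server and are evaluated at regular intervals: recording rules aggregate existing data and record it as new time series, while alerting rules push an alert to the configured Alertmanager when their condition is met. When Alertmanager receives alerts, it groups, deduplicates, and routes them to receivers according to its configuration. The collected data can also be visualized with Grafana or other API consumers.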
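The periodic rule evaluation described above can be illustrated with a deliberately simplified Python sketch. It is not how Prometheus is implemented: real alerting rules are written in the query language and use a time-based `for` duration, whereas this sketch counts consecutive evaluations. The function name, threshold, and sample values are assumptions.

```python
def evaluate_rule(samples, threshold, for_evals):
    """Return the alert state after each evaluation.

    inactive: the condition does not hold
    pending:  the condition holds, but not yet for `for_evals` evaluations in a row
    firing:   the condition has held long enough; this is the point at which
              an alert would be sent to Alertmanager
    """
    states = []
    streak = 0  # consecutive evaluations in which the condition held
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak == 0:
            states.append("inactive")
        elif streak < for_evals:
            states.append("pending")
        else:
            states.append("firing")
    return states


# Hypothetical CPU-usage ratios observed at five successive evaluations.
states = evaluate_rule([0.2, 0.9, 0.95, 0.97, 0.3], threshold=0.8, for_evals=3)
print(states)
```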

II. Installing Prometheus Operator

1. Prometheus Operator simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.

# wget https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.18.0 -O prometheus-operator-0.18.0.tar.gz
# tar -zxvf prometheus-operator-0.18.0.tar.gz
# cd prometheus-operator-0.18.0
# kubectl apply -f bundle.yaml 
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
deployment "prometheus-operator" created
# cd contrib/kube-prometheus
# hack/cluster-monitoring/deploy
namespace "monitoring" created
clusterrolebinding "prometheus-operator" created
clusterrole "prometheus-operator" created
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding "node-exporter" created
clusterrole "node-exporter" created
daemonset "node-exporter" created
serviceaccount "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" created
configmap "grafana-dashboard-definitions-0" created
configmap "grafana-dashboards" created
configmap "grafana-datasources" created
deployment "grafana" created
service "grafana" created
configmap "prometheus-k8s-rules" created
serviceaccount "prometheus-k8s" created
servicemonitor "alertmanager" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-scheduler" created
servicemonitor "kube-state-metrics" created
servicemonitor "kubelet" created
servicemonitor "node-exporter" created
servicemonitor "prometheus-operator" created
servicemonitor "prometheus" created
service "prometheus-k8s" created
prometheus "k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
clusterrole "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created 
# kubectl get pod -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   0          15h
alertmanager-main-1                    2/2       Running   0          15h
alertmanager-main-2                    2/2       Running   0          15h
grafana-567fcdf7b7-44ldd               1/1       Running   0          15h
kube-state-metrics-76b4dc5ffb-2vbh9    4/4       Running   0          15h
node-exporter-9wm8c                    2/2       Running   0          15h
node-exporter-kf6mq                    2/2       Running   0          15h
node-exporter-xtm4r                    2/2       Running   0          15h
prometheus-k8s-0                       2/2       Running   0          15h
prometheus-k8s-1                       2/2       Running   0          15h
prometheus-operator-7466f6887f-9nsk8   1/1       Running   0          15h
# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.244.69.39     <none>        9093:30903/TCP      15h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   15h
grafana                 NodePort    10.244.86.54     <none>        3000:30902/TCP      15h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   15h
node-exporter           ClusterIP   None             <none>        9100/TCP            15h
prometheus-k8s          NodePort    10.244.226.104   <none>        9090:30900/TCP      15h
prometheus-operated     ClusterIP   None             <none>        9090/TCP            15h
prometheus-operator     ClusterIP   10.244.9.203     <none>        8080/TCP            15h
# kubectl -n monitoring get endpoints
NAME                    ENDPOINTS                                                        AGE
alertmanager-main       10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093               15h
alertmanager-operated   10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093 + 3 more...   15h
grafana                 10.244.2.8:3000                                                  15h
kube-state-metrics      10.244.2.9:9443,10.244.2.9:8443                                  15h
node-exporter           192.168.100.102:9100,192.168.100.103:9100,192.168.100.105:9100   15h
prometheus-k8s          10.244.2.11:9090,10.244.35.5:9090                                15h
prometheus-operated     10.244.2.11:9090,10.244.35.5:9090                                15h
prometheus-operator     10.244.35.3:8080                                                 15h
# kubectl -n monitoring get servicemonitors
NAME                      AGE
alertmanager              15h
kube-apiserver            15h
kube-controller-manager   15h
kube-scheduler            15h
kube-state-metrics        15h
kubelet                   15h
node-exporter             15h
prometheus                15h
prometheus-operator       15h
# kubectl get customresourcedefinitions
NAME                                    AGE
alertmanagers.monitoring.coreos.com     11d
prometheuses.monitoring.coreos.com      11d
servicemonitors.monitoring.coreos.com   11d
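Checking the pod listing above by eye gets tedious on larger clusters. As a rough sketch, the text printed by `kubectl get pod -n monitoring` could be scanned with a few lines of Python. The function name is an assumption, and the second pod row in the sample is a made-up failure used only to show the result.

```python
def pods_not_ready(kubectl_output):
    """Return names of pods that are not fully ready or not in the Running state."""
    bad = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if ready_count != total or status != "Running":
            bad.append(name)
    return bad


# First pod row copied from the listing above; second row is hypothetical.
sample = """NAME                                   READY     STATUS             RESTARTS   AGE
alertmanager-main-0                    2/2       Running            0          15h
alertmanager-main-1                    1/2       ImagePullBackOff   0          15h
"""
print(pods_not_ready(sample))
```

An empty list means every listed pod reports all containers ready.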

Note: During deployment I changed all image addresses to pull from a local image registry, but some pods still pulled their images from the remote registry, as shown here:

QQ截图20180409151515.png

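In my case the alertmanager image could not be pulled. The fix was to pull the image to a local machine first, then save it to an archive and distribute it to every node: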

# docker save 23744b2d645c -o alertmanager-v0.14.0.tar.gz
# ansible node -m copy -a 'src=alertmanager-v0.14.0.tar.gz dest=/root'
# ansible node -a 'docker load -i /root/alertmanager-v0.14.0.tar.gz'
192.168.100.104 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
192.168.100.103 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
# ansible node -a 'docker images quay.io/prometheus/alertmanager'
192.168.100.103 | SUCCESS | rc=0 >>
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
quay.io/prometheus/alertmanager   v0.14.0             23744b2d645c        7 weeks ago         31.9MB

192.168.100.104 | SUCCESS | rc=0 >>
REPOSITORY                        TAG                 IMAGE ID            CREATED             SIZE
quay.io/prometheus/alertmanager   v0.14.0             23744b2d645c        7 weeks ago         31.9MB

2. Adding etcd monitoring

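Prometheus Operator ships with an etcd dashboard, but extra configuration is needed before etcd is fully monitored and displayed. Official documentation: Monitoring external etcd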

a. Create the secret in the namespace

# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/ssl/ca.pem --from-file=/etc/kubernetes/ssl/etcd.pem --from-file=/etc/kubernetes/ssl/etcd-key.pem
secret "etcd-certs" created
# kubectl -n monitoring get secrets etcd-certs
NAME         TYPE      DATA      AGE
etcd-certs   Opaque    3         16h

Note: These certificates were created when the etcd cluster was deployed. Change the paths to wherever your own certificates are stored.
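For readers who want to see what `kubectl create secret generic --from-file=...` produces, the following Python sketch builds a roughly equivalent Secret manifest. Each file's base name becomes a key and its content is base64-encoded. The function name and the placeholder byte strings are assumptions; real certificate contents would be read from disk.

```python
import base64
import os


def build_secret(name, namespace, files):
    """Build a Secret manifest; `files` maps a file path to its content as bytes."""
    data = {
        os.path.basename(path): base64.b64encode(content).decode("ascii")
        for path, content in files.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


# Placeholder contents stand in for the real certificate files.
secret = build_secret("etcd-certs", "monitoring", {
    "/etc/kubernetes/ssl/ca.pem": b"placeholder-ca",
    "/etc/kubernetes/ssl/etcd.pem": b"placeholder-cert",
    "/etc/kubernetes/ssl/etcd-key.pem": b"placeholder-key",
})
print(sorted(secret["data"]))
```

The three keys correspond to the `DATA 3` column in the `kubectl get secrets` output above.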

b. Make the secret available to the Prometheus instance managed by Prometheus Operator

# vim manifests/prometheus/prometheus-k8s.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  secrets:
  - etcd-certs
  version: v2.2.1
# kubectl -n monitoring replace -f manifests/prometheus/prometheus-k8s.yaml
prometheus "k8s" replaced

Note: Only the following entry needs to be added:

  secrets:
  - etcd-certs

c. Create the Service, Endpoints, and ServiceMonitor

# vim manifests/prometheus/prometheus-etcd.yaml 
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.100.102
    nodeName: etcd1
  - ip: 192.168.100.103
    nodeName: etcd2
  - ip: 192.168.100.104
    nodeName: etcd3
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      #use insecureSkipVerify only if you cannot use a Subject Alternative Name
      insecureSkipVerify: true 
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - monitoring
# kubectl create -f manifests/prometheus/prometheus-etcd.yaml

Note 1: Change the etcd IP addresses and etcd node names to the ones in your own environment.
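Besides the addresses, the three objects in the manifest have to agree with each other: the Endpoints object is tied to the Service by sharing its name, the ServiceMonitor selects the Service by label, and the ServiceMonitor's endpoint refers to a named port on the Service. The Python sketch below checks those three links on trimmed-down copies of the objects. The helper names are assumptions and this is not how the operator itself performs the matching.

```python
def selector_matches(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def check_wiring(service, endpoints, monitor):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if service["metadata"]["name"] != endpoints["metadata"]["name"]:
        problems.append("Service and Endpoints names differ")
    if not selector_matches(monitor["spec"]["selector"]["matchLabels"],
                            service["metadata"].get("labels", {})):
        problems.append("ServiceMonitor selector does not match Service labels")
    service_ports = {port["name"] for port in service["spec"]["ports"]}
    for endpoint in monitor["spec"]["endpoints"]:
        if endpoint["port"] not in service_ports:
            problems.append(f"port {endpoint['port']!r} not defined on Service")
    return problems


# Trimmed-down versions of the objects in prometheus-etcd.yaml.
service = {"metadata": {"name": "etcd-k8s", "labels": {"k8s-app": "etcd"}},
           "spec": {"ports": [{"name": "api", "port": 2379}]}}
endpoints = {"metadata": {"name": "etcd-k8s"}}
monitor = {"spec": {"selector": {"matchLabels": {"k8s-app": "etcd"}},
                    "endpoints": [{"port": "api"}]}}
print(check_wiring(service, endpoints, monitor))
```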

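Note 2: For the three entries under tlsConfig, only the trailing file names ca.pem, etcd.pem, and etcd-key.pem need to be changed to the names of your own certificate files. If you are unsure of the names, log in to the prometheus-k8s pod and check: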

# kubectl exec -ti -n monitoring prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem        etcd-key.pem  etcd.pem
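As the listing shows, the files of a secret named in the Prometheus object's `secrets` field appear under `/etc/prometheus/secrets/<secret-name>/`. A small Python sketch of how the tlsConfig paths follow from the secret name and the file names (the function name is an assumption):

```python
import posixpath

# Directory under which the secret's files appear inside the pod, per the listing above.
SECRETS_ROOT = "/etc/prometheus/secrets"


def tls_config(secret_name, ca, cert, key):
    """Build the path entries of a ServiceMonitor tlsConfig for a mounted secret."""
    base = posixpath.join(SECRETS_ROOT, secret_name)
    return {
        "caFile": posixpath.join(base, ca),
        "certFile": posixpath.join(base, cert),
        "keyFile": posixpath.join(base, key),
    }


config = tls_config("etcd-certs", "ca.pem", "etcd.pem", "etcd-key.pem")
for field, path in config.items():
    print(f"{field}: {path}")
```

If your secret or certificate files are named differently, only the arguments change; the directory prefix stays the same.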

3. Once Prometheus Operator is deployed, three ports are exposed externally: 30900 for Prometheus, 30902 for Grafana, and 30903 for Alertmanager.

Prometheus looks like this. If everything is working, all targets should be up.

QQ截图20180408170107.png
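Target health can also be checked without the web UI. Recent Prometheus versions expose the same information as JSON at `/api/v1/targets`; whether the version deployed here already does is an assumption worth verifying. The Python sketch below only parses such a response. The function name and the sample payload, including the "down" target, are hypothetical.

```python
import json


def unhealthy_targets(targets_response):
    """Return (job, scrape URL) pairs for active targets whose health is not 'up'."""
    payload = json.loads(targets_response)
    active = payload["data"]["activeTargets"]
    return [
        (target["labels"].get("job"), target["scrapeUrl"])
        for target in active
        if target["health"] != "up"
    ]


# Hypothetical response body, reduced to the fields used above.
sample = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "node-exporter"},
         "scrapeUrl": "http://192.168.100.102:9100/metrics", "health": "up"},
        {"labels": {"job": "etcd"},
         "scrapeUrl": "https://192.168.100.103:2379/metrics", "health": "down"},
    ]},
})
print(unhealthy_targets(sample))
```

In practice the response body would come from an HTTP GET against port 30900 on one of your nodes, for example with `urllib.request.urlopen`.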

Alertmanager looks like this:

QQ截图20180408170229.png
QQ截图20180408170250.png

The Grafana dashboards look like this:

Prometheus01.PNG

The etcd dashboards look like this:

Prometheus02.PNG
Prometheus03.PNG

The Kubernetes cluster dashboards look like this:

Prometheus04.PNG
Prometheus05.PNG

The node dashboards look like this:

Prometheus08.PNG
Prometheus09.PNG
Shared from the author's personal site/blog as part of the Tencent Cloud self-media sharing program. Originally published: 2018-08-23.