王先森 2024-01-24
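Monitoring Kubernetes cluster metrics is a new feature in SkyWalking v9. When I set it up I could not find a single article about it online, and it took me a long time, so I am recording the process here. The lesson learned: search the GitHub Issues repeatedly and read the official documentation.
Official documentation on monitoring a k8s cluster: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-k8s-monitoring
This installation uses the kube-state-metrics deployed by Prometheus Operator. If you only want to install kube-state-metrics on its own, follow the official account (公众号) and reply "kube-state-metric" to get the YAML.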
# Verify: kube-state-metrics installed by Prometheus Operator must be accessed over HTTPS
$ kubectl describe secrets -n monitoring prometheus-k8s-token-j5spg
$ curl -k -H "Authorization: Bearer xxxxxxx" https://172.17.130.5:9443/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.0873e-05
go_gc_duration_seconds{quantile="0.25"} 7.9406e-05
go_gc_duration_seconds{quantile="0.5"} 0.000125605
go_gc_duration_seconds{quantile="0.75"} 0.000348579
go_gc_duration_seconds{quantile="1"} 0.096992811
go_gc_duration_seconds_sum 0.294185344
# Verification for a standalone installation
# $ curl 172.17.130.5:8080/metrics
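The manual check above can be wrapped in a small helper so that it is easy to repeat for other metric names. This is a minimal sketch and not part of the original setup: `has_metric` is a hypothetical helper name, and it only assumes that the endpoint returns the Prometheus text exposition format shown in the output above.

```shell
# has_metric NAME: read Prometheus text-format metrics on stdin and
# succeed if at least one sample line for metric NAME is present.
# Comment lines (# HELP / # TYPE) are ignored.
has_metric() {
  grep -v '^#' | grep -Eq "^$1(\{| )"
}

# Usage against the endpoint verified above (token elided as in the article):
#   curl -sk -H "Authorization: Bearer xxxxxxx" https://172.17.130.5:9443/metrics \
#     | has_metric go_gc_duration_seconds && echo "metric found"
```

The match is on the exact metric name followed by `{` or a space, so `go_gc_duration_seconds` does not accidentally match `go_gc_duration_seconds_sum`.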
By default, cAdvisor is already integrated into the kubelet. If you do not know how to configure it, please study the related Prometheus articles.
Opentelemetry-collector is a tool for collecting, processing, and forwarding telemetry data. It is open source and backed by the CNCF (Cloud Native Computing Foundation).
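If your architecture is fairly large, you can also refer to the installation example provided by OpenTelemetry: https://raw.githubusercontent.com/open-telemetry/opentelemetry-collector/main/examples/k8s/otel-config.yaml
The SkyWalking project also provides a sample template for monitoring a k8s cluster: https://raw.githubusercontent.com/apache/skywalking-showcase/main/deploy/platform/kubernetes/templates/feature-kubernetes-monitor/opentelemetry-config.yaml
This deployment is intended as a test environment only.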
The deployment consists of three manifests: RBAC, ConfigMap, and Deployment.
# vim rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: skywalking
  name: otel-collector
  namespace: skywalking
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: otel-collector   # ClusterRole is cluster-scoped, so no namespace is set
  labels:
    app: otel-collector
rules:
  - apiGroups: [""]
    resources: ["pods", "endpoints", "services", "nodes", "nodes/metrics"]
    verbs: ["get", "watch", "list"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector   # ClusterRoleBinding is cluster-scoped as well
  labels:
    app: otel-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector
    namespace: skywalking
# vim cm.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  labels:
    app: opentelemetry
    component: otel-collector-conf
  namespace: skywalking
data:
  otel-collector-config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'kubernetes-cadvisor'
              kubernetes_sd_configs:
                - role: node
              scheme: https                     # scrape over HTTPS
              metrics_path: /metrics/cadvisor   # metrics path
              tls_config:                       # TLS settings; certificate verification is skipped
                insecure_skip_verify: true
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              authorization:                    # authentication settings
                credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              relabel_configs:
                - action: labelmap
                  regex: __meta_kubernetes_node_label_(.+)
                  replacement: $$1
                - source_labels: [ ]
                  target_label: cluster
                  replacement: k8s-cluster-1.23   # cluster name shown in the SkyWalking dashboard; change as needed
            # @feature: kubernetes-monitor; configuration to scrape Kubernetes Endpoints metrics
            - job_name: kube-state-metrics
              scheme: https
              metrics_path: /metrics
              tls_config:
                insecure_skip_verify: true
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              authorization:
                credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
                - role: endpoints
              relabel_configs:
                - source_labels: [ __meta_kubernetes_service_label_app_kubernetes_io_name ]
                  regex: kube-state-metrics
                  replacement: $$1
                  action: keep
                - action: labelmap
                  regex: __meta_kubernetes_service_label_(.+)
                - source_labels: [ ]
                  target_label: cluster
                  replacement: k8s-cluster-1.23   # cluster name shown in the SkyWalking dashboard; change as needed
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            endpoint: ${env:MY_POD_IP}:4318
    exporters:
      otlp:
        endpoint: "http://oap-svc:11800"   # SkyWalking OAP backend address (oap-svc:11800); change as needed
        tls:
          insecure: true
      logging:   # log output with debug level enabled for troubleshooting
        loglevel: debug
    service:
      # extensions: [memory_ballast]
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [otlp, logging]
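To make the `labelmap` rules in the ConfigMap more concrete, the sketch below imitates with `sed` what such a rule does to label names: every label whose name matches the regex is copied to a new label named after the capture group. This only illustrates the matching logic and is not how the collector implements it; `labelmap_name` is a hypothetical helper. Note that the capture group is written `$$1` in the collector config because the OpenTelemetry Collector treats a single `$` as the start of an environment variable reference, so Prometheus itself ends up seeing `$1`.

```shell
# labelmap_name REGEX: read label names on stdin, one per line, and print the
# new name a labelmap rule would produce with the default replacement (group 1).
# Prometheus anchors relabel regexes, so the pattern is wrapped in ^...$.
# Names that do not match are omitted from the output here for clarity
# (Prometheus simply leaves those labels untouched).
labelmap_name() {
  sed -nE "s/^$1\$/\\1/p"
}

# Usage:
#   printf '%s\n' __meta_kubernetes_node_label_kubernetes_io_hostname __address__ \
#     | labelmap_name '__meta_kubernetes_node_label_(.+)'
#   prints: kubernetes_io_hostname
```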
# vim dp.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
  namespace: skywalking
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicas: 1 #TODO - adjust this to your own requirements
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      serviceAccountName: otel-collector
      containers:
        - command:
            - "/otelcol"
            - "--config=/conf/otel-collector-config.yaml"
          image: otel/opentelemetry-collector:0.92.0
          name: otel-collector
          resources:
            limits:
              cpu: 1
              memory: 1Gi
            requests:
              cpu: 200m
              memory: 400Mi
          ports:
            - containerPort: 55679 # Default endpoint for ZPages.
            - containerPort: 4317 # Default endpoint for OpenTelemetry receiver.
            - containerPort: 14250 # Default endpoint for Jaeger gRPC receiver.
            - containerPort: 14268 # Default endpoint for Jaeger HTTP receiver.
            - containerPort: 9411 # Default endpoint for Zipkin receiver.
            - containerPort: 8888 # Default endpoint for querying metrics.
          env:
            - name: MY_POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: GOMEMLIMIT
              value: 1000MiB
          volumeMounts:
            - name: otel-collector-config-vol
              mountPath: /conf
#            - name: otel-collector-secrets
#              mountPath: /secrets
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol
#        - secret:
#            name: otel-collector-secrets
#            items:
#              - key: cert.pem
#                path: cert.pem
#              - key: key.pem
#                path: key.pem
kubectl apply -f rbac.yaml
kubectl apply -f cm.yaml
kubectl apply -f dp.yaml
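After applying the manifests it can help to wait for the rollout and look at the collector logs before opening the UI. The helpers below are a minimal sketch: the function names are hypothetical, while the namespace `skywalking` and the Deployment name `otel-collector` are taken from the manifests above.

```shell
# wait_for_collector [TIMEOUT]: block until the otel-collector Deployment in
# the skywalking namespace has finished rolling out, or fail after TIMEOUT
# (default 120s).
wait_for_collector() {
  kubectl -n skywalking rollout status deployment/otel-collector --timeout="${1:-120s}"
}

# tail_collector_logs [LINES]: print the most recent collector log lines
# (default 50). With the logging exporter set to debug, the scraped metrics
# are written to this log.
tail_collector_logs() {
  kubectl -n skywalking logs deployment/otel-collector --tail="${1:-50}"
}
```

If the log shows scrape errors such as authorization failures, the ClusterRole rules in rbac.yaml are a reasonable first place to look.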
Check the status:
$ kubectl get pods -n skywalking
NAME READY STATUS RESTARTS AGE
oap-7c9cc4f7bd-rtksc 1/1 Running 0 22s
otel-collector-7b66c5664d-kbrpq 1/1 Running 0 22h
skywalking-es-init-sx6vh 0/1 Completed 0 22h
ui-5445497c77-htwxw 1/1 Running 0 22h
Open http://skywalking.od.com/ and you will see that the Kubernetes monitoring metrics appear automatically.
By default, telemetry is disabled (the selector is none), like this:
telemetry:
  selector: ${SW_TELEMETRY:none}
  none:
  prometheus:
    host: ${SW_TELEMETRY_PROMETHEUS_HOST:0.0.0.0}
    port: ${SW_TELEMETRY_PROMETHEUS_PORT:1234}
    sslEnabled: ${SW_TELEMETRY_PROMETHEUS_SSL_ENABLED:false}
    sslKeyPath: ${SW_TELEMETRY_PROMETHEUS_SSL_KEY_PATH:""}
    sslCertChainPath: ${SW_TELEMETRY_PROMETHEUS_SSL_CERT_CHAIN_PATH:""}
Prometheus can serve as the telemetry implementation. With it enabled, the SkyWalking OAP exposes its own metrics in a form that Prometheus can collect.
containers:
  - name: oap
    image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.7.0
    imagePullPolicy: IfNotPresent
    .....
    ports:
      - containerPort: 11800
        name: grpc
      - containerPort: 1234   # telemetry listening port
        name: prometheus-port
      - containerPort: 12800
        name: rest
    env:
      - name: JAVA_OPTS
        value: "-Dmode=no-init -Xmx2g -Xms2g"
      - name: TZ
        value: Asia/Shanghai
      - name: SW_TELEMETRY   # enable telemetry with the prometheus implementation
        value: "prometheus"
By default, the endpoint is exposed at http://0.0.0.0:1234 and http://0.0.0.0:1234/metrics. The host and port can also be changed as needed.
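A quick way to confirm that the telemetry endpoint is really serving data is to fetch it and count the sample lines. The helper below is a minimal sketch: `check_oap_telemetry` is a hypothetical name, and the defaults assume the endpoint is reachable on 127.0.0.1 with the default port 1234 shown in the configuration above.

```shell
# check_oap_telemetry [HOST] [PORT]: fetch the OAP telemetry endpoint and print
# the number of returned lines that are not comments (# HELP / # TYPE).
# HOST defaults to 127.0.0.1, PORT defaults to 1234.
check_oap_telemetry() {
  curl -s "http://${1:-127.0.0.1}:${2:-1234}/metrics" | grep -vc '^#'
}
```

For example, with `kubectl -n skywalking port-forward deployment/oap 1234:1234` running in another terminal (the Deployment name `oap` is an assumption based on the pod name listed earlier), `check_oap_telemetry` should print a non-zero count once SW_TELEMETRY is set to prometheus.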
Set up the OpenTelemetry Collector and add the following scrape job to scrape_configs in the collector ConfigMap:
- job_name: 'skywalking-so11y' # make sure to use this in the so11y.yaml to filter only so11y metrics
  metrics_path: '/metrics'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
      action: keep
      regex: oap;prometheus-port
    - source_labels: []
      target_label: service
      replacement: oap-server
    - source_labels: [__meta_kubernetes_pod_name]
      target_label: host_name
      regex: (.+)
      replacement: $$1
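The `keep` rule in this job is the part that most often goes wrong, so here is a small illustration of how it is evaluated. Prometheus joins the values of `source_labels` with `;` (the default separator) and keeps the target only if the anchored regex matches the joined string. The sketch below reproduces that check in shell; `relabel_keep` is a hypothetical helper, not something the collector provides.

```shell
# relabel_keep REGEX VALUE...: join the label values with ';' and succeed only
# if the anchored REGEX matches the joined string, which is the condition
# under which a `keep` rule retains a target.
# The body runs in a subshell so the IFS change does not leak to the caller.
relabel_keep() (
  regex=$1
  shift
  IFS=';'
  printf '%s\n' "$*" | grep -Eq "^(${regex})\$"
)

# Usage:
#   relabel_keep 'oap;prometheus-port' oap prometheus-port && echo kept
#   relabel_keep 'oap;prometheus-port' oap grpc || echo dropped
```

In other words, the job only scrapes pods whose container is named `oap` and whose port is named `prometheus-port`, which is why the OAP container spec above uses exactly those names.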
Open http://skywalking.od.com/ and you will see that the self-observability metrics appear automatically.
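SkyWalking uses the Prometheus node-exporter to collect metrics, and uses the OpenTelemetry Collector to pass them to the OpenTelemetry receiver and on into the OAP.
This installation uses the node-exporter deployed by Prometheus Operator. If you only want to install node-exporter on its own, follow the official account (公众号) and reply "node-exporter" to get the YAML.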
- job_name: "vm-monitoring" # make sure to use this in the vm.yaml to filter only VM metrics
  scrape_interval: 10s
  scheme: https
  metrics_path: /metrics
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  static_configs:
    - targets: ["k8s-master:9100"]
    - targets: ["k8s-node1:9100"]
    - targets: ["k8s-node2:9100"]
Open http://skywalking.od.com/ and you will see that the infrastructure monitoring metrics appear automatically.