Prometheus is an open-source monitoring and alerting system originally built at SoundCloud.
Its main characteristics:
Most Prometheus components are written in Go, so they are easy to build and deploy as static binaries.
Prometheus scrapes metrics from instrumented jobs, either directly or, for short-lived jobs, via an intermediary push gateway. It stores all scraped samples locally and runs rules over this data to aggregate and record new time series from existing data, or to generate alerts. Grafana or other API consumers can be used to visualize the collected data.
Prometheus officially offers several deployment options, such as Docker containers, Ansible, Chef, Puppet, SaltStack, and so on.
Prometheus is implemented in Go and is therefore naturally portable (Linux, Windows, macOS, and FreeBSD are supported). Here we deploy the precompiled binary directly; it works out of the box.
$ wget https://github.com/prometheus/prometheus/releases/download/v2.8.0/prometheus-2.8.0.linux-amd64.tar.gz
$ tar zxvf prometheus-2.8.0.linux-amd64.tar.gz
$ mv prometheus-2.8.0.linux-amd64 /usr/local/prometheus
The Prometheus server is a single binary, prometheus. Usage:
$ ./prometheus --help
usage: prometheus [<flags>]
The Prometheus monitoring server
Flags:
-h, --help Show context-sensitive help (also try --help-long and --help-man).
--version Show application version.
--config.file="prometheus.yml"
Prometheus configuration file path.
... ...
Here we create a dedicated user to run prometheus; not running services as root is good practice.
$ useradd prometheus
$ id prometheus
uid=1000(prometheus) gid=1000(prometheus) groups=1000(prometheus)
$ chown -R prometheus:prometheus /usr/local/prometheus
Create a systemd unit file (e.g. /etc/systemd/system/prometheus.service) so the service can be managed by systemd:
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data/
Restart=on-failure
[Install]
WantedBy=multi-user.target
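With the unit file in place, the service can be registered and started. This is a sketch; it assumes the unit above was saved at the conventional systemd path and that you have root privileges:

```shell
# Assumes the unit above was saved as /etc/systemd/system/prometheus.service.
sudo systemctl daemon-reload            # pick up the new unit file
sudo systemctl enable --now prometheus  # start now and on every boot
sudo systemctl status prometheus        # confirm the service is active
```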
Prometheus is configured through a combination of command-line flags and a configuration file.
The configuration file is in YAML format, with the structure shown below; brackets indicate that a parameter is optional. For non-list parameters, the value is set to the specified default.
Generic placeholders used in the configuration:
<boolean>: a boolean that can take the values true or false
<duration>: a duration matching the regular expression [0-9]+(ms|[smhdwy])
<labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*
<labelvalue>: a string of Unicode characters
<filename>: a valid path in the current working directory
<host>: a valid string consisting of a hostname or IP, followed by an optional port number
<path>: a valid URL path
<scheme>: a string that can take the values http or https
<string>: a regular string
<secret>: a regular string that is a secret, such as a password
<tmpl_string>: a string that is template-expanded before use
Run ./prometheus --help to see all command-line flags. Prometheus can reload its configuration at runtime; if the new configuration is not well-formed, the changes are not applied. A reload is triggered by sending SIGHUP to the Prometheus process, or by sending an HTTP POST request to the /-/reload endpoint (provided the --web.enable-lifecycle flag was set at startup).
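For example, a running instance can be reloaded either way. This is a sketch; it assumes a local Prometheus listening on the default port 9090:

```shell
# Reload by signal: send SIGHUP to the prometheus process.
kill -HUP "$(pidof prometheus)"

# Reload over HTTP: only works if Prometheus was started with
# --web.enable-lifecycle. Port 9090 is the default listen address.
curl -X POST http://localhost:9090/-/reload
```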
Prometheus's configuration file is prometheus.yml; within the file, # marks a comment.
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]
The configuration file is divided into six sections: global, rule_files, scrape_configs, alerting, remote_write, and remote_read.
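As a concrete illustration of these sections, a minimal prometheus.yml that scrapes Prometheus's own metrics endpoint might look like this (the target address assumes the default listen port 9090):

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      # Prometheus scraping itself on the default port.
      - targets: ['localhost:9090']
```

The bundled promtool binary can validate such a file before it is loaded: promtool check config prometheus.yml.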
scrape_configs defines what to monitor and the related parameters. Targets may be static (specified via static_configs) or dynamically discovered (through one of Prometheus's service-discovery mechanisms); relabel_configs allows targets and their labels to be modified before scraping.
# The targets specified by the static config.
targets:
  [ - '<host>' ]

# Labels assigned to all metrics scraped from the targets.
labels:
  [ <labelname>: <labelvalue> ... ]
job_name: <job_name>  # job name, must be unique globally
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]  # how frequently to scrape targets
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]  # per-scrape timeout
[ metrics_path: <path> | default = /metrics ]  # HTTP path to fetch metrics from
# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels. This is useful for use cases such as federation, where all labels
# specified in the target should be preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]
[ scheme: <scheme> | default = http ]  # protocol scheme used for scrape requests
params:  # optional HTTP URL parameters
  [ <string>: [ <string>, ... ] ]
basic_auth:  # HTTP basic authentication credentials
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]
# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <secret> ]
# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]
A <tls_config> configures TLS connections:
# CA certificate to validate API server certificate with.
[ ca_file: <filename> ]
# Certificate and key files for client cert authentication to the server.
[ cert_file: <filename> ]
[ key_file: <filename> ]
# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]
# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> ]
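To make static_configs and relabel_configs concrete, here is a sketch of a job that attaches a static label to two illustrative node-exporter targets and copies the scrape address into a custom label before scraping:

```yaml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['10.0.0.1:9100', '10.0.0.2:9100']  # illustrative hosts
        labels:
          env: 'prod'
    relabel_configs:
      # The default action is "replace": copy the value of the built-in
      # __address__ label into a new label named node_addr on each target.
      - source_labels: [__address__]
        target_label: node_addr
```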
Prometheus supports many kinds of service discovery, as the configuration reference above shows: kubernetes, openstack, ec2, dns, and more. Next we look at Kubernetes monitoring, currently the most common use case; see the reference documentation.