
Prometheus Usage Notes (1)

Author: Bob hadoop
Last modified: 2021-03-26 14:17:52

1. Installation

Official download page: https://prometheus.io/download/

Linux installation

Extract the archive and it is ready to use:

tar xvf prometheus-2.10.0.linux-amd64.tar.gz

Startup

/opt/monitor/prometheus/prometheus --config.file="/opt/monitor/prometheus/prometheus.yml" --web.enable-lifecycle > /opt/monitor/prometheus/prometheus.log 2>&1 &
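To confirm the server came up (assuming the default listen port 9090), you can hit Prometheus' built-in health endpoint:

# returns HTTP 200 when the server is up and serving requests
curl http://localhost:9090/-/healthy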

Mac installation

Install via brew:

brew install prometheus

Startup option 1:

brew services start prometheus

Startup option 2:

prometheus --config.file=/usr/local/etc/prometheus.yml --web.enable-lifecycle 2>&1 &

Summary: both startup commands include --web.enable-lifecycle, which enables remote hot-reloading of the configuration file. After editing prometheus.yml you can refresh the configuration remotely with: curl -XPOST http://localhost:9090/-/reload (adjust the IP and port to match your own deployment).
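As a minimal sketch of that reload workflow using the Linux paths above (promtool ships in the same release tarball as the prometheus binary):

# validate the edited configuration before reloading
/opt/monitor/prometheus/promtool check config /opt/monitor/prometheus/prometheus.yml

# trigger the hot reload (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload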

2. Configuration

Since my use cases are limited, my configuration is fairly simple:

[root@bigdata3 prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['192.168.1.5:9093']        # Address of the Alertmanager instance
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
#################rules#############################
  - "/opt/monitor/prometheus/rules/hosts/*.yml" # Glob of alerting rule files
#################rules#############################
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['192.168.1.5:9090']          # Address of the host running Prometheus
#################hosts#############################
  - job_name: 'dmp_hosts'          # The job name distinguishes the hosts of each monitored project
    file_sd_configs:
    - files: ['/opt/monitor/prometheus/monitor_config/dmp/*.yml']   # Directory of target files for the dmp cluster hosts
      refresh_interval: 5s
#################hdfs#############################
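The rule_files glob above points at /opt/monitor/prometheus/rules/hosts/*.yml, but no rule file is shown here. As a hedged illustration only, a host-down rule dropped into that directory could look like the following (the alert name, threshold, and labels are my own assumptions, not part of this setup):

groups:
  - name: host_alerts                      # hypothetical rule group name
    rules:
      - alert: InstanceDown                # fires when a target fails its scrapes
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"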

Explanation from the official documentation:

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # Labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature, e.g. forwarding samples to a
# backend time-series database such as ClickHouse.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

The part really worth studying is scrape_configs: when we scrape many different things, each job may need its own settings. The options given in the official documentation are as follows:

# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.

[ honor_labels: <boolean> | default = false ]

# honor_timestamps controls whether Prometheus respects the timestamps present in scraped data.
# If honor_timestamps is set to "true", the timestamps of the metrics exposed by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed by the target will be ignored.
 
[ honor_timestamps: <boolean> | default = true ]

# Configures the protocol scheme used for requests.

[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# Sets the `Authorization` header on every scrape request with the configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <secret> ]

# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: <filename> ]

# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DigitalOcean service discovery configurations.
digitalocean_sd_configs:
  [ - <digitalocean_sd_config> ... ]

# List of Docker Swarm service discovery configurations.
dockerswarm_sd_configs:
  [ - <dockerswarm_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of Eureka service discovery configurations.
eureka_sd_configs:
  [ - <eureka_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Hetzner service discovery configurations.
hetzner_sd_configs:
  [ - <hetzner_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabeling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]

# Per-scrape config limit on number of unique targets that will be
# accepted. If more than this number of targets are present after target
# relabeling, Prometheus will mark the targets as failed without scraping them.
# 0 means no limit. This is an experimental feature, this behaviour could
# change in the future.
[ target_limit: <int> | default = 0 ]
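Most of these options are rarely needed. As a hedged sketch only, a job combining a few common ones (per-job interval, HTTPS, basic auth, static targets) might look like the following; the job name, credentials, and address are placeholders of mine, not from this article:

  - job_name: 'secured_exporter'           # placeholder job name
    scrape_interval: 30s                   # override the global default for this job
    scheme: https                          # scrape over HTTPS instead of the default http
    metrics_path: /metrics                 # explicit, although this is already the default
    basic_auth:
      username: monitor                    # placeholder credentials
      password_file: /opt/monitor/prometheus/secrets/exporter.pass
    static_configs:
    - targets: ['10.0.0.10:9100']          # placeholder exporter address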

That is too much to translate in full, and most of it goes unused. Here I recommend configuring targets the following way, with file-based service discovery.

File-based service discovery provides a more generic way to configure static targets and serves as an interface for plugging in custom service-discovery mechanisms.

It reads a set of files containing zero or more lists of targets. Changes to all defined files are detected via disk watches and applied immediately. Files may be provided in YAML or JSON format. Only changes resulting in well-formed target groups are applied.

JSON:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

YAML:

- targets:
  [ - '<host>' ]
  labels:
    [ <labelname>: <labelvalue> ... ]
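As a concrete illustration of the JSON variant (the hosts and labels are made up for the example; a YAML example appears at the end of this section):

[
  {
    "targets": [ "192.168.1.21:9100", "192.168.1.22:9100" ],
    "labels": { "group": "dmp", "env": "prod" }
  }
]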

Files must contain a list of static configs in the formats shown above. The scrape job in prometheus.yml then points at those files:

##########################host#######################################################
  - job_name: 'a_host'
    file_sd_configs:
    - files: ['/opt/prometheus/monitor_config/host_config/clien_cm_config/*.yml']
      refresh_interval: 5s

With file_sd_configs, all we have to do is plan the directory layout for the scrape-target files. My production cluster, for example, organizes them by role.

I create a monitor_config folder inside the Prometheus installation directory and then create subfolders in it to separate my scrape jobs; the main configuration file can carry comments noting which job's targets live in which subfolder.

For example: suppose we need the status of the datanode hosts, MySQL status, and client status.

Then the monitor_config directory under Prometheus can be organized as follows:

Host status:

/opt/monitor/prometheus/monitor_config/hosts/client/

/opt/monitor/prometheus/monitor_config/hosts/datanode/

/opt/monitor/prometheus/monitor_config/hosts/mysql/

Service status:

/opt/monitor/prometheus/monitor_config/cm/datanode/

/opt/monitor/prometheus/monitor_config/cm/namenode/

/opt/monitor/prometheus/monitor_config/cm/nodemanager/

/opt/monitor/prometheus/monitor_config/mysql/status/

/opt/monitor/prometheus/monitor_config/mysql/max_connet/

The files placed in these directories are the target files for whatever we monitor, for example an xxxx.yml like the one below (the scrape jobs that reference these directories are sketched right after it):

- targets: [ "xx.xx.xx.xx:9275" ]
  labels:
    group: "a_host"
    kind: "cm"

Off to lunch... I'll continue next time!

Original statement: This article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
