
Testing Promscale, an Analytics Platform and Long-Term Storage for Prometheus

Author: 云原生生态圈 (WeChat official account)
Published: 2022-04-08 14:20:59

Promscale is an open-source observability backend for metrics and traces, powered by SQL.

It is built on the robust, high-performance foundation of PostgreSQL and TimescaleDB. It natively supports Prometheus metrics and OpenTelemetry traces, and through the OpenTelemetry Collector it can also ingest many other formats such as StatsD, Jaeger, and Zipkin, while remaining 100% PromQL-compatible[1]. Its full SQL capability lets developers correlate metrics, traces, and business data, yielding new and valuable insights that are not possible when the data sits siloed in separate systems. It integrates easily with Grafana and Jaeger for visualizing metrics and traces.

Built on PostgreSQL and TimescaleDB, it inherits rock-solid reliability, up to 90% native compression, continuous aggregates, and the operational maturity of a system running on millions of instances worldwide.

Promscale can be used as a Prometheus data source for visualization tools such as Grafana[2] and PromLens[3].

Promscale consists of two components:

Promscale Connector: a stateless service that provides the ingestion interface for observability data, processes that data, and stores it in TimescaleDB. It also provides an interface for querying the data with PromQL. The Promscale Connector automatically sets up the data structures in TimescaleDB for storing the data, and handles changes to those structures when an upgrade to a new Promscale version requires it.

TimescaleDB: the Postgres-based database that stores all of the observability data. It provides a full SQL interface for querying the data, along with advanced capabilities such as analytical functions, columnar compression, and continuous aggregates. TimescaleDB also gives you great flexibility to store business and other types of data, which you can then correlate with the observability data.

The Promscale Connector ingests Prometheus metrics, metadata, and OpenMetrics exemplars through the Prometheus remote_write interface. It ingests OpenTelemetry traces through the OpenTelemetry Protocol (OTLP). It can also ingest metrics and traces in other formats via the OpenTelemetry Collector, which processes them and forwards them over the Prometheus remote_write interface and OTLP. For example, you can use the OpenTelemetry Collector to ingest Jaeger traces and StatsD metrics into Promscale.

For Prometheus metrics, the Promscale Connector exposes Prometheus API endpoints for running PromQL queries and reading metadata. This lets you connect tools that support the Prometheus API, such as Grafana, directly to Promscale for querying. You can also send queries to Prometheus and have Prometheus read data back from Promscale through the Connector's remote_read interface.

You can also query metrics and traces in Promscale with SQL, which lets you use the many visualization tools that integrate with PostgreSQL. For example, Grafana can query data in Promscale out of the box through its PostgreSQL data source.

I am going to try this out with containers, so let's first install docker and docker-compose:

yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
 yum-config-manager \
    --add-repo \
    https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
yum install docker-ce docker-ce-cli containerd.io docker-compose -y

Orchestration

Following the Docker configuration from the official docs, I put together a compose file for testing:

version: '2.2'
services:
  timescaledb:
    image: timescaledev/promscale-extension:latest-ts2-pg13
    container_name: timescaledb
    restart: always
    hostname: "timescaledb"
    network_mode: "host"
    environment:
      - csynchronous_commit=off  # likely a typo for synchronous_commit=off
      - POSTGRES_PASSWORD=123
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/prom/timescaledb/data:/var/lib/postgresql/data:rw
    mem_limit: 512m
    user: root
    stop_grace_period: 1m
  promscale:
    image: timescale/promscale:0.10
    container_name: promscale
    restart: always
    hostname: "promscale"
    network_mode: "host"
    environment:
      - PROMSCALE_DB_PASSWORD=123
      - PROMSCALE_DB_PORT=5432
      - PROMSCALE_DB_NAME=postgres
      - PROMSCALE_DB_HOST=127.0.0.1
      - PROMSCALE_DB_SSL_MODE=allow
    volumes:
      - /etc/localtime:/etc/localtime:ro
   #   - /data/prom/postgresql/data:/var/lib/postgresql/data:rw
    mem_limit: 512m
    user: root
    stop_grace_period: 1m
  grafana:
    image: grafana/grafana:8.3.7
    container_name: grafana
    restart: always
    hostname: "grafana"
    network_mode: "host"
    #environment:
    #  - GF_INSTALL_PLUGINS="grafana-clock-panel,grafana-simple-json-datasource"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/grafana/plugins:/var/lib/grafana/plugins
    mem_limit: 512m
    user: root

  prometheus:
    image: prom/prometheus:v2.33.4
    container_name: prometheus
    restart: always
    hostname: "prometheus"
    network_mode: "host"
    #environment:
    volumes:
    - /etc/localtime:/etc/localtime:ro
    - /data/prom/prometheus/data:/prometheus:rw  # NOTE: chown 65534:65534 /data/prometheus/
    - /data/prom/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - /data/prom/prometheus/alert:/etc/prometheus/alert
    #- /data/prom/prometheus/ssl:/etc/prometheus/ssl
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention=45d'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
      - '--web.enable-admin-api'
    mem_limit: 512m
    user: root
    stop_grace_period: 1m
  node_exporter:
    image: prom/node-exporter:v1.3.1
    container_name: node_exporter
    user: root
    privileged: true
    network_mode: "host"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
    restart: unless-stopped

grafana

We run the container as root here because we need to install the plugins manually:

bash-5.1# grafana-cli plugins install  grafana-clock-panel
✔ Downloaded grafana-clock-panel v1.3.0 zip successfully

Please restart Grafana after installing plugins. Refer to Grafana documentation for instructions if necessary.

bash-5.1# grafana-cli plugins install grafana-simple-json-datasource
✔ Downloaded grafana-simple-json-datasource v1.4.2 zip successfully

Please restart Grafana after installing plugins. Refer to Grafana documentation for instructions if necessary.

Configuring Grafana

Download some dashboard templates from https://grafana.com/grafana/dashboards/?pg=hp&plcmt=lt-box-dashboards&search=prometheus

Visualize data in Promscale[4]
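
Once data is flowing, a Grafana panel backed by the PostgreSQL data source (pointed at the TimescaleDB instance) can query the Promscale metric views with plain SQL. A minimal sketch, assuming the node_exporter metric node_disk_io_now used later in this post and Grafana's $__timeFilter macro:

-- average I/O queue depth per device in 1-minute buckets,
-- restricted to the panel's time range by Grafana's $__timeFilter macro
SELECT
    time_bucket('1 minute', time) AS time,
    val(device_id) AS device,
    avg(value) AS avg_io
FROM node_disk_io_now
WHERE $__timeFilter(time)
GROUP BY 1, 2
ORDER BY 1;

With the query format set to Time series, Grafana should split the result into one series per device.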

prometheus

We can now try configuring remote storage; the configuration parameters are documented on the official site[5].

remote_write:
  - url: "http://127.0.0.1:9201/write"
remote_read:
  - url: "http://127.0.0.1:9201/read"
    read_recent: true

The remote configuration I used is as follows:

remote_write:
   - url: "http://127.0.0.1:9201/write"
     write_relabel_configs:
     - source_labels: [__name__]
       regex: '.*:.*'
       action: drop
     remote_timeout: 100s
     queue_config:
       capacity: 500000
       max_samples_per_send: 50000
       batch_send_deadline: 30s
       min_backoff: 100ms
       max_backoff: 10s
       min_shards: 16
       max_shards: 16
remote_read:
  - url: "http://127.0.0.1:9201/read"
    read_recent: true

The full prometheus.yml is as follows:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - '127.0.0.1:9093'
rule_files:
  - "alert/host.alert.rules"
  - "alert/container.alert.rules"
  - "alert/targets.alert.rules"
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
    - targets: ['127.0.0.1:9090']
    - targets: ['127.0.0.1:9093']
    - targets: ['127.0.0.1:9100']
remote_write:
   - url: "http://127.0.0.1:9201/write"
     write_relabel_configs:
     - source_labels: [__name__]
       regex: '.*:.*'
       action: drop
     remote_timeout: 100s
     queue_config:
       capacity: 500000
       max_samples_per_send: 50000
       batch_send_deadline: 30s
       min_backoff: 100ms
       max_backoff: 10s
       min_shards: 16
       max_shards: 16
remote_read:
  - url: "http://127.0.0.1:9201/read"
    read_recent: true

After restarting, check the logs:

ts=2022-03-03T01:35:28.123Z caller=main.go:1128 level=info msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2022-03-03T01:35:28.137Z caller=dedupe.go:112 component=remote level=info remote_name=797d34 url=http://127.0.0.1:9201/write msg="Starting WAL watcher" queue=797d34
ts=2022-03-03T01:35:28.138Z caller=dedupe.go:112 component=remote level=info remote_name=797d34 url=http://127.0.0.1:9201/write msg="Starting scraped metadata watcher"
ts=2022-03-03T01:35:28.277Z caller=dedupe.go:112 component=remote level=info remote_name=797d34 url=http://127.0.0.1:9201/write msg="Replaying WAL" queue=797d34
ts=2022-03-03T01:35:38.177Z caller=main.go:1165 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=10.053377011s db_storage=1.82µs remote_storage=13.752341ms web_handler=549ns query_engine=839ns scrape=10.038744417s scrape_sd=44.249µs notify=41.342µs notify_sd=6.871µs rules=30.465µs
ts=2022-03-03T01:35:38.177Z caller=main.go:896 level=info msg="Server is ready to receive web requests."
ts=2022-03-03T01:35:53.584Z caller=dedupe.go:112 component=remote level=info remote_name=797d34 url=http://127.0.0.1:9201/write msg="Done replaying WAL" duration=25.446317635s

Inspecting the data

[root@localhost data]# docker exec -it timescaledb  sh
/ # su - postgres
timescaledb:~$ psql
psql (13.4)
Type "help" for help.

postgres=#

Let's query a disk I/O metric over the last five minutes.

Querying a metric

SELECT * from node_disk_io_now
WHERE time > now() - INTERVAL '5 minutes';
            time            | value | series_id |    labels     | device_id | instance_id | job_id
----------------------------+-------+-----------+---------------+-----------+-------------+--------
 2022-03-02 21:03:58.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:04:28.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:04:58.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:05:28.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:05:58.376-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:06:28.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:06:58.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:07:28.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:07:58.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:08:28.373-05 |     0 |       348 | {51,140,91,3} |       140 |          91 |      3
 2022-03-02 21:03:58.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:04:28.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:04:58.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:05:28.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:05:58.376-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:06:28.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:06:58.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:07:28.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:07:58.373-05 |     0 |       349 | {51,252,91,3} |       252 |          91 |      3
 2022-03-02 21:03:58.373-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3
 2022-03-02 21:04:28.373-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3
 2022-03-02 21:04:58.373-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3
 2022-03-02 21:05:28.373-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3
 2022-03-02 21:05:58.376-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3
 2022-03-02 21:06:28.373-05 |     0 |       350 | {51,253,91,3} |       253 |          91 |      3

Let's run an aggregation query next.

Querying values for label keys

Each label key is expanded into its own column, which stores a foreign-key identifier as its value. This allows you to JOIN, aggregate, and filter by label keys and values.

To retrieve the text represented by a label ID, use the val(field_id) function. This lets you do things like aggregate all series for a particular label key.

For example, to find the median of the node_disk_io_now metric, grouped by the job associated with it:

SELECT
    val(job_id) as job,
    percentile_cont(0.5) within group (order by value) AS median
FROM
    node_disk_io_now
WHERE
    time > now() - INTERVAL '5 minutes'
GROUP BY job_id;

The result:

postgres=# SELECT
postgres-#     val(job_id) as job,
postgres-#     percentile_cont(0.5) within group (order by value) AS median
postgres-# FROM
postgres-#     node_disk_io_now
postgres-# WHERE
postgres-#     time > now() - INTERVAL '5 minutes'
postgres-# GROUP BY job_id;
    job     | median
------------+--------
 prometheus |      0
(1 row)
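
val() also works in a WHERE clause, so you can filter series by a label's text value rather than its numeric ID. A small sketch, assuming the sda device that appears in the earlier output:

-- median I/O queue depth for the sda device only, over the last 5 minutes
SELECT
    val(device_id) AS device,
    percentile_cont(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM
    node_disk_io_now
WHERE
    time > now() - INTERVAL '5 minutes'
    AND val(device_id) = 'sda'
GROUP BY device_id;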

Querying the label set of a metric

The labels field of any metric row represents the full set of labels associated with the measurement. It is stored as an array of identifiers. To return the entire label set in JSON format, use the jsonb() function, as shown below:

SELECT
    time, value, jsonb(labels) as labels
FROM
    node_disk_io_now
WHERE
    time > now() - INTERVAL '5 minutes';

The result:

            time            | value |                                                labels
----------------------------+-------+-------------------------------------------------------------------------------------------------------
 2022-03-02 21:09:58.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:28.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:58.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:28.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:58.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:12:28.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:12:58.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:13:28.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:13:58.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:14:28.373-05 |     0 | {"job": "prometheus", "device": "dm-0", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:09:58.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:28.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:58.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:28.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:58.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:12:28.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:12:58.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:13:28.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:13:58.373-05 |     0 | {"job": "prometheus", "device": "dm-1", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:09:58.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:28.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:10:58.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:28.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:11:58.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}
 2022-03-02 21:12:28.373-05 |     0 | {"job": "prometheus", "device": "sda", "__name__": "node_disk_io_now", "instance": "127.0.0.1:9100"}

Querying node_disk_info

postgres=# SELECT * FROM prom_series.node_disk_info;
 series_id |         labels         | device |    instance    |    job     | major | minor
-----------+------------------------+--------+----------------+------------+-------+-------
       250 | {150,140,91,3,324,325} | dm-0   | 127.0.0.1:9100 | prometheus | 253   | 0
       439 | {150,253,91,3,508,325} | sda    | 127.0.0.1:9100 | prometheus | 8     | 0
       440 | {150,258,91,3,507,325} | sr0    | 127.0.0.1:9100 | prometheus | 11    | 0
       516 | {150,252,91,3,324,564} | dm-1   | 127.0.0.1:9100 | prometheus | 253   | 1
(4 rows)

Querying with labels

SELECT
  jsonb(labels) as labels,
  value
FROM node_disk_info
WHERE time < now();
Results:
labels                                                               | value
-----------------------------------------------------------------------------------------------------------------------------------+-------
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |   NaN
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1
 {"job": "prometheus", "major": "253", "minor": "0", "device": "dm-0", "__name__": "node_disk_info", "instance": "127.0.0.1:9100"} |     1

Use the \d+ command to inspect the metric's view definition:

postgres=# \d+ node_disk_info
                                View "prom_metric.node_disk_info"
   Column    |           Type           | Collation | Nullable | Default | Storage  | Description
-------------+--------------------------+-----------+----------+---------+----------+-------------
 time        | timestamp with time zone |           |          |         | plain    |
 value       | double precision         |           |          |         | plain    |
 series_id   | bigint                   |           |          |         | plain    |
 labels      | label_array              |           |          |         | extended |
 device_id   | integer                  |           |          |         | plain    |
 instance_id | integer                  |           |          |         | plain    |
 job_id      | integer                  |           |          |         | plain    |
 major_id    | integer                  |           |          |         | plain    |
 minor_id    | integer                  |           |          |         | plain    |
View definition:
 SELECT data."time",
    data.value,
    data.series_id,
    series.labels,
    series.labels[2] AS device_id,
    series.labels[3] AS instance_id,
    series.labels[4] AS job_id,
    series.labels[5] AS major_id,
    series.labels[6] AS minor_id
   FROM prom_data.node_disk_info data
     LEFT JOIN prom_data_series.node_disk_info series ON series.id = data.series_id;
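
Because everything is ordinary SQL, a metric view can also be joined against the series metadata (or any other table). A sketch that joins the last five minutes of node_disk_io_now with prom_series.node_disk_info to attach each device's major and minor numbers:

-- correlate I/O queue depth with the disk's major/minor numbers
SELECT
    io.time,
    info.device,
    info.major,
    info.minor,
    io.value AS io_in_progress
FROM node_disk_io_now io
JOIN prom_series.node_disk_info info
  ON  val(io.device_id)   = info.device
  AND val(io.instance_id) = info.instance
WHERE io.time > now() - INTERVAL '5 minutes'
ORDER BY io.time, info.device;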

For more queries, see the tutorials on the official site[6].

Scheduled deletion

Promscale's default retention policy in Postgres deletes data after 90 days.

You can check this with SELECT * FROM prom_info.metric;

We can change it by adjusting the retention settings.

TimescaleDB includes a background job scheduling framework for automating data management tasks, such as enabling simple data retention policies.

To add such a data retention policy, a database administrator can create, remove, or alter policies that cause drop_chunks to run automatically on a defined schedule.

To add such a policy on a hypertable so that chunks older than 24 hours are continuously dropped ('conditions' here is the example hypertable from the TimescaleDB docs), simply run:

SELECT add_retention_policy('conditions', INTERVAL '24 hours');

To remove the policy later:

SELECT remove_retention_policy('conditions');

The scheduler framework also lets you view the scheduled jobs:

SELECT * FROM timescaledb_information.job_stats;

Create a data retention policy that drops chunks older than 6 months:

SELECT add_retention_policy('conditions', INTERVAL '6 months');


Create a data retention policy using an integer-based time column:

SELECT add_retention_policy('conditions', BIGINT '600000');

We can adjust the retention of the Prometheus data itself; see Data Retention[7]:

SELECT
set_default_retention_period(180 * INTERVAL '1 day')

The result:

postgres=# SELECT
postgres-# set_default_retention_period(180 * INTERVAL '1 day');
 set_default_retention_period
------------------------------
 t
(1 row)

The default retention period has now been changed to 180 days.
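
The Data Retention docs also describe a per-metric override; a hedged sketch, assuming set_metric_retention_period and reset_metric_retention_period are available in this Promscale version:

-- keep only 30 days of node_disk_io_now while other metrics use the 180-day default
-- (assumes set_metric_retention_period exists in this Promscale version)
SELECT set_metric_retention_period('node_disk_io_now', INTERVAL '30 days');
-- revert this metric to the default retention period later
SELECT reset_metric_retention_period('node_disk_io_now');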

With that, both Prometheus and Grafana display the data normally.

(Screenshot: Grafana dashboard)

Load testing the current version

On Slack, someone ran a load test:

Hello everyone, I am doing a promscale+timescaledb performance test with 1 promscale(8cpu 32GB memory), 1 timescaledb(postgre12.9+timescale2.5.2 with 16cpu 32G mem), 1 prometheus(8cpu 32G mem), simulate 2500 node_exporters( 1000 metrics/min * 2500 = 2.5 million metrics/min ) . But it seams not stable,

In other words, 2.5 million metrics per minute from 2,500 simulated node_exporters, and the setup did not seem stable. The warnings and errors were as follows:

there are warninigs in prometheus:
level=info ts=2022-03-01T12:36:33.365Z caller=throughput.go:76 msg="ingestor throughput" samples/sec=35000 metrics-max-sent-ts=2022-03-01T11:21:48.129Z
level=info ts=2022-03-01T12:36:34.365Z caller=throughput.go:76 msg="ingestor throughput" samples/sec=35000 metrics-max-sent-ts=2022-03-01T11:21:48.129Z
level=warn ts=2022-03-01T12:36:34.482Z caller=watcher.go:101 msg="[WARNING] Ingestion is a very long time" duration=5m9.705407837s threshold=1m0s
level=info ts=2022-03-01T12:36:35.365Z caller=throughput.go:76 msg="ingestor throughput" samples/sec=35000 metrics-max-sent-ts=2022-03-01T11:21:48.129Z
level=info ts=2022-03-01T12:36:40.365Z caller=throughput.go:76 msg="ingestor throughput" samples/sec=70000 metrics-max-sent-ts=2022-03-01T11:21:48.129Z
and errors in prometheus:
Mar 01 20:38:55 localhost start-prometheus.sh[887]: ts=2022-03-01T12:38:55.288Z caller=dedupe.go:112 component=remote level=warn remote_name=ceec38 url=http://192.168.105.76:9201/write msg="Failed to send batch, retrying" err="Post \"http://192.168.105.76:9201/write\": context deadline exceeded"
any way to increase the thoughput at current configuration?

The recommendation there was to use:

 remote_write:
    - url: "http://promscale:9201/write"
      write_relabel_configs:
      - source_labels: [__name__]
        regex: '.*:.*'
        action: drop
      remote_timeout: 100s
      queue_config:
        capacity: 500000
        max_samples_per_send: 50000
        batch_send_deadline: 30s
        min_backoff: 100ms
        max_backoff: 10s
        min_shards: 16
        max_shards: 16

He then upgraded Postgres to 14.2, applied the remote_write settings above, and increased Promscale's memory to 32 GB, but it was still unstable.

Note that this was a load test with 2,500 nodes. Judging from this test, at least for now, Promscale is still at an early stage of development, and the project itself has not carried out reliability load testing. We look forward to a stable release in the future.
