Breaking the Bottleneck: Prometheus Remote Storage in Practice

Source: 企鹅号 (Tencent content platform account) - Caicloud

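The designers of Prometheus place great weight on the stability of the monitoring system itself, so Prometheus depends only on the local file system. That choice also means Prometheus is not well suited to storing long-term data.

How long "long-term" is depends on the data volume and the server resources available. If data never expires, the first resource to hit its limit is usually memory: Prometheus loads every time series a query needs into memory first, so a query that spans a long time range and touches many time series can easily trigger an OOM.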

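Why Prometheus Needs Remote Storage

Prometheus initially set out to find a suitable external store, but found that none of the existing time series databases met its requirements well.

See #10, one of the earliest issues in the Prometheus issue tracker: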

https://github.com/prometheus/prometheus/issues/10

So Prometheus provides remote read and remote write interfaces and leaves the integration to users.

Prometheus Remote Read and Remote Write

Figure: how Prometheus integrates with external systems, from the Prometheus docs

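The Adapter is an intermediate component. Prometheus and the Adapter exchange data in a standard format defined by Prometheus, while the communication between the Adapter and the external storage system can be customized. In Prometheus v2.0.0, Prometheus and the Adapter communicate using snappy-compressed protocol buffers over HTTP (the Prometheus docs note that this may move to gRPC in the future). On the write path, Prometheus sends samples to the Adapter; for efficiency, samples are first buffered in queues and then sent in batches. On the read path, a request carries start_timestamp, end_timestamp, and label_matchers, and the response contains every time series that matches. In other words, Prometheus uses the Adapter only to fetch time series; all further processing happens inside Prometheus.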
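The read side of this contract can be sketched in a few lines. The example below is a toy in-memory stand-in for an Adapter, not the Prometheus wire protocol: `matches` and `handle_read` are hypothetical names, matchers are plain `(op, name, value)` tuples, and Python's `re` module stands in for the RE2 syntax that Prometheus uses.

```python
import re


def matches(matcher, labels):
    """Evaluate one (op, name, value) matcher against a label set."""
    op, name, value = matcher
    actual = labels.get(name, "")  # an absent label is treated as ""
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    # Regex matchers are fully anchored, hence fullmatch.
    hit = re.fullmatch(value, actual) is not None
    if op == "=~":
        return hit
    if op == "!~":
        return not hit
    raise ValueError(f"unknown matcher operator: {op}")


def handle_read(store, start_ms, end_ms, matchers):
    """Return each stored series that satisfies all matchers, trimmed to
    the inclusive time range [start_ms, end_ms]."""
    result = []
    for series in store:
        if not all(matches(m, series["labels"]) for m in matchers):
            continue
        window = [(t, v) for t, v in series["samples"] if start_ms <= t <= end_ms]
        if window:
            result.append({"labels": series["labels"], "samples": window})
    return result


# Hypothetical data: (timestamp in ms, value) pairs per series.
store = [
    {"labels": {"__name__": "up", "job": "node"},
     "samples": [(1000, 1.0), (2000, 1.0), (3000, 0.0)]},
    {"labels": {"__name__": "up", "job": "api"},
     "samples": [(1000, 1.0), (2500, 1.0)]},
    {"labels": {"__name__": "http_requests_total", "job": "api"},
     "samples": [(2000, 42.0)]},
]

found = handle_read(store, 1500, 3000, [("=", "__name__", "up"), ("!=", "job", "api")])
```

The Adapter returns raw samples only; rate calculations, aggregation, and any other PromQL evaluation would still run inside Prometheus on the returned series.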

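Figure: the RemoteWriteConfig struct in Prometheus v2.0.0

The RemoteWriteConfig struct in Prometheus v2.0.0 defines how data is sent to Remote Storage. Although the official documentation still treats the remote read and remote write configuration as unstable, the code shows what is available. HTTPClientConfig configures the HTTP-related settings: auth, proxy, and TLS. WriteRelabelConfigs relabels time series as they are sent. QueueConfig defines the send queue parameters: batch size, number of queues, the number of retries on failure, and the wait between retries. The default QueueConfig is as follows:

Figure: the default QueueConfig

As shown, Prometheus by default defines 1000 queues with a batch size of 100, which is expected to reach a send rate of 1M samples/s. Prometheus also exposes queue-related metrics such as failed_samples_total and dropped_samples_total. If the rate of either metric is greater than 0, then either Remote Storage has a problem that is causing sends to fail, or the queue is full and samples are being dropped.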
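The 1M samples/s figure follows from simple arithmetic once a per-batch send time is fixed. The sketch below assumes an average of 100 ms per remote write call, which is an assumption added here for illustration; `max_send_rate` is a hypothetical helper, not part of Prometheus.

```python
def max_send_rate(queues: int, batch_size: int, send_latency_ms: int) -> float:
    """Upper bound on samples per second when every queue sends one full
    batch per round trip to Remote Storage."""
    batches_per_second_per_queue = 1000 / send_latency_ms
    return queues * batch_size * batches_per_second_per_queue


# Defaults quoted in the text, with an assumed 100 ms per send.
default_rate = max_send_rate(queues=1000, batch_size=100, send_latency_ms=100)

# The same configuration against a slower backend (assumed 500 ms per send).
slow_rate = max_send_rate(queues=1000, batch_size=100, send_latency_ms=500)

print(default_rate, slow_rate)
```

The bound scales inversely with send latency, so a slow Remote Storage lowers the achievable rate well before the queue count becomes the limit.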

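Next, the RemoteReadConfig struct:

Figure: the RemoteReadConfig struct

When ReadRecent is false, Prometheus compares the timestamp of the oldest data in local storage with the query's start timestamp while handling a query. If all the required data is already in local storage, it skips the query to Remote Storage.

This is an important optimization; see #3129 for details: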

https://github.com/prometheus/prometheus/pull/3129
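The ReadRecent decision described above can be sketched as a single comparison. This is a simplified illustration of the rule as stated in the text, not the code from the pull request; `should_query_remote` and its parameter names are hypothetical.

```python
def should_query_remote(read_recent: bool, query_start_ms: int, local_oldest_ms: int) -> bool:
    """Decide whether a query needs to be sent to Remote Storage."""
    if read_recent:
        # Remote Storage is consulted even when local storage covers the range.
        return True
    # Local storage holds everything from local_oldest_ms onward, so the
    # remote read is needed only when the query starts before that point.
    return query_start_ms < local_oldest_ms


# Local storage starts at t=1000 ms in this hypothetical setup.
recent_query = should_query_remote(False, query_start_ms=5000, local_oldest_ms=1000)
old_query = should_query_remote(False, query_start_ms=500, local_oldest_ms=1000)
forced = should_query_remote(True, query_start_ms=5000, local_oldest_ms=1000)
```

With ReadRecent off, dashboards that only look at recent data never touch Remote Storage, which is where most of the saving comes from.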

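Prometheus and InfluxDB

Converting between the Prometheus and InfluxDB data formats is straightforward, so integrating the two is relatively simple. InfluxDB officially provides read and write APIs for Prometheus, so the Adapter can be dropped. Unfortunately, InfluxDB clustering is no longer open source, so this article does not explore InfluxDB further.

Read and write API: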

https://github.com/influxdata/influxdb/pull/8784

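Prometheus and OpenTSDB

OpenTSDB is a distributed time series database built on HBase. One of its main strengths is that it can retain large volumes of data long term while scaling horizontally. The OpenTSDB version used in this article is v2.3.0.

A sample in OpenTSDB has the following format: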
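As a rough sketch of how a Prometheus sample might map onto an OpenTSDB data point, the block below assumes the JSON body accepted by OpenTSDB's HTTP `/api/put` endpoint (`metric`, `timestamp`, `value`, `tags`). The function name `to_opentsdb`, the choice to replace unsupported characters with `_`, the use of second-resolution timestamps, and the decision to skip NaN and infinite values are all illustrative assumptions, not the behavior of any particular adapter.

```python
import json
import math
import re

# OpenTSDB metric names and tag values accept letters, digits, "-", "_", "."
# and "/". Anything else is replaced with "_" here (an illustrative choice).
_UNSUPPORTED = re.compile(r"[^a-zA-Z0-9\-_./]")


def to_opentsdb(labels, timestamp_ms, value):
    """Convert one Prometheus sample into an OpenTSDB data point dict,
    or return None when the value cannot be represented."""
    if math.isnan(value) or math.isinf(value):
        return None  # assumed policy: skip values JSON cannot carry
    tags = {
        name: _UNSUPPORTED.sub("_", val)
        for name, val in labels.items()
        if name != "__name__"  # the metric name is carried separately
    }
    return {
        "metric": _UNSUPPORTED.sub("_", labels["__name__"]),
        "timestamp": timestamp_ms // 1000,  # seconds since the epoch
        "value": value,
        "tags": tags,
    }


point = to_opentsdb(
    {"__name__": "http_requests_total", "job": "api", "path": "/a b"},
    1515491919000,
    3.0,
)
body = json.dumps([point])  # a list of points forms one put request body
```

OpenTSDB requires at least one tag per data point, so a series that carries only `__name__` would need a placeholder tag before it could be written.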

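Writing Prometheus data into OpenTSDB requires attention to the following points:

The Prometheus remote_storage_adapter does not support reading data from OpenTSDB. To query the data stored in OpenTSDB, you can use Grafana directly.

The figure below shows the same metric queried in Grafana from OpenTSDB and from Prometheus: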

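We can also implement our own Adapter so that Prometheus reads directly from OpenTSDB. Based on the read protocol described earlier, only four matchers need to be implemented: "=", "!=", "=~", and "!~". "=" and "!=" can be translated into OpenTSDB's "literal_or" filter. "=~" and "!~" cannot be translated directly into a filter; they have to be turned into a match-all first, with the data fetched from OpenTSDB and filtered afterwards. (This can lead to an OOM, but in practice other filters are usually present, and together with downsampling they keep the returned data from growing too large.)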
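The matcher translation can be sketched as follows. The filter dictionaries follow the shape of OpenTSDB 2.x query filters (`type`, `tagk`, `filter`, `groupBy`); the sketch assumes that "!=" maps to `not_literal_or`, the negated counterpart of `literal_or`, and that match-all is expressed as a `wildcard` filter with `*`. `to_filter` is a hypothetical helper that handles tag matchers only, and Python's `re` stands in for RE2.

```python
import re


def to_filter(op, name, value):
    """Translate one Prometheus matcher on a tag into an OpenTSDB filter.

    Returns (filter_dict, post_filter). post_filter is None when OpenTSDB
    can evaluate the matcher itself; otherwise it is a function applied to
    each returned series' tags after the query.
    """
    if op == "=":
        # Note: "|" inside value would be read by OpenTSDB as an OR separator.
        return {"type": "literal_or", "tagk": name, "filter": value, "groupBy": True}, None
    if op == "!=":
        return {"type": "not_literal_or", "tagk": name, "filter": value, "groupBy": True}, None
    if op in ("=~", "!~"):
        pattern = re.compile(value)
        want_match = op == "=~"

        def post_filter(tags):
            hit = pattern.fullmatch(tags.get(name, "")) is not None
            return hit == want_match

        # Fetch every value of the tag, then filter on the Adapter side.
        return {"type": "wildcard", "tagk": name, "filter": "*", "groupBy": True}, post_filter
    raise ValueError(f"unknown matcher operator: {op}")


eq_filter, eq_post = to_filter("=", "job", "api")
re_filter, re_post = to_filter("=~", "host", "web-.*")
nre_filter, nre_post = to_filter("!~", "host", "web-.*")
```

Setting `groupBy` keeps one result per tag value so that individual series can be reconstructed for Prometheus; the regex cases show why result size is a concern, since every value of the tag comes back before the post-filter runs.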

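Below, two Prometheus instances run the same query: one reads only from local storage (while also writing to OpenTSDB), and the other reads only from OpenTSDB.

Queries through remote read are noticeably slower, but the results are essentially the same:

Figure: Prometheus reading from local storage

Figure: Prometheus reading from Remote Storage

Using OpenTSDB as long-term storage for Prometheus is a fairly reliable option. Many other time series databases also offer Prometheus integrations; see: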

https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage

For the OpenTSDB schema, see:

http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html

https://yq.aliyun.com/articles/54785

Author: 蔡通 (Cai Tong), software engineer at Caicloud

-END-

Published: 2018-01-09 17:58:39
Original link: http://kuaibao.qq.com/s/20180109G0NC1S00?refer=cp_1026