Thanos Internals

Original

Wang Lei - ByteDance

Published 2019-12-29 23:20:29


Included in the column: 01ZOO

Design

Components

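Sidecar: StoreAPI: queries Prometheus data; Shipper: uploads data to (object) storage.
Store Gateway: StoreAPI: queries metric data stored in (object) storage.
Compactor: compacts, downsamples and applies retention to the data in object storage.
Receiver: receives data from Prometheus' remote-write WAL and exposes it or uploads it to (object) storage (not shown in the figure; still a beta implementation at the time of writing, built on Prometheus' remote-write interface).
Ruler/Rule: evaluates recording/alerting rules just like Prometheus does, except that the target data is the data in Thanos; the results can be exposed for querying or uploaded.
Querier/Query: implements the Prometheus v1 API. It is an API aggregation layer that queries the underlying data sources, then merges and deduplicates the results.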

By role, the components above fall into three categories:

Metric Sources
Stores
Queriers

Metric Sources

They produce data. Currently there are only two kinds: Prometheus sidecars and rule nodes.

┌────────────┬─────────┐         ┌────────────┬─────────┐     ┌─────────┐
│ Prometheus │ Sidecar │   ...   │ Prometheus │ Sidecar │     │   Rule  │
└────────────┴────┬────┘         └────────────┴────┬────┘     └┬────────┘
                  │                                │           │
                Blocks                           Blocks      Blocks
                  │                                │           │
                  v                                v           v
              ┌──────────────────────────────────────────────────┐
              │                   Object Storage                 │
              └──────────────────────────────────────────────────┘

Stores

Here "store" means the store gateway, which translates users' metric requests into object storage API calls. It implements various strategies to minimize the number of requests to the object storage, such as filtering relevant blocks by their metadata (e.g. time range and labels) and caching frequent index lookups.

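Stores implement the same gRPC Store API as data sources, so clients treat them identically and need no special handling. In addition, every Store API implementation advertises meta information about the data it holds, which lets clients minimize the set of targets they query.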

┌──────────────────────┐  ┌────────────┬─────────┐   ┌────────────┐
│ Google Cloud Storage │  │ Prometheus │ Sidecar │   │    Rule    │
└─────────────────┬────┘  └────────────┴────┬────┘   └─┬──────────┘
                  │                         │          │
         Block File Ranges                  │          │
                  │                     Store API      │
                  v                         │          │
                ┌──────────────┐            │          │
                │     Store    │            │      Store API
                └────────┬─────┘            │          │
                         │                  │          │
                     Store API              │          │
                         │                  │          │
                         v                  v          v
                       ┌──────────────────────────────────┐
                       │              Client              │
                       └──────────────────────────────────┘

Query Layer

Stateless; it discovers stores automatically and queries their data. Based on the metadata of store and source nodes, it attempts to minimize the request fanout needed to fetch data for a particular query.

┌──────────────────┐  ┌────────────┬─────────┐   ┌────────────┐
│    Store Node    │  │ Prometheus │ Sidecar │   │    Rule    │
└─────────────┬────┘  └────────────┴────┬────┘   └─┬──────────┘
              │                         │          │
              │                         │          │
              │                         │          │
              v                         v          v
        ┌─────────────────────────────────────────────────────┐
        │                      Query layer                    │
        └─────────────────────────────────────────────────────┘
                ^                  ^                  ^
                │                  │                  │
       ┌────────┴────────┐  ┌──────┴─────┐       ┌────┴───┐
       │ Alert Component │  │ Dashboards │  ...  │ Web UI │
       └─────────────────┘  └────────────┘       └────────┘
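A minimal sketch of that metadata-based pruning, loosely modelled on the storeMatches / labelSetsMatch checks mentioned in the Query section: a store is skipped if its time range does not overlap the query or if none of its external label sets can satisfy the matchers. `StoreMeta`, `Matcher` and the equality-only matching are assumptions for illustration; real Prometheus label matchers also support `!=`, `=~` and `!~`.

```go
package main

import "fmt"

// Matcher is a hypothetical equality-only label matcher (name="value").
type Matcher struct{ Name, Value string }

// StoreMeta is the metadata a querier keeps about each discovered store.
type StoreMeta struct {
	Addr             string
	MinTime, MaxTime int64               // ms; inclusive range the store can serve
	LabelSets        []map[string]string // external label sets the store advertises
}

// labelSetMatches reports whether a label set could contain series selected by
// the matchers. A matcher on a label the set does not define cannot rule it out.
func labelSetMatches(ls map[string]string, ms []Matcher) bool {
	for _, m := range ms {
		v, ok := ls[m.Name]
		if !ok {
			continue
		}
		if v != m.Value {
			return false
		}
	}
	return true
}

// storeMatches keeps a store only if its time range overlaps the query and at
// least one of its label sets is compatible with the matchers.
func storeMatches(s StoreMeta, minT, maxT int64, ms []Matcher) bool {
	if s.MaxTime < minT || s.MinTime > maxT {
		return false
	}
	if len(s.LabelSets) == 0 {
		return true // no label metadata: cannot prune
	}
	for _, ls := range s.LabelSets {
		if labelSetMatches(ls, ms) {
			return true
		}
	}
	return false
}

func main() {
	stores := []StoreMeta{
		{Addr: "sidecar-eu:10901", MinTime: 100, MaxTime: 200, LabelSets: []map[string]string{{"region": "eu"}}},
		{Addr: "sidecar-us:10901", MinTime: 100, MaxTime: 200, LabelSets: []map[string]string{{"region": "us"}}},
		{Addr: "gateway:10901", MinTime: 0, MaxTime: 50},
	}
	ms := []Matcher{{Name: "region", Value: "eu"}}
	for _, s := range stores {
		if storeMatches(s, 120, 180, ms) {
			fmt.Println("query", s.Addr) // only sidecar-eu:10901 is queried
		}
	}
}
```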

Compactor

The Compactor is an optional, standalone component that does not participate in the Thanos cluster.

Implementation

Each component is provided as a subcommand of the thanos binary:

// /thanos-io/thanos/cmd/thanos/main.go
registerSidecar(cmds, app)
registerStore(cmds, app)
registerQuery(cmds, app)
registerRule(cmds, app)
registerCompact(cmds, app)
registerBucket(cmds, app, "bucket")
registerDownsample(cmds, app)
registerReceive(cmds, app)
registerChecks(cmds, app, "check")

// Third-party libraries used
// github.com/golang/snappy - Snappy compression
// github.com/hashicorp/serf - gossip-based membership
// github.com/prometheus/prometheus - metrics, version info, service discovery, PromQL, etc.
// google.golang.org/grpc - gRPC
// github.com/go-kit/kit - log, sd, transport, ratelimit, etc.
// github.com/oklog/run - goroutine lifecycle management (run groups)
// gopkg.in/alecthomas/kingpin.v2 - command-line parsing
// go.uber.org/automaxprocs/maxprocs - sets GOMAXPROCS automatically (container-aware)
// github.com/opentracing/opentracing-go - tracing
// gopkg.in/check.v1; onsi/ginkgo; onsi/gomega; smartystreets/goconvey - testing
// github.com/fsnotify/fsnotify - file system change notifications
// github.com/fortytw2/leaktest - goroutine leak testing
// github.com/fatih/structtag - parsing and manipulating struct tag fields
// github.com/hashicorp/golang-lru - LRU cache
// github.com/mwitkow/go-conntrack - Go middleware for net.Conn tracking (Prometheus/trace)
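The registerX(cmds, app) calls follow a registry pattern: each subcommand adds a setup function to a map keyed by its name, and main dispatches on the parsed command. The stdlib-only sketch below imitates that pattern. It is not the Thanos code, which uses kingpin for flag parsing and oklog/run for goroutine management; `setupFunc`, `registerSidecar`, `registerStore` and `dispatch` here are simplified, hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
)

// setupFunc is a hypothetical stand-in for a component's setup function.
type setupFunc func(args []string) error

// registerSidecar and registerStore each add one subcommand to the registry.
func registerSidecar(cmds map[string]setupFunc) {
	cmds["sidecar"] = func(args []string) error {
		fmt.Println("starting sidecar with flags", args)
		return nil
	}
}

func registerStore(cmds map[string]setupFunc) {
	cmds["store"] = func(args []string) error {
		fmt.Println("starting store gateway with flags", args)
		return nil
	}
}

// dispatch runs the setup function registered for argv[0].
func dispatch(cmds map[string]setupFunc, argv []string) error {
	if len(argv) == 0 {
		return errors.New("missing subcommand")
	}
	setup, ok := cmds[argv[0]]
	if !ok {
		names := make([]string, 0, len(cmds))
		for n := range cmds {
			names = append(names, n)
		}
		sort.Strings(names)
		return fmt.Errorf("unknown subcommand %q, available: %v", argv[0], names)
	}
	return setup(argv[1:])
}

func main() {
	cmds := map[string]setupFunc{}
	registerSidecar(cmds)
	registerStore(cmds)

	argv := os.Args[1:]
	if len(argv) == 0 {
		argv = []string{"sidecar", "--tsdb.path=/prometheus"} // demo default
	}
	if err := dispatch(cmds, argv); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping registration in one function per component means adding a new subcommand touches only main.go and the new component's file.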

Bucket

Runs ls/inspect/verify/repair on the metric data in (object) storage.

➜ ./thanos bucket --objstore.config-file=`pwd`/cos.yaml inspect
level=info ts=2019-12-29T08:43:27.744236Z caller=main.go:149 msg="Tracing will be disabled"
level=info ts=2019-12-29T08:43:27.744752Z caller=factory.go:43 msg="loading bucket configuration"
|            ULID            |        FROM         |        UNTIL        | RANGE  | UNTIL-DOWN | #SERIES | #SAMPLES  | #CHUNKS | COMP-LEVEL | COMP-FAILED |                                     LABELS                                      | RESOLUTION | SOURCE  |
|----------------------------|---------------------|---------------------|--------|------------|---------|-----------|---------|------------|-------------|---------------------------------------------------------------------------------|------------|---------|
| 01DWQ7AVJAST6B734CCBWJYXV0 | 22-12-2019 20:00:00 | 22-12-2019 22:00:00 | 2h0m0s | 38h0m0s    | 36,534  | 1,858,180 | 36,534  | 1          | false       | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s         | sidecar |
| 01DWQ8TA11YH5AZ592JC83HGXP | 22-12-2019 22:00:00 | 23-12-2019 00:00:00 | 2h0m0s | 38h0m0s    | 38,277  | 8,785,813 | 74,002  | 1          | false       | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s         | sidecar |

Check

Tools for validating Prometheus rules.

Compact

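A tool for compaction, downsampling and retention. It is not safe to run concurrently, so it must be deployed on its own (a single instance per bucket), and it needs local disk to hold intermediate data.

Creates 5m downsampled data for blocks larger than 40 hours (2d, 2w).
Creates 1h downsampled data for blocks larger than 10 days (2w).
Downsampling is not meant to save space but to make queries more efficient. It does not reduce storage; on the contrary, each raw block gains two more blocks (5m and 1h resolution).
Compaction internally uses tsdb.NewLeveledCompactor.
Downsampling is implemented in thanos-io/thanos/pkg/compact/downsample/downsample.go.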

# after compact / downsample
➜ ./thanos bucket --objstore.config-file=`pwd`/cos.yaml inspect
level=info ts=2019-12-29T09:02:54.011068Z caller=main.go:149 msg="Tracing will be disabled"
level=info ts=2019-12-29T09:02:54.011681Z caller=factory.go:43 msg="loading bucket configuration"
|            ULID            |        FROM         |        UNTIL        | RANGE  | UNTIL-DOWN | #SERIES |  #SAMPLES  | #CHUNKS | COMP-LEVEL | COMP-FAILED |                                     LABELS                                      | RESOLUTION |  SOURCE   |
|----------------------------|---------------------|---------------------|--------|------------|---------|------------|---------|------------|-------------|---------------------------------------------------------------------------------|------------|-----------|
| 01DX8E6DKHG0TEJQG85CPXEEEB | 22-12-2019 20:00:00 | 23-12-2019 00:00:00 | 4h0m0s | 36h0m0s    | 38,562  | 10,643,993 | 110,536 | 2          | false       | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s         | compactor |
| 01DWQ97RAVF6ADC47073366595 | 22-12-2019 22:00:00 | 23-12-2019 00:00:00 | 2h0m0s | 38h0m0s    | 36,616  | 8,229,587  | 36,664  | 1          | false       | prometheus=monitoring/prometheus-1,prometheus_replica=prometheus-prometheus-1-0 | 0s         | sidecar   |
| 01DWQFP1915WVACNP2W187QWJ6 | 23-12-2019 00:00:00 | 23-12-2019 02:00:00 | 2h0m0s | 38h0m0s    | 36,625  | 8,793,602  | 73,282  | 1          | false       | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s         | sidecar   |
| 01DX8E0MV53DH41VWNW4TJXXKN | 23-12-2019 00:00:00 | 23-12-2019 08:00:00 | 8h0m0s | 32h0m0s    | 36,625  | 35,169,407 | 293,112 | 2          | false       | prometheus=monitoring/prometheus-1,prometheus_replica=prometheus-prometheus-1-0 | 0s         | compactor |

Query

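Query is one of the core components of Thanos: it collects data from StoreAPIs and serves it to clients through the Prometheus HTTP v1 API.
Its main purpose is to provide a global view. This is similar to what some exporters do, except that the aggregation happens at the query layer rather than at the storage layer.
Another purpose is run-time deduplication of HA groups: several Prometheus replicas can run as a high-availability setup, and Query merges and deduplicates their data (for example, if one Prometheus goes down, Query can fill the gap with data from its replica).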
Supported data sources:
- Prometheus (Sidecar)
- Object Storage (Store Gateway)
- Global alerting/recording rules evaluations (Ruler)
- Metrics received from Prometheus remote write streams (Thanos Receiver)
- Another Querier (you can stack Queriers on top of each other)
- Non-Prometheus systems. e.g OpenTSDB
- Author's TODO: an adapter for cloud monitoring could be implemented here as another Store API implementation
Besides being compatible with the Prometheus 2.x API, it adds support for:
- partial response behaviour: warnings are returned when one of the StoreAPIs errors or times out
- several additional parameters listed below
- custom response fields.
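Implementation:
- A gRPC (proxy) Store API that queries the discovered stores, filtered by storeMatches (time range + labelSetsMatch)
- The HTTP query API creates a query engine whose requests end up as gRPC requests against the local gRPC (proxy) Store API
- DNS provider: discovers stores via DNS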

Store

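Implements the Store API on top of (object) storage and periodically syncs metadata and index data to local disk. The implementation lives in bucket.go, which also implements the Store gRPC service defined in thanos-io/thanos/pkg/store/storepb/rpc.proto.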

// BucketStore implements the store API backed by a bucket. It loads all index
// files to local disk.
type BucketStore struct {
	// ... (other fields omitted)
	indexCache storecache.IndexCache

	// Sets of blocks that have the same labels. They are indexed by a hash over their label set.
	blocks    map[ulid.ULID]*bucketBlock
	blockSets map[uint64]*bucketBlockSet

	// samplesLimiter limits the number of samples per each Series() call.
	samplesLimiter *Limiter
	partitioner    partitioner
}

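// Series implements the storepb.StoreServer interface.
// Non-core code is omitted. Postings and series are cached locally: the local
// cache is checked first, misses are fetched from object storage, and the
// fetched data is then stored in the local cache.
func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_SeriesServer) (err error) {
	// ... omitted: converting req.Matchers into matchers and declaring ctx,
	// the goroutine group g, res []storepb.SeriesSet, mtx sync.Mutex and stats.

	// Concurrently get data from all blocks.
	for _, bs := range s.blockSets {
		blockMatchers, ok := bs.labelMatchers(matchers...)
		if !ok {
			// This block set's external labels contradict the request matchers.
			continue
		}
		blocks := bs.getFor(req.MinTime, req.MaxTime, req.MaxResolutionWindow)
		for _, b := range blocks {
			b := b // capture the loop variable for the goroutine below

			// We must keep the readers open until all their data has been sent.
			indexr := b.indexReader(ctx)
			chunkr := b.chunkReader(ctx)
			// ... omitted: closing indexr and chunkr when Series returns.

			g.Go(func() error {
				part, pstats, err := blockSeries(ctx, b.meta.ULID, b.meta.Thanos.Labels,
					indexr, chunkr, blockMatchers, req, s.samplesLimiter)
				if err != nil {
					return errors.Wrapf(err, "fetch series for block %s", b.meta.ULID)
				}

				// res and stats are shared by all goroutines, so guard them.
				mtx.Lock()
				res = append(res, part)
				stats = stats.merge(pstats)
				mtx.Unlock()
				return nil
			})
		}
	}
	// Wait for every block fetch; a single failure fails the whole request.
	if err := g.Wait(); err != nil {
		return status.Error(codes.Aborted, err.Error())
	}

	// Merge the sub-results from each selected block.
	{
		// Merge series set into a union of all block sets. This exposes all blocks as a single seriesSet.
		// Chunks of returned series might be out of order w.r.t to their time range.
		// This must be accounted for later by clients.
		set := storepb.MergeSeriesSets(res...)
		for set.Next() {
			var series storepb.Series
			series.Labels, series.Chunks = set.At()
			stats.mergedSeriesCount++
			stats.mergedChunksCount += len(series.Chunks)
			if err := srv.Send(storepb.NewSeriesResponse(&series)); err != nil {
				return status.Error(codes.Unknown, errors.Wrap(err, "send series response").Error())
			}
		}
	}
	return nil
}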

Time based partitioning

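By default, the store serves metrics for the whole time range available in object storage, but --min-time and --max-time can restrict the time range it serves.
New data is not necessarily queryable right away: the store re-syncs its view of the blocks only every 3 minutes.
It is recommended that the time ranges of Thanos Store gateways overlap somewhat with that of the Thanos Sidecar, to guard against failures.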

Rule

You can think of Rule as a simplified Prometheus that does not require a sidecar, does not scrape, and does not do PromQL evaluation itself (it has no Query API); it evaluates its rules by sending queries to Thanos Query.

Sidecar

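Purpose

Implements the Store API to query Prometheus data directly.
Optionally uploads TSDB data every 2 hours, so Prometheus retention can be set very short.
Its Store API implementation is the opposite of Query's: here gRPC requests are converted into Prometheus HTTP requests.
Shipper is implemented in thanos-io/thanos/pkg/shipper/shipper.go; it uploads the blocks (block data, meta, ...) that are formed every two hours.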

Receiver

Implements Prometheus' remote write endpoint.


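Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.

In case of infringement, contact cloudcommunity@tencent.com for removal.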
