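The components above can be further grouped into three categories by role:
Data sources produce data. Currently there are only two kinds: the Prometheus sidecar and rule nodes.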
┌────────────┬─────────┐         ┌────────────┬─────────┐     ┌─────────┐
│ Prometheus │ Sidecar │   ...   │ Prometheus │ Sidecar │     │  Rule   │
└────────────┴────┬────┘         └────────────┴────┬────┘     └┬────────┘
                  │                                │           │
                Blocks                           Blocks      Blocks
                  │                                │           │
                  v                                v           v
              ┌──────────────────────────────────────────────────┐
              │                  Object Storage                  │
              └──────────────────────────────────────────────────┘
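Stores: here "store" means the store gateway, which translates users' metrics requests into object storage API calls. It implements various strategies to minimize the number of requests to the object storage, such as filtering relevant blocks by their metadata (e.g. time range and labels) and caching frequent index lookups.
A store implements the same gRPC Store API as the data sources, so to a client they look identical and need no special handling. Every Store API implementation also exposes meta information about the data it holds, which lets clients minimize the set of nodes they have to query.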
┌──────────────────────┐  ┌────────────┬─────────┐   ┌────────────┐
│ Google Cloud Storage │  │ Prometheus │ Sidecar │   │    Rule    │
└─────────────────┬────┘  └────────────┴────┬────┘   └─┬──────────┘
                  │                         │          │
          Block File Ranges                 │          │
                  │                     Store API      │
                  v                         │          │
                ┌──────────────┐            │          │
                │    Store     │            │      Store API
                └────────┬─────┘            │          │
                         │                  │          │
                     Store API              │          │
                         │                  │          │
                         v                  v          v
                       ┌──────────────────────────────────┐
                       │              Client              │
                       └──────────────────────────────────┘
The query layer is stateless: queriers automatically discover stores and query data from them. Based on the metadata of store and source nodes, they attempt to minimize the request fanout to fetch data for a particular query.
┌──────────────────┐  ┌────────────┬─────────┐   ┌────────────┐
│    Store Node    │  │ Prometheus │ Sidecar │   │    Rule    │
└─────────────┬────┘  └────────────┴────┬────┘   └─┬──────────┘
              │                         │          │
              │                         │          │
              │                         │          │
              v                         v          v
        ┌─────────────────────────────────────────────────────┐
        │                     Query layer                     │
        └─────────────────────────────────────────────────────┘
                ^                  ^                  ^
                │                  │                  │
       ┌────────┴────────┐  ┌──────┴─────┐       ┌────┴───┐
       │ Alert Component │  │ Dashboards │  ...  │ Web UI │
       └─────────────────┘  └────────────┘       └────────┘
The Compactor is an optional, standalone component that does not take part in the Thanos cluster.
Each component is provided as a thanos subcommand:
// /thanos-io/thanos/cmd/thanos/main.go
registerSidecar(cmds, app)
registerStore(cmds, app)
registerQuery(cmds, app)
registerRule(cmds, app)
registerCompact(cmds, app)
registerBucket(cmds, app, "bucket")
registerDownsample(cmds, app)
registerReceive(cmds, app)
registerChecks(cmds, app, "check")
// Third-party libraries used
// github.com/golang/snappy - snappy compression
// github.com/hashicorp/serf - gossip-based membership
// github.com/prometheus/prometheus - metrics, version, discovery, PromQL, etc.
// google.golang.org/grpc - gRPC
// github.com/go-kit/kit - log, sd, transport, ratelimit, etc.
// github.com/oklog/run - goroutine management
// gopkg.in/alecthomas/kingpin.v2 - command-line parsing
// go.uber.org/automaxprocs/maxprocs - small helper that sets GOMAXPROCS
// github.com/opentracing/opentracing-go - tracing
// gopkg.in/check.v1;onsi/ginkgo;onsi/gomega;smartystreets/goconvey - testing
// github.com/fsnotify/fsnotify
// github.com/fortytw2/leaktest - leak test
// github.com/fatih/structtag - parsing and manipulating struct tag fields
// github.com/hashicorp/golang-lru
// github.com/mwitkow/go-conntrack - Go middleware for net.Conn tracking (Prometheus/trace)
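The registerXxx(cmds, app) calls listed earlier follow a common pattern: each component registers itself under a subcommand name, and the binary dispatches on the first argument. The sketch below shows that pattern using only the standard library. It does not use kingpin, and `runFunc`, `register`, and `dispatch` are hypothetical names chosen for illustration, not the Thanos signatures.

```go
package main

import (
	"fmt"
	"os"
)

// runFunc is a hypothetical stand-in for a component's entry point.
type runFunc func(args []string) error

// register stores a component under its subcommand name.
func register(cmds map[string]runFunc, name string, run runFunc) {
	cmds[name] = run
}

// dispatch runs the component named by args[0], passing it the rest.
func dispatch(cmds map[string]runFunc, args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("no subcommand given")
	}
	run, ok := cmds[args[0]]
	if !ok {
		return fmt.Errorf("unknown subcommand %q", args[0])
	}
	return run(args[1:])
}

func main() {
	cmds := map[string]runFunc{}
	register(cmds, "sidecar", func(args []string) error {
		fmt.Println("running sidecar with", args)
		return nil
	})
	register(cmds, "query", func(args []string) error {
		fmt.Println("running query with", args)
		return nil
	})

	// A fixed argument list keeps the example deterministic; a real binary
	// would pass os.Args[1:] instead.
	if err := dispatch(cmds, []string{"query"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```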
The bucket subcommand runs ls/inspect/verify/repair against the metrics data in (object) storage:
➜ ./thanos bucket --objstore.config-file=`pwd`/cos.yaml inspect
level=info ts=2019-12-29T08:43:27.744236Z caller=main.go:149 msg="Tracing will be disabled"
level=info ts=2019-12-29T08:43:27.744752Z caller=factory.go:43 msg="loading bucket configuration"
| ULID | FROM | UNTIL | RANGE | UNTIL-DOWN | #SERIES | #SAMPLES | #CHUNKS | COMP-LEVEL | COMP-FAILED | LABELS | RESOLUTION | SOURCE |
|----------------------------|---------------------|---------------------|--------|------------|---------|-----------|---------|------------|-------------|---------------------------------------------------------------------------------|------------|---------|
| 01DWQ7AVJAST6B734CCBWJYXV0 | 22-12-2019 20:00:00 | 22-12-2019 22:00:00 | 2h0m0s | 38h0m0s | 36,534 | 1,858,180 | 36,534 | 1 | false | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s | sidecar |
| 01DWQ8TA11YH5AZ592JC83HGXP | 22-12-2019 22:00:00 | 23-12-2019 00:00:00 | 2h0m0s | 38h0m0s | 38,277 | 8,785,813 | 74,002 | 1 | false | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s | sidecar |
The check subcommand provides tools for validation of Prometheus rules.
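The compact subcommand is the compaction, downsampling, and retention tool. It is not safe to run concurrently, so it has to be deployed on its own, and it needs local disk to hold intermediate data.
Compaction is done with tsdb.NewLeveledCompactor; the downsampling code is in thanos-io/thanos/pkg/compact/downsample/downsample.go
# after compact / downsample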
➜ ./thanos bucket --objstore.config-file=`pwd`/cos.yaml inspect
level=info ts=2019-12-29T09:02:54.011068Z caller=main.go:149 msg="Tracing will be disabled"
level=info ts=2019-12-29T09:02:54.011681Z caller=factory.go:43 msg="loading bucket configuration"
| ULID | FROM | UNTIL | RANGE | UNTIL-DOWN | #SERIES | #SAMPLES | #CHUNKS | COMP-LEVEL | COMP-FAILED | LABELS | RESOLUTION | SOURCE |
|----------------------------|---------------------|---------------------|--------|------------|---------|------------|---------|------------|-------------|---------------------------------------------------------------------------------|------------|-----------|
| 01DX8E6DKHG0TEJQG85CPXEEEB | 22-12-2019 20:00:00 | 23-12-2019 00:00:00 | 4h0m0s | 36h0m0s | 38,562 | 10,643,993 | 110,536 | 2 | false | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s | compactor |
| 01DWQ97RAVF6ADC47073366595 | 22-12-2019 22:00:00 | 23-12-2019 00:00:00 | 2h0m0s | 38h0m0s | 36,616 | 8,229,587 | 36,664 | 1 | false | prometheus=monitoring/prometheus-1,prometheus_replica=prometheus-prometheus-1-0 | 0s | sidecar |
| 01DWQFP1915WVACNP2W187QWJ6 | 23-12-2019 00:00:00 | 23-12-2019 02:00:00 | 2h0m0s | 38h0m0s | 36,625 | 8,793,602 | 73,282 | 1 | false | prometheus=monitoring/prometheus-2,prometheus_replica=prometheus-prometheus-2-0 | 0s | sidecar |
| 01DX8E0MV53DH41VWNW4TJXXKN | 23-12-2019 00:00:00 | 23-12-2019 08:00:00 | 8h0m0s | 32h0m0s | 36,625 | 35,169,407 | 293,112 | 2 | false | prometheus=monitoring/prometheus-1,prometheus_replica=prometheus-prometheus-1-0 | 0s | compactor |
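To make the idea of downsampling more concrete, here is a minimal Go sketch that reduces raw samples to one aggregate per fixed time window. It is an illustration only: the real code in downsample.go operates on TSDB chunks and is considerably more involved. The `sample`, `aggregate`, and `downsample` names, the choice of count/sum/min/max, and the window width are assumptions made for this example.

```go
package main

import "fmt"

// sample is one raw data point.
type sample struct {
	T int64 // timestamp in milliseconds, assumed non-negative
	V float64
}

// aggregate summarizes every raw sample that falls into one window.
type aggregate struct {
	Start         int64 // window start, inclusive
	Count         int
	Sum, Min, Max float64
}

// downsample groups time-ordered samples into fixed windows of the given
// width and keeps count, sum, min, and max for each window.
func downsample(samples []sample, window int64) []aggregate {
	var out []aggregate
	for _, s := range samples {
		start := s.T - s.T%window
		if n := len(out); n == 0 || out[n-1].Start != start {
			// First sample of a new window.
			out = append(out, aggregate{Start: start, Count: 1, Sum: s.V, Min: s.V, Max: s.V})
			continue
		}
		a := &out[len(out)-1]
		a.Count++
		a.Sum += s.V
		if s.V < a.Min {
			a.Min = s.V
		}
		if s.V > a.Max {
			a.Max = s.V
		}
	}
	return out
}

func main() {
	raw := []sample{{0, 1}, {60000, 3}, {120000, 2}, {300000, 10}, {360000, 4}}
	// Five-minute windows, chosen here only as an example width.
	for _, a := range downsample(raw, 5*60*1000) {
		fmt.Printf("%+v\n", a)
	}
}
```

Keeping several aggregates per window rather than a single average lets later queries still answer min, max, sum, and count style questions over the reduced data.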
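The query component collects data from the StoreAPIs and implements the Prometheus HTTP v1 API to return results to clients.
Global view: its role is similar to that of some exporters, except that the aggregation happens at the query layer rather than at the storage layer.
Run-time deduplication of HA groups: Prometheus can be run as several replicas for high availability, and query aggregates and deduplicates their data (for example, if one Prometheus goes down, query can fill the gap).
The store component implements the Store API on top of (object) storage and periodically syncs metadata and index data to local disk. The implementation lives in bucket.go, which also implements the store gRPC service defined in thanos-io/thanos/pkg/store/storepb/rpc.proto.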
// BucketStore implements the store API backed by a bucket. It loads all index
// files to local disk.
type BucketStore struct {
	// ... (other fields omitted)
	indexCache storecache.IndexCache
	// Sets of blocks that have the same labels. They are indexed by a hash over their label set.
	blocks    map[ulid.ULID]*bucketBlock
	blockSets map[uint64]*bucketBlockSet
	// samplesLimiter limits the number of samples per each Series() call.
	samplesLimiter *Limiter
	partitioner    partitioner
}

// Series implements the storepb.StoreServer interface.
// Non-essential code is omitted here. There is a local cache (postings, series):
// look up locally first, then fetch remotely, then store the result in the local cache.
func (s *BucketStore) Series(req *storepb.SeriesRequest, srv storepb.Store_SeriesServer) (err error) {
	// Concurrently get data from all blocks.
	for _, bs := range s.blockSets {
		blockMatchers, ok := bs.labelMatchers(matchers...)
		blocks := bs.getFor(req.MinTime, req.MaxTime, req.MaxResolutionWindow)
		for _, b := range blocks {
			// We must keep the readers open until all their data has been sent.
			indexr := b.indexReader(ctx)
			chunkr := b.chunkReader(ctx)
			g.Go(func() error {
				part, pstats, err := blockSeries(ctx, b.meta.ULID, b.meta.Thanos.Labels,
					indexr, chunkr, blockMatchers, req, s.samplesLimiter)
				res = append(res, part)
				return nil
			})
		}
	}
	// Merge the sub-results from each selected block.
	{
		// Merge series set into a union of all block sets. This exposes all blocks as a single seriesSet.
		// Chunks of returned series might be out of order w.r.t. their time range.
		// This must be accounted for later by clients.
		set := storepb.MergeSeriesSets(res...)
		for set.Next() {
			var series storepb.Series
			series.Labels, series.Chunks = set.At()
			stats.mergedSeriesCount++
			stats.mergedChunksCount += len(series.Chunks)
			if err := srv.Send(storepb.NewSeriesResponse(&series)); err != nil {
				return status.Error(codes.Unknown, errors.Wrap(err, "send series response").Error())
			}
		}
	}
	return nil
}
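The merge step in Series combines the per-block results into a single sorted stream. The sketch below illustrates that idea on plain slices. It is not the implementation of storepb.MergeSeriesSets: `series`, `mergeTwo`, and `mergeAll` are hypothetical names, labels are reduced to a single sortable string, and chunks are opaque identifiers.

```go
package main

import "fmt"

// series is a simplified stand-in for a labelled series: a canonical,
// sortable label string plus opaque chunk identifiers.
type series struct {
	Labels string
	Chunks []string
}

// mergeTwo merges two lists that are each sorted by Labels. Series with
// identical labels are combined by concatenating their chunks, so the
// chunks of a merged series are not guaranteed to be in time order.
func mergeTwo(a, b []series) []series {
	out := make([]series, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Labels < b[j].Labels:
			out = append(out, a[i])
			i++
		case a[i].Labels > b[j].Labels:
			out = append(out, b[j])
			j++
		default:
			chunks := append(append([]string{}, a[i].Chunks...), b[j].Chunks...)
			out = append(out, series{Labels: a[i].Labels, Chunks: chunks})
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

// mergeAll folds any number of sorted sets into one sorted union.
func mergeAll(sets ...[]series) []series {
	var out []series
	for _, s := range sets {
		out = mergeTwo(out, s)
	}
	return out
}

func main() {
	blockA := []series{{"a", []string{"a-1"}}, {"c", []string{"c-1"}}}
	blockB := []series{{"b", []string{"b-1"}}, {"c", []string{"c-2"}}}
	for _, s := range mergeAll(blockA, blockB) {
		fmt.Println(s.Labels, s.Chunks)
	}
}
```

Because chunks from different blocks are simply concatenated, a consumer of the merged result has to sort or otherwise handle overlapping chunks itself, which matches the warning in the comment inside Series.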
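The store's --min-time and --max-time flags set the time range of the metrics it serves.
Rule: You can think of Rule as a simplified Prometheus that does not require a sidecar and does not scrape and do PromQL evaluation
Sidecar: its role, implemented in thanos-io/thanos/pkg/shipper/shipper.go, is to upload the block/meta.. data that is produced every two hours.
Receive: implements the Prometheus remote write interface.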
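Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com for removal.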