
Integrating Prometheus into go-kit Microservices for Monitoring and Alerting


I. Introduction

Prometheus is a complete, open-source monitoring and alerting solution.

Traditional monitoring and alerting models often suffer from the following problems:

  • Monitoring detached from the business: the metrics a monitoring system collects are divorced from the business itself. Users may care about service availability and SLA levels, yet the monitoring system can only raise alerts based on system load.
  • High operational burden: the system needs dedicated staff to install, configure, and manage it.
  • Poor scalability: the monitoring system itself is hard to scale as the scope of what it monitors grows.
  • Difficult problem localization: to the business, the metrics are a simplistic black box; when something goes wrong they cannot point to the source of the problem.

Prometheus adopts a new model built on centralized rule evaluation with unified analysis and alerting, which neatly resolves these pain points; in that sense it completely overturns the traditional monitoring and alerting model.


II. Metrics

For a typical service, we mainly care about the following kinds of metrics:

  • Request rate
  • Request volume
  • Request latency
  • Error count
  • Utilization

To help users understand and distinguish these different metrics, Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary.

Counter

A Counter works just like a mechanical counter: it only ever increases (unless the system restarts and resets it). By convention, Counter metric names carry the _total suffix; for example, http_requests_total is the total number of requests received.

In PromQL, the rate() function turns the HTTP request counter into a request rate:

rate(http_requests_total[1m])
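
For reference, here is a minimal sketch of defining and incrementing such a counter with the official Go client, prometheus/client_golang (a standalone example; the go-kit wrappers used later in this article build on these same types):

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// httpRequestsTotal only ever goes up; the _total suffix follows the naming convention.
var httpRequestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "http_requests_total",
	Help: "Total number of HTTP requests received.",
}, []string{"method"})

func main() {
	httpRequestsTotal.WithLabelValues("GET").Inc() // count one request
}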

Gauge

Unlike a Counter, a Gauge reflects the current state of the system, so its sample values can go up as well as down.

Fluctuating utilization and saturation signals are usually modeled as Gauges, for example CPU usage or memory consumption.

For fast-moving signals, we often don't want to be notified only once the threshold has actually been crossed; by then the service may already have been down for a while, and even a conservative threshold can fire too late for a rapidly growing metric. PromQL therefore provides predict_linear(), which fits a simple linear regression over the samples (the same model deriv() uses) and extrapolates the trend, so alerts can fire ahead of time. For example, to predict whether a service will exhaust its memory within the next 60 minutes:

predict_linear(node_memory_available{}[1h], 3600)<=0
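
A minimal Gauge sketch with the Go client (the metric name here is hypothetical; a real exporter would publish actual memory statistics):

package main

import (
	"runtime"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// heapInUse can rise and fall, so it is modeled as a Gauge.
var heapInUse = promauto.NewGauge(prometheus.GaugeOpts{
	Name: "demo_heap_inuse_bytes",
	Help: "Heap bytes currently in use.",
})

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	heapInUse.Set(float64(ms.HeapInuse)) // Set overwrites the value; Gauges also support Inc/Dec/Add
}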

Histogram

A Histogram is mainly used to observe and analyze how sample values are distributed. It is the usual choice for request latency: counting how many requests fall into each latency band and how much time they took in total. A Histogram consists of three parts:

  1. Each observation is sorted into a bucket by its value as it arrives (individual observations are bucketed, not aggregated over a window)
  2. A running sum of all observed values (sum)
  3. A running count of observations (count)

Example

Suppose we have a histogram named microservice; the three parts above yield the following series:

    microservice_bucket{le="upper bound"}, the number of observations less than or equal to that upper bound
    microservice_sum
    microservice_count

With buckets = {5, 10, 30} and the observations {3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5},

the histogram ends up as follows (note that bucket counts are cumulative: each le bucket includes everything below it):

microservice_bucket{le="5"} = 2
microservice_bucket{le="10"} = 4
microservice_bucket{le="30"} = 6
microservice_bucket{le="+Inf"} = 8
microservice_count = 8
microservice_sum = 168.7

A Histogram does not store the raw observations; each bucket keeps only a float64 counter of how many samples fell within its bound. In other words, a Histogram stores per-interval sample counts, which keeps the client-side cost low and makes it well suited to high-throughput collection.
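
A minimal latency Histogram sketch with the Go client (the metric name and bucket bounds are illustrative); quantiles can then be estimated server-side, e.g. with histogram_quantile(0.95, rate(demo_request_duration_seconds_bucket[5m])):

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// requestDuration sorts each observation into a bucket; only per-bucket counters are stored.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "demo_request_duration_seconds",
	Help:    "Request duration in seconds.",
	Buckets: []float64{0.005, 0.01, 0.05, 0.1, 0.5, 1}, // upper bounds; +Inf is appended automatically
})

func main() {
	start := time.Now()
	time.Sleep(20 * time.Millisecond) // stand-in for real work
	requestDuration.Observe(time.Since(start).Seconds())
}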

Summary

A Summary also collects observations, but it computes and reports the configured quantiles directly on the client; each reported quantile value is the φ-quantile of the observations over a sliding time window.

Suppose we have a summary named microservice; it exposes the following series:

    the φ-quantiles (0 ≤ φ ≤ 1) of the observations, exposed as microservice{quantile="φ"}
    microservice_sum, the sum of all observed values
    microservice_count, the number of observations seen so far

With quantile = {0.5: 0.05, 0.85: 0.005, 0.95: 0.005, 0.99: 0.001} (each target quantile paired with its allowed error) and the observations {3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5},

the summary reports something like the following (the quantile values shown here are illustrative):

    microservice{quantile="0.5"} = 32.11
    microservice{quantile="0.85"} = 39.83
    microservice{quantile="0.95"} = 40.93
    microservice{quantile="0.99"} = 42.92
    microservice_count = 8
    microservice_sum = 168.7

A Summary stores the quantile values directly rather than deriving them from bucketed counts, so it exists precisely to report percentiles accurately. The trade-offs: every incoming observation triggers a recalculation of the quantile estimates, which requires internal locking to avoid inconsistencies under concurrent updates, so it has a measurable cost in high-concurrency programs. The quantiles are also fixed at instrumentation time, so you can only query the ones you configured, and Summaries do not support aggregation, since quantiles from different instances cannot simply be combined.
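
A minimal Summary sketch with the Go client (hypothetical metric name); Objectives maps each target quantile to its allowed error, mirroring the configuration described above:

package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// requestDuration estimates quantiles on the client over a sliding window.
var requestDuration = promauto.NewSummary(prometheus.SummaryOpts{
	Name:       "demo_request_duration_seconds",
	Help:       "Request duration in seconds.",
	Objectives: map[float64]float64{0.5: 0.05, 0.85: 0.005, 0.95: 0.005, 0.99: 0.001},
})

func main() {
	start := time.Now()
	time.Sleep(20 * time.Millisecond) // stand-in for real work
	requestDuration.Observe(time.Since(start).Seconds())
}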


III. Integrating with go-kit

Step 1. Define the service interface and implementation

  • service/service.go
package service

import (
	"context"
	"errors"
	kitprometheus "github.com/go-kit/kit/metrics/prometheus"
	stdprometheus "github.com/prometheus/client_golang/prometheus"
	"go-kit-microservice/internal/pkg/comn"
	"go-kit-microservice/pb"
)

// service interface
type Service interface {
	Multiply(ctx context.Context, in *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error)
	Add(ctx context.Context, in *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error)
	Sub(ctx context.Context, in *pb.SubRequest) (errCode int32, resp *pb.SubResponse, err error)
	Div(ctx context.Context, in *pb.DivRequest) (errCode int32, resp *pb.DivResponse, err error)
}

// Service struct, has an implementation of Service interface
type baseService struct {
}

func NewService() Service {

	fieldKeys := []string{"method", "error_code"}
	requestCount := kitprometheus.NewCounterFrom(stdprometheus.CounterOpts{
		Namespace: "microservice",
		Subsystem: "calculate",
		Name:      "request_count",
		Help:      "Number of requests received.",
	}, fieldKeys)

	requestLatency := kitprometheus.NewHistogramFrom(stdprometheus.HistogramOpts{
		Namespace: "microservice",
		Subsystem: "calculate",
		Name:      "request_latency_microseconds",
		Help:      "Total duration of requests in microseconds.",
		Buckets:   []float64{5, 10, 20, 30, 50, 80, 100, 120, 150, 200, 300, 500},
	}, fieldKeys)

	return &instrumentMiddleware{
		requestCount:   requestCount,
		requestLatency: requestLatency,
		next:           baseService{},
	}
}

func (s baseService) Multiply(ctx context.Context, in *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error) {
	return comn.SUCC.Code, &pb.MultiplyResponse{
		Res: in.A * in.B,
	}, nil
}

func (s baseService) Add(ctx context.Context, in *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error) {
	return comn.SUCC.Code, &pb.AddResponse{
		Res: in.A + in.B,
	}, nil
}

func (s baseService) Sub(ctx context.Context, in *pb.SubRequest) (errCode int32, response *pb.SubResponse, err error) {
	return comn.SUCC.Code, &pb.SubResponse{
		Res: in.A - in.B,
	}, nil
}

func (s baseService) Div(ctx context.Context, in *pb.DivRequest) (errCode int32, response *pb.DivResponse, err error) {
	if in.B == int64(0) {
		return comn.ErrRequestParamIllegal.Code, &pb.DivResponse{
			Code: comn.ErrRequestParamIllegal.Code,
			Msg:  comn.ErrRequestParamIllegal.Msg,
			Res:  0.000,
		}, errors.New(comn.ErrRequestParamIllegal.Msg)
	}

	return comn.SUCC.Code, &pb.DivResponse{
		Code: comn.SUCC.Code,
		Msg:  comn.SUCC.Msg,
		Res:  float32(in.A) / float32(in.B),
	}, nil
}

Step 2. Define business metrics and their dimensions

  • service/middleware.go
package service

import (
	"context"
	"fmt"
	"github.com/go-kit/kit/metrics"
	"go-kit-microservice/pb"
	"time"
)

type instrumentMiddleware struct {
	requestCount   metrics.Counter
	requestLatency metrics.Histogram
	next           Service
}

func (mw instrumentMiddleware) Multiply(ctx context.Context, req *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error) {
	defer func(begin time.Time) {
		lvs := []string{
			"method", "Multiply",
			"error_code", fmt.Sprintf("%d", errCode),
		}

		mw.requestCount.With(lvs...).Add(1)
		mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
	}(time.Now())
	return mw.next.Multiply(ctx, req)
}

func (mw instrumentMiddleware) Add(ctx context.Context, req *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error) {
	defer func(begin time.Time) {
		lvs := []string{
			"method", "Add",
			"error_code", fmt.Sprintf("%d", errCode),
		}

		mw.requestCount.With(lvs...).Add(1)
		mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
	}(time.Now())
	return mw.next.Add(ctx, req)
}

func (mw instrumentMiddleware) Sub(ctx context.Context, req *pb.SubRequest) (errCode int32, response *pb.SubResponse, err error) {
	defer func(begin time.Time) {
		lvs := []string{
			"method", "Sub",
			"error_code", fmt.Sprintf("%d", errCode),
		}

		mw.requestCount.With(lvs...).Add(1)
		mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
	}(time.Now())
	return mw.next.Sub(ctx, req)
}

func (mw instrumentMiddleware) Div(ctx context.Context, req *pb.DivRequest) (errCode int32, response *pb.DivResponse, err error) {
	defer func(begin time.Time) {
		lvs := []string{
			"method", "Div",
			"error_code", fmt.Sprintf("%d", errCode),
		}

		mw.requestCount.With(lvs...).Add(1)
		mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
	}(time.Now())
	return mw.next.Div(ctx, req)
}

Step 3. Request decoding, response encoding, and route registration

  • transport/http/http.go
package http

import (
	"context"
	"encoding/json"
	"fmt"
	"github.com/go-kit/kit/endpoint"
	httptransport "github.com/go-kit/kit/transport/http"
	uuid "github.com/satori/go.uuid"
	endpoints "go-kit-microservice/internal/endpoint"
	log "go-kit-microservice/internal/pkg/log"
	"go-kit-microservice/internal/pkg/utils"
	"go-kit-microservice/pb"
	"io/ioutil"
	"net/http"
)

func NewHttpHandler(endpoints endpoints.EndPoints) http.Handler {
	options := []httptransport.ServerOption{

		// Unified exception handling
		httptransport.ServerErrorEncoder(errorEncoder),

		// before request set request id
		httptransport.ServerBefore(func(ctx context.Context, request *http.Request) context.Context {
			reqId := uuid.NewV5(uuid.NewV4(), "req_id").String()
			ctx = context.WithValue(ctx, utils.BaseRequestId, reqId)
			ctx = context.WithValue(ctx, utils.JwtTokenKey, request.Header.Get("Authorization"))
			return ctx
		}),
	}

	m := http.NewServeMux()
	m.Handle("/multiply", httptransport.NewServer(
		endpoints.MultiplyEndPoint,
		decodeMultiplyRequest,
		encodeMultiplyResponse,
		options...,
	))

	m.Handle("/add", httptransport.NewServer(
		endpoints.AddEndPoint,
		decodeAddRequest,
		encodeAddResponse,
		options...,
	))

	m.Handle("/sub", httptransport.NewServer(
		endpoints.SubEndPoint,
		decodeSubRequest,
		encodeSubResponse,
		options...,
	))

	m.Handle("/div", httptransport.NewServer(
		endpoints.DivEndPoint,
		decodeDivRequest,
		encodeDivResponse,
		options...,
	))

	return m
}

// decodeMultiplyRequest decodes the JSON request body into a pb.MultiplyRequest
func decodeMultiplyRequest(ctx context.Context, r *http.Request) (interface{}, error) {

	bs, err := ioutil.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}

	fmt.Println(string(bs))

	req := &pb.MultiplyRequest{}
	err = json.Unmarshal(bs, req)
	if err != nil {
		return nil, err
	}

	return req, nil
}

// encode the response data to user
func encodeMultiplyResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
	if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
		errorEncoder(ctx, f.Failed(), w)
		return nil
	}

	resp := response.(*pb.MultiplyResponse)
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	bs, err := resp.MarshalJSON()
	if err != nil {
		return err
	}
	w.Write(bs)
	return nil
}

// decodeAddRequest decodes the JSON request body into a pb.AddRequest
func decodeAddRequest(ctx context.Context, r *http.Request) (interface{}, error) {

	bs, err := ioutil.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}

	fmt.Println(string(bs))

	req := &pb.AddRequest{}
	err = json.Unmarshal(bs, req)
	if err != nil {
		return nil, err
	}

	return req, nil
}

// encode the response data to user
func encodeAddResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
	if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
		errorEncoder(ctx, f.Failed(), w)
		return nil
	}

	resp := response.(*pb.AddResponse)
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	bs, err := resp.MarshalJSON()
	if err != nil {
		return err
	}
	w.Write(bs)
	return nil
}

// decodeSubRequest decodes the JSON request body into a pb.SubRequest
func decodeSubRequest(ctx context.Context, r *http.Request) (interface{}, error) {

	bs, err := ioutil.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}

	fmt.Println(string(bs))

	req := &pb.SubRequest{}
	err = json.Unmarshal(bs, req)
	if err != nil {
		return nil, err
	}

	return req, nil
}

// encode the response data to user
func encodeSubResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
	if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
		errorEncoder(ctx, f.Failed(), w)
		return nil
	}

	resp := response.(*pb.SubResponse)
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	bs, err := resp.MarshalJSON()
	if err != nil {
		return err
	}
	w.Write(bs)
	return nil
}

// decodeDivRequest decodes the JSON request body into a pb.DivRequest
func decodeDivRequest(ctx context.Context, r *http.Request) (interface{}, error) {

	bs, err := ioutil.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}

	fmt.Println(string(bs))

	req := &pb.DivRequest{}
	err = json.Unmarshal(bs, req)
	if err != nil {
		return nil, err
	}

	return req, nil
}

// encode the response data to user
func encodeDivResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
	if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
		errorEncoder(ctx, f.Failed(), w)
		return nil
	}

	resp := response.(*pb.DivResponse)
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	bs, err := resp.MarshalJSON()
	if err != nil {
		return err
	}
	w.Write(bs)
	return nil
}

// error EncodeHandler
func errorEncoder(ctx context.Context, err error, w http.ResponseWriter) {
	w.WriteHeader(http.StatusOK)
	log.GetLogger().Error(err.Error())
	e := json.NewEncoder(w).Encode(errorWrapper{Error: err.Error()})
	if e != nil {
		log.GetLogger().Error("json encode failed: " + e.Error())
	}
}

type errorWrapper struct {
	Error string `json:"error"`
}

Step 4. Expose the service as endpoints

  • endpoint/endpoint.go
package endpoint

import (
	"context"
	"github.com/go-kit/kit/endpoint"
	"go-kit-microservice/internal/service"
	"go-kit-microservice/pb"
)

type EndPoints struct {
	MultiplyEndPoint endpoint.Endpoint
	AddEndPoint      endpoint.Endpoint
	SubEndPoint      endpoint.Endpoint
	DivEndPoint      endpoint.Endpoint
}

func NewEndpoints(svc service.Service) EndPoints {
	multiplyEndpoint := makeMultiplyEndPoint(svc)
	addEndpoint := makeAddEndPoint(svc)
	subEndpoint := makeSubEndPoint(svc)
	divEndpoint := makeDivEndPoint(svc)

	return EndPoints{
		MultiplyEndPoint: multiplyEndpoint,
		AddEndPoint:      addEndpoint,
		SubEndPoint:      subEndpoint,
		DivEndPoint:      divEndpoint,
	}
}

func makeMultiplyEndPoint(s service.Service) endpoint.Endpoint {
	return func(ctx context.Context, request interface{}) (response interface{}, err error) {
		req := request.(*pb.MultiplyRequest)
		_, resp, _ := s.Multiply(ctx, req)
		return resp, nil
	}
}

func makeAddEndPoint(s service.Service) endpoint.Endpoint {
	return func(ctx context.Context, request interface{}) (response interface{}, err error) {
		req := request.(*pb.AddRequest)
		_, resp, _ := s.Add(ctx, req)
		return resp, nil
	}
}
func makeSubEndPoint(s service.Service) endpoint.Endpoint {
	return func(ctx context.Context, request interface{}) (response interface{}, err error) {
		req := request.(*pb.SubRequest)
		_, resp, _ := s.Sub(ctx, req)
		return resp, nil
	}
}
func makeDivEndPoint(s service.Service) endpoint.Endpoint {
	return func(ctx context.Context, request interface{}) (response interface{}, err error) {
		req := request.(*pb.DivRequest)
		_, resp, _ := s.Div(ctx, req)
		return resp, nil
	}
}

Step 5. Start the servers

  • cmd/main.go
package main

import (
	"fmt"
	"github.com/gorilla/mux"
	"github.com/oklog/oklog/pkg/group"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/spf13/viper"
	"go-kit-microservice/internal/endpoint"
	"go-kit-microservice/internal/pkg/cfg"
	"go-kit-microservice/internal/pkg/log"
	"go-kit-microservice/internal/service"
	http2 "go-kit-microservice/internal/transport/http"
	"net"
	"net/http"
)

func main() {

	// init yaml config
	cfg.InitYmlConfig()

	// init logger
	log.InitLog()

	svc := service.NewService()

	endpoints := endpoint.NewEndpoints(svc)

	httpHandler := http2.NewHttpHandler(endpoints)

	promHandler := NewPromHandler()

	var g group.Group
	{
		// listen:10086
		httpListener, err := net.Listen("tcp", ":"+viper.GetString("server.port"))
		if err != nil {
			fmt.Println("http server start failed:" + err.Error())
		}

		g.Add(func() error {
			return http.Serve(httpListener, httpHandler)
		}, func(e error) {
			err = httpListener.Close()
			if err != nil {
				fmt.Println("http server close failed", err.Error())
			}
		})

		// listen:10081
		httpListener2, err2 := net.Listen("tcp", ":"+viper.GetString("promServer.port"))
		if err2 != nil {
			fmt.Println("prom http server start failed:" + err2.Error())
		}
		g.Add(func() error {
			return http.Serve(httpListener2, promHandler)
		}, func(e error) {
			err2 = httpListener2.Close()
			if err2 != nil {
				fmt.Println("prom http server close failed", err2.Error())
			}
		})
	}

	log.GetLogger().Info("http server start success, port " + viper.GetString("server.port"))
	log.GetLogger().Info("prom server start success, port " + viper.GetString("promServer.port"))

	_ = g.Run()
}

func NewPromHandler() *mux.Router {
	r := mux.NewRouter()
	r.Handle("/metrics", promhttp.Handler())
	return r
}
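
To generate some data to look at, exercise the service with a few requests before scraping. Below is a small, hypothetical client sketch; it assumes the HTTP server from Step 5 is listening on :10086 and that the JSON field names match the pb request definitions (which this article does not reproduce). With the servers running, these calls populate the counter and histogram series shown next.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// call posts a JSON body to one of the calculator endpoints and prints the HTTP status.
func call(path, body string) {
	resp, err := http.Post("http://localhost:10086"+path, "application/json", bytes.NewBufferString(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(path, "->", resp.Status)
}

func main() {
	call("/add", `{"a": 1, "b": 2}`)
	call("/div", `{"a": 1, "b": 0}`) // division by zero surfaces as error_code 4003 in the metrics
}
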
  • Viewing the metrics (GET /metrics on the Prometheus port):
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.8084e-05
go_gc_duration_seconds{quantile="0.25"} 1.8084e-05
go_gc_duration_seconds{quantile="0.5"} 1.8084e-05
go_gc_duration_seconds{quantile="0.75"} 1.8084e-05
go_gc_duration_seconds{quantile="1"} 1.8084e-05
go_gc_duration_seconds_sum 1.8084e-05
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 11
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.15.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.155672e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 4.133568e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.445117e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 3689
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 2.1668426398222153e-07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.638304e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.155672e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.2144512e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.34176e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2660
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.1087744e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6486272e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6178815423531399e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 6349
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 20832
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 32768
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 89488
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 98304
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.078704e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.667235e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 622592
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 622592
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4990592e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 12
# HELP microservice_calculate_request_count Number of requests received.
# TYPE microservice_calculate_request_count counter
microservice_calculate_request_count{error_code="0",method="Add"} 2
microservice_calculate_request_count{error_code="0",method="Div"} 1
microservice_calculate_request_count{error_code="0",method="Multiply"} 1
microservice_calculate_request_count{error_code="0",method="Sub"} 1
microservice_calculate_request_count{error_code="4003",method="Div"} 4
# HELP microservice_calculate_request_latency_microseconds Total duration of requests in milliseconds.
# TYPE microservice_calculate_request_latency_microseconds histogram
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="20"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="30"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="50"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="80"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="100"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="120"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="150"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="200"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="300"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="500"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="+Inf"} 2
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Add"} 22
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Add"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Div"} 11
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Div"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Multiply"} 12
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Multiply"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Sub"} 12
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Sub"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="10"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="20"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="30"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="50"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="80"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="100"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="120"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="150"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="200"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="300"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="500"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="+Inf"} 4
microservice_calculate_request_latency_microseconds_sum{error_code="4003",method="Div"} 138
microservice_calculate_request_latency_microseconds_count{error_code="4003",method="Div"} 4
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 3
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

IV. Visualizing the metrics

1. Request volume (PromQL expressions)

# total number of requests
sum(increase(microservice_calculate_request_count{}[5m]))
# total requests for one endpoint
sum(increase(microservice_calculate_request_count{method="Add"}[5m]))

2. Error counts (PromQL expressions)

# number of errors
sum(increase(microservice_calculate_request_count{error_code!="0"}[5m]))

# errors for one endpoint
sum(increase(microservice_calculate_request_count{error_code!="0", method="Add"}[5m]))

# errors of a specific type, distinguished by error code (redis errors, mysql errors,
# invalid parameters, or even a failing downstream dependency)
sum(increase(microservice_calculate_request_count{error_code="4001", method="Add"}[5m]))

3. QPS (request rate)

sum(rate(microservice_calculate_request_count{method="Add"}[1m]))

4. Request latency

# fraction of requests completing within 5µs (note: the buckets here are in microseconds):
sum(microservice_calculate_request_latency_microseconds_bucket{le="5"}) / sum(microservice_calculate_request_latency_microseconds_bucket{le="+Inf"})

# fraction of requests completing within 10µs:
sum(microservice_calculate_request_latency_microseconds_bucket{le="10"}) / sum(microservice_calculate_request_latency_microseconds_bucket{le="+Inf"})

5. Latency distribution

# requests taking 5~10µs:
sum(microservice_calculate_request_latency_microseconds_bucket{le="10"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="5"})

# requests taking 10~30µs:
sum(microservice_calculate_request_latency_microseconds_bucket{le="30"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="10"})

# requests taking 30~50µs:
sum(microservice_calculate_request_latency_microseconds_bucket{le="50"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="30"})

6. Memory usage

delta(go_memstats_mcache_inuse_bytes{}[1h])

Plugging these PromQL expressions into suitable Grafana panels produces dashboards like the following:

[Figure: request overview]
[Figure: latency statistics]

V. Alerting

For alerting, you can use Alertmanager, which ships as part of the Prometheus ecosystem. As an example, the figure below shows an alert configured to fire when an endpoint's average latency exceeds 200ms:

[Figure: alert configuration]

VI. Further reading

[Official documentation] https://prometheus.io/docs/prometheus/latest

[Source-code analysis] https://www.infoq.cn/article/Prometheus-theory-source-code
