Prometheus is an open-source, end-to-end monitoring and alerting solution.
Traditional monitoring and alerting models often suffer from the following problems:
Prometheus is built on a new model of centralized rule evaluation, unified analysis, and alerting, which addresses the pain points of the traditional model and fundamentally rethinks how monitoring systems measure and alert.
For a service, we generally care about the following categories of metrics:
To help users understand and distinguish between different kinds of monitoring metrics, Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary.
A Counter works just like a counter: it only ever increases (unless the system resets). By convention, Counter metric names use the _total suffix, e.g. http_requests_total for the total number of requests.
Using PromQL, the rate() function derives the per-second HTTP request rate from this counter:
rate(http_requests_total[1m])
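To illustrate what rate() computes, here is a stdlib-only Go sketch (ratePerSecond is an illustrative name, not a client_golang API): Prometheus scrapes the counter periodically and divides the increase between scrapes by the elapsed time:

```go
package main

import "fmt"

// ratePerSecond mimics rate(): the increase between two scrapes of a
// monotonic counter divided by the elapsed seconds.
func ratePerSecond(v0, v1, seconds float64) float64 {
	return (v1 - v0) / seconds
}

func main() {
	// http_requests_total was 1200 at one scrape and 1500 sixty seconds later.
	fmt.Println(ratePerSecond(1200, 1500, 60)) // 5 requests/second
}
```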
Unlike a Counter, a Gauge reflects the current state of the system, so its sample values can go both up and down.
Fluctuating metrics such as utilization and saturation are typically modeled this way, for example CPU usage or memory consumption.
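A minimal stdlib-only sketch of gauge semantics (the Gauge type here is illustrative, not the client_golang type): the value tracks current state, such as requests currently in flight, and moves both ways:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Gauge sketches gauge semantics: the value reflects current state
// and can both increase and decrease.
type Gauge struct{ v int64 }

func (g *Gauge) Inc()         { atomic.AddInt64(&g.v, 1) }
func (g *Gauge) Dec()         { atomic.AddInt64(&g.v, -1) }
func (g *Gauge) Value() int64 { return atomic.LoadInt64(&g.v) }

func main() {
	var inFlight Gauge
	inFlight.Inc() // a request started
	inFlight.Inc() // another request started
	inFlight.Dec() // the first request finished
	fmt.Println(inFlight.Value()) // 1
}
```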
For fast-moving metrics, we often do not want to wait until a threshold is actually crossed before being notified; by the time the alert arrives, the service may already have been down for a while. Even a conservative threshold can be outrun by a rapidly growing metric. For this, PromQL provides predict_linear(), which fits a simple linear regression to the samples in a range vector and extrapolates the trend, allowing alerts to fire ahead of time (the related deriv() function returns the per-second derivative from the same regression). For example, to predict whether a node's available memory will be exhausted within the next 60 minutes:
predict_linear(node_memory_available{}[1h], 3600)<=0
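To make the extrapolation concrete, here is a stdlib-only Go sketch of the least-squares fit that predict_linear() performs (predictLinear is an illustrative name; PromQL extrapolates from the evaluation timestamp, while this sketch extrapolates from the last sample):

```go
package main

import "fmt"

// predictLinear fits least-squares y = a + b*t to the samples and
// extrapolates the value `ahead` seconds after the last sample.
func predictLinear(ts, vs []float64, ahead float64) float64 {
	n := float64(len(ts))
	var sumT, sumV, sumTT, sumTV float64
	for i := range ts {
		sumT += ts[i]
		sumV += vs[i]
		sumTT += ts[i] * ts[i]
		sumTV += ts[i] * vs[i]
	}
	b := (n*sumTV - sumT*sumV) / (n*sumTT - sumT*sumT) // slope (what deriv() returns)
	a := (sumV - b*sumT) / n                           // intercept
	last := ts[len(ts)-1]
	return a + b*(last+ahead)
}

func main() {
	// Available memory falling by 1 MiB/s: 300, 290, 280 MiB at t = 0, 10, 20 s.
	ts := []float64{0, 10, 20}
	vs := []float64{300, 290, 280}
	fmt.Println(predictLinear(ts, vs, 100)) // 180: predicted value 100 s ahead
}
```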
A Histogram is mainly used to observe and analyze the distribution of samples (monitoring data is often assumed to follow a roughly normal distribution). It is commonly used for request latency: counting requests in each latency band and accumulating total request time. It consists of the following three parts:
Example
Suppose a histogram metric named microservice; the three series it exposes are:
microservice_bucket{le="<upper bound>"}: the number of observations less than or equal to the upper bound
microservice_sum: the sum of all observed values
microservice_count: the total number of observations
With buckets = {5, 10, 30} and samples {3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5},
the histogram reports the following (bucket counts are cumulative: each le bucket includes everything below it):
microservice_bucket{le="5"} = 2
microservice_bucket{le="10"} = 4
microservice_bucket{le="30"} = 6
microservice_bucket{le="+Inf"} = 8
microservice_count = 8
microservice_sum =168.7
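The cumulative bucket counts above can be reproduced with a short stdlib-only Go sketch (cumulativeBuckets is an illustrative helper, not part of any Prometheus library):

```go
package main

import "fmt"

// cumulativeBuckets counts, for each upper bound, how many samples are
// <= that bound — the same cumulative (le) semantics a Prometheus
// histogram exposes. The +Inf bucket is implicit: it equals len(samples).
func cumulativeBuckets(samples, bounds []float64) []int {
	counts := make([]int, len(bounds))
	for _, s := range samples {
		for i, le := range bounds {
			if s <= le {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	samples := []float64{3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5}
	fmt.Println(cumulativeBuckets(samples, []float64{5, 10, 30})) // [2 4 6]
}
```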
A histogram does not store the raw sample values; each bucket keeps only a counter (a float64) of the samples that fell at or below its bound. Since it stores per-bucket counts rather than samples, the client-side overhead is small, making histograms well suited to high-concurrency collection.
A Summary also collects observations, but reports quantiles directly: each quantile series is an estimate of the φ-quantile of the observed value distribution.
Suppose a summary metric named microservice; the three series it exposes are:
microservice{quantile="<φ>"}: the φ-quantile (0 ≤ φ ≤ 1) of the observed values
microservice_sum: the sum of all observed values
microservice_count: the number of observed events
With quantile objectives {0.5: 0.05, 0.85: 0.005, 0.95: 0.005, 0.99: 0.001} and the same samples {3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5},
the summary reports the following (quantile values are likewise inclusive from below):
microservice{quantile="0.5"} = 32.11
microservice{quantile="0.85"} = 39.83
microservice{quantile="0.95"} = 40.93
microservice{quantile="0.99"} = 42.92
microservice_count = 8
microservice_sum =168.7
A summary stores the quantile estimates directly instead of deriving them from bucketed counts, so it exists precisely to give accurate percentiles. The cost is that every incoming observation updates the quantile estimates, which requires locking to keep concurrent updates consistent, so summaries add measurable overhead to highly concurrent programs. The quantiles are also fixed at instrumentation time, so only the preconfigured quantiles are available, which limits flexibility; and summary quantiles cannot be aggregated across instances, since quantiles do not combine directly.
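For comparison, a stdlib-only Go sketch of the exact quantile a summary tries to approximate (exactQuantile is illustrative; real client libraries use streaming estimators with per-quantile error bounds rather than sorting all samples, which is why the reported values are estimates):

```go
package main

import (
	"fmt"
	"sort"
)

// exactQuantile returns the φ-quantile of the samples by sorting — the
// "ground truth" a Summary approximates with a streaming sketch.
func exactQuantile(samples []float64, phi float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	// nearest-rank style index; real estimators interpolate differently.
	i := int(phi * float64(len(s)-1))
	return s[i]
}

func main() {
	samples := []float64{3.5, 4.1, 8.0, 9.2, 12.1, 15.3, 51.0, 65.5}
	fmt.Println(exactQuantile(samples, 0.5)) // 9.2
	var sum float64
	for _, v := range samples {
		sum += v
	}
	fmt.Printf("%.1f %d\n", sum, len(samples)) // 168.7 8 — matches _sum and _count
}
```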
package service

import (
    "context"
    "errors"

    kitprometheus "github.com/go-kit/kit/metrics/prometheus"
    stdprometheus "github.com/prometheus/client_golang/prometheus"

    "go-kit-microservice/internal/pkg/comn"
    "go-kit-microservice/pb"
)

// Service interface
type Service interface {
    Multiply(ctx context.Context, in *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error)
    Add(ctx context.Context, in *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error)
    Sub(ctx context.Context, in *pb.SubRequest) (errCode int32, resp *pb.SubResponse, err error)
    Div(ctx context.Context, in *pb.DivRequest) (errCode int32, resp *pb.DivResponse, err error)
}

// baseService is an implementation of the Service interface.
type baseService struct {
}

// NewService wraps the base service with the Prometheus instrumentation middleware.
func NewService() Service {
    fieldKeys := []string{"method", "error_code"}
    requestCount := kitprometheus.NewCounterFrom(stdprometheus.CounterOpts{
        Namespace: "microservice",
        Subsystem: "calculate",
        Name:      "request_count",
        Help:      "Number of requests received.",
    }, fieldKeys)
    requestLatency := kitprometheus.NewHistogramFrom(stdprometheus.HistogramOpts{
        Namespace: "microservice",
        Subsystem: "calculate",
        Name:      "request_latency_microseconds",
        Help:      "Total duration of requests in microseconds.",
        Buckets:   []float64{5, 10, 20, 30, 50, 80, 100, 120, 150, 200, 300, 500},
    }, fieldKeys)
    return &instrumentMiddleware{
        requestCount:   requestCount,
        requestLatency: requestLatency,
        next:           baseService{},
    }
}

func (s baseService) Multiply(ctx context.Context, in *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error) {
    return comn.SUCC.Code, &pb.MultiplyResponse{
        Res: in.A * in.B,
    }, nil
}

func (s baseService) Add(ctx context.Context, in *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error) {
    return comn.SUCC.Code, &pb.AddResponse{
        Res: in.A + in.B,
    }, nil
}

func (s baseService) Sub(ctx context.Context, in *pb.SubRequest) (errCode int32, response *pb.SubResponse, err error) {
    return comn.SUCC.Code, &pb.SubResponse{
        Res: in.A - in.B,
    }, nil
}

func (s baseService) Div(ctx context.Context, in *pb.DivRequest) (errCode int32, response *pb.DivResponse, err error) {
    // guard against division by zero
    if in.B == int64(0) {
        return comn.ErrRequestParamIllegal.Code, &pb.DivResponse{
            Code: comn.ErrRequestParamIllegal.Code,
            Msg:  comn.ErrRequestParamIllegal.Msg,
            Res:  0.000,
        }, errors.New(comn.ErrRequestParamIllegal.Msg)
    }
    return comn.SUCC.Code, &pb.DivResponse{
        Code: comn.SUCC.Code,
        Msg:  comn.SUCC.Msg,
        Res:  float32(in.A) / float32(in.B),
    }, nil
}
package service

import (
    "context"
    "fmt"
    "time"

    "github.com/go-kit/kit/metrics"

    "go-kit-microservice/pb"
)

// instrumentMiddleware records a request counter and a latency histogram
// around every call before delegating to the wrapped Service.
type instrumentMiddleware struct {
    requestCount   metrics.Counter
    requestLatency metrics.Histogram
    next           Service
}

func (mw instrumentMiddleware) Multiply(ctx context.Context, req *pb.MultiplyRequest) (errCode int32, resp *pb.MultiplyResponse, err error) {
    defer func(begin time.Time) {
        lvs := []string{
            "method", "Multiply",
            "error_code", fmt.Sprintf("%d", errCode),
        }
        mw.requestCount.With(lvs...).Add(1)
        mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
    }(time.Now())
    return mw.next.Multiply(ctx, req)
}

func (mw instrumentMiddleware) Add(ctx context.Context, req *pb.AddRequest) (errCode int32, resp *pb.AddResponse, err error) {
    defer func(begin time.Time) {
        lvs := []string{
            "method", "Add",
            "error_code", fmt.Sprintf("%d", errCode),
        }
        mw.requestCount.With(lvs...).Add(1)
        mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
    }(time.Now())
    return mw.next.Add(ctx, req)
}

func (mw instrumentMiddleware) Sub(ctx context.Context, req *pb.SubRequest) (errCode int32, response *pb.SubResponse, err error) {
    defer func(begin time.Time) {
        lvs := []string{
            "method", "Sub",
            "error_code", fmt.Sprintf("%d", errCode),
        }
        mw.requestCount.With(lvs...).Add(1)
        mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
    }(time.Now())
    return mw.next.Sub(ctx, req)
}

func (mw instrumentMiddleware) Div(ctx context.Context, req *pb.DivRequest) (errCode int32, response *pb.DivResponse, err error) {
    defer func(begin time.Time) {
        lvs := []string{
            "method", "Div",
            "error_code", fmt.Sprintf("%d", errCode),
        }
        mw.requestCount.With(lvs...).Add(1)
        mw.requestLatency.With(lvs...).Observe(float64(time.Since(begin).Microseconds()))
    }(time.Now())
    return mw.next.Div(ctx, req)
}
package http

import (
    "context"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"

    "github.com/go-kit/kit/endpoint"
    httptransport "github.com/go-kit/kit/transport/http"
    uuid "github.com/satori/go.uuid"

    endpoints "go-kit-microservice/internal/endpoint"
    log "go-kit-microservice/internal/pkg/log"
    "go-kit-microservice/internal/pkg/utils"
    "go-kit-microservice/pb"
)

func MewHttpHandler(endpoints endpoints.EndPoints) http.Handler {
    options := []httptransport.ServerOption{
        // unified error handling
        httptransport.ServerErrorEncoder(errorEncoder),
        // before each request, attach a request id and the auth token to the context
        httptransport.ServerBefore(func(ctx context.Context, request *http.Request) context.Context {
            reqId := uuid.NewV5(uuid.NewV4(), "req_id").String()
            ctx = context.WithValue(ctx, utils.BaseRequestId, reqId)
            ctx = context.WithValue(ctx, utils.JwtTokenKey, request.Header.Get("Authorization"))
            return ctx
        }),
    }
    m := http.NewServeMux()
    m.Handle("/multiply", httptransport.NewServer(
        endpoints.MultiplyEndPoint,
        decodeMultiplyRequest,
        encodeMultiplyResponse,
        options...,
    ))
    m.Handle("/add", httptransport.NewServer(
        endpoints.AddEndPoint,
        decodeAddRequest,
        encodeAddResponse,
        options...,
    ))
    m.Handle("/sub", httptransport.NewServer(
        endpoints.SubEndPoint,
        decodeSubRequest,
        encodeSubResponse,
        options...,
    ))
    m.Handle("/div", httptransport.NewServer(
        endpoints.DivEndPoint,
        decodeDivRequest,
        encodeDivResponse,
        options...,
    ))
    return m
}

// decode the multiply request body
func decodeMultiplyRequest(ctx context.Context, r *http.Request) (interface{}, error) {
    bs, err := ioutil.ReadAll(r.Body)
    if err != nil {
        return nil, err
    }
    fmt.Print(string(bs))
    req := &pb.MultiplyRequest{}
    err = json.Unmarshal(bs, req)
    if err != nil {
        return nil, err
    }
    return req, nil
}

// encode the multiply response for the client
func encodeMultiplyResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
    if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
        errorEncoder(ctx, f.Failed(), w)
        return nil
    }
    resp := response.(*pb.MultiplyResponse)
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    bs, err := resp.MarshalJSON()
    if err != nil {
        return err
    }
    w.Write(bs)
    return nil
}

// decode the add request body
func decodeAddRequest(ctx context.Context, r *http.Request) (interface{}, error) {
    bs, err := ioutil.ReadAll(r.Body)
    if err != nil {
        return nil, err
    }
    fmt.Print(string(bs))
    req := &pb.AddRequest{}
    err = json.Unmarshal(bs, req)
    if err != nil {
        return nil, err
    }
    return req, nil
}

// encode the add response for the client
func encodeAddResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
    if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
        errorEncoder(ctx, f.Failed(), w)
        return nil
    }
    resp := response.(*pb.AddResponse)
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    bs, err := resp.MarshalJSON()
    if err != nil {
        return err
    }
    w.Write(bs)
    return nil
}

// decode the sub request body
func decodeSubRequest(ctx context.Context, r *http.Request) (interface{}, error) {
    bs, err := ioutil.ReadAll(r.Body)
    if err != nil {
        return nil, err
    }
    fmt.Print(string(bs))
    req := &pb.SubRequest{}
    err = json.Unmarshal(bs, req)
    if err != nil {
        return nil, err
    }
    return req, nil
}

// encode the sub response for the client
func encodeSubResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
    if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
        errorEncoder(ctx, f.Failed(), w)
        return nil
    }
    resp := response.(*pb.SubResponse)
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    bs, err := resp.MarshalJSON()
    if err != nil {
        return err
    }
    w.Write(bs)
    return nil
}

// decode the div request body
func decodeDivRequest(ctx context.Context, r *http.Request) (interface{}, error) {
    bs, err := ioutil.ReadAll(r.Body)
    if err != nil {
        return nil, err
    }
    fmt.Print(string(bs))
    req := &pb.DivRequest{}
    err = json.Unmarshal(bs, req)
    if err != nil {
        return nil, err
    }
    return req, nil
}

// encode the div response for the client
func encodeDivResponse(ctx context.Context, w http.ResponseWriter, response interface{}) error {
    if f, ok := response.(endpoint.Failer); ok && f.Failed() != nil {
        errorEncoder(ctx, f.Failed(), w)
        return nil
    }
    resp := response.(*pb.DivResponse)
    w.Header().Set("Content-Type", "application/json; charset=utf-8")
    bs, err := resp.MarshalJSON()
    if err != nil {
        return err
    }
    w.Write(bs)
    return nil
}

// unified error encoder: log the error and return it as a JSON body
func errorEncoder(ctx context.Context, err error, w http.ResponseWriter) {
    w.WriteHeader(http.StatusOK)
    log.GetLogger().Error(err.Error())
    e := json.NewEncoder(w).Encode(errorWrapper{Error: err.Error()})
    if e != nil {
        log.GetLogger().Error("json encode failed: " + e.Error())
    }
}

type errorWrapper struct {
    Error string `json:"error"`
}
package endpoint

import (
    "context"

    "github.com/go-kit/kit/endpoint"

    "go-kit-microservice/internal/service"
    "go-kit-microservice/pb"
)

type EndPoints struct {
    MultiplyEndPoint endpoint.Endpoint
    AddEndPoint      endpoint.Endpoint
    SubEndPoint      endpoint.Endpoint
    DivEndPoint      endpoint.Endpoint
}

func NewEndpoints(svc service.Service) EndPoints {
    multiplyEndpoint := makeMultiplyEndPoint(svc)
    addEndpoint := makeAddEndPoint(svc)
    subEndpoint := makeSubEndPoint(svc)
    divEndpoint := makeDivEndPoint(svc)
    return EndPoints{
        MultiplyEndPoint: multiplyEndpoint,
        AddEndPoint:      addEndpoint,
        SubEndPoint:      subEndpoint,
        DivEndPoint:      divEndpoint,
    }
}

func makeMultiplyEndPoint(s service.Service) endpoint.Endpoint {
    return func(ctx context.Context, request interface{}) (response interface{}, err error) {
        req := request.(*pb.MultiplyRequest)
        _, resp, _ := s.Multiply(ctx, req)
        return resp, nil
    }
}

func makeAddEndPoint(s service.Service) endpoint.Endpoint {
    return func(ctx context.Context, request interface{}) (response interface{}, err error) {
        req := request.(*pb.AddRequest)
        _, resp, _ := s.Add(ctx, req)
        return resp, nil
    }
}

func makeSubEndPoint(s service.Service) endpoint.Endpoint {
    return func(ctx context.Context, request interface{}) (response interface{}, err error) {
        req := request.(*pb.SubRequest)
        _, resp, _ := s.Sub(ctx, req)
        return resp, nil
    }
}

func makeDivEndPoint(s service.Service) endpoint.Endpoint {
    return func(ctx context.Context, request interface{}) (response interface{}, err error) {
        req := request.(*pb.DivRequest)
        _, resp, _ := s.Div(ctx, req)
        return resp, nil
    }
}
package main

import (
    "fmt"
    "net"
    "net/http"

    "github.com/gorilla/mux"
    "github.com/oklog/oklog/pkg/group"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "github.com/spf13/viper"

    "go-kit-microservice/internal/endpoint"
    "go-kit-microservice/internal/pkg/cfg"
    "go-kit-microservice/internal/pkg/log"
    "go-kit-microservice/internal/service"
    http2 "go-kit-microservice/internal/transport/http"
)

func main() {
    // init yaml config
    cfg.InitYmlConfig()
    // init logger
    log.InitLog()

    svc := service.NewService()
    endpoints := endpoint.NewEndpoints(svc)
    httpHandler := http2.MewHttpHandler(endpoints)
    promHandler := NewPromHandler()

    var g group.Group
    {
        // business server, listen :10086
        httpListener, err := net.Listen("tcp", ":"+viper.GetString("server.port"))
        if err != nil {
            fmt.Println("http server start failed:" + err.Error())
        }
        g.Add(func() error {
            return http.Serve(httpListener, httpHandler)
        }, func(e error) {
            err = httpListener.Close()
            if err != nil {
                fmt.Println("http server close failed", err.Error())
            }
        })
        // metrics server, listen :10081
        httpListener2, err2 := net.Listen("tcp", ":"+viper.GetString("promServer.port"))
        if err2 != nil {
            fmt.Println("prom http server start failed:" + err2.Error())
        }
        g.Add(func() error {
            return http.Serve(httpListener2, promHandler)
        }, func(e error) {
            err2 = httpListener2.Close()
            if err2 != nil {
                fmt.Println("prom http server close failed", err2.Error())
            }
        })
    }
    log.GetLogger().Info("http server start success, port " + viper.GetString("server.port"))
    log.GetLogger().Info("prom server start success, port " + viper.GetString("promServer.port"))
    _ = g.Run()
}

// NewPromHandler exposes the Prometheus scrape endpoint on /metrics.
func NewPromHandler() *mux.Router {
    r := mux.NewRouter()
    r.Handle("/metrics", promhttp.Handler())
    return r
}
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.8084e-05
go_gc_duration_seconds{quantile="0.25"} 1.8084e-05
go_gc_duration_seconds{quantile="0.5"} 1.8084e-05
go_gc_duration_seconds{quantile="0.75"} 1.8084e-05
go_gc_duration_seconds{quantile="1"} 1.8084e-05
go_gc_duration_seconds_sum 1.8084e-05
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 11
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.15.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 3.155672e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 4.133568e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.445117e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 3689
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 2.1668426398222153e-07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 4.638304e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 3.155672e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.2144512e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 4.34176e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2660
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.1087744e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6486272e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6178815423531399e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 6349
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 20832
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 32768
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 89488
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 98304
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.078704e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.667235e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 622592
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 622592
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.4990592e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 12
# HELP microservice_calculate_request_count Number of requests received.
# TYPE microservice_calculate_request_count counter
microservice_calculate_request_count{error_code="0",method="Add"} 2
microservice_calculate_request_count{error_code="0",method="Div"} 1
microservice_calculate_request_count{error_code="0",method="Multiply"} 1
microservice_calculate_request_count{error_code="0",method="Sub"} 1
microservice_calculate_request_count{error_code="4003",method="Div"} 4
# HELP microservice_calculate_request_latency_microseconds Total duration of requests in microseconds.
# TYPE microservice_calculate_request_latency_microseconds histogram
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="20"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="30"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="50"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="80"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="100"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="120"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="150"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="200"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="300"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="500"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Add",le="+Inf"} 2
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Add"} 22
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Add"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Div",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Div"} 11
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Div"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Multiply",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Multiply"} 12
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Multiply"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="10"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="20"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="30"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="50"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="80"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="100"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="120"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="150"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="200"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="300"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="500"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="0",method="Sub",le="+Inf"} 1
microservice_calculate_request_latency_microseconds_sum{error_code="0",method="Sub"} 12
microservice_calculate_request_latency_microseconds_count{error_code="0",method="Sub"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="5"} 0
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="10"} 1
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="20"} 2
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="30"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="50"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="80"} 3
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="100"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="120"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="150"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="200"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="300"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="500"} 4
microservice_calculate_request_latency_microseconds_bucket{error_code="4003",method="Div",le="+Inf"} 4
microservice_calculate_request_latency_microseconds_sum{error_code="4003",method="Div"} 138
microservice_calculate_request_latency_microseconds_count{error_code="4003",method="Div"} 4
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 3
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
-- Total request volume
sum(increase(microservice_calculate_request_count{}[5m]))
-- Request volume for a single endpoint
sum(increase(microservice_calculate_request_count{method="Add"}[5m]))
-- Error count
sum(increase(microservice_calculate_request_count{error_code!="0"}[5m]))
-- Error count for a single endpoint
sum(increase(microservice_calculate_request_count{error_code!="0", method="Add"}[5m]))
-- Count of a specific error type; error codes distinguish failure classes, e.g. redis errors, mysql errors, bad parameters, or even a failing downstream service
sum(increase(microservice_calculate_request_count{error_code="4001", method="Add"}[5m]))
-- Per-second request rate (QPS) for a single endpoint
sum(rate(microservice_calculate_request_count{method="Add"}[1m]))
-- Proportion of requests in 0~5µs (this must use the latency histogram's bucket series, not the request counter):
sum(microservice_calculate_request_latency_microseconds_bucket{le="5"})/sum(microservice_calculate_request_latency_microseconds_bucket{le="+Inf"})
-- Proportion of requests in 0~10µs:
sum(microservice_calculate_request_latency_microseconds_bucket{le="10"})/sum(microservice_calculate_request_latency_microseconds_bucket{le="+Inf"})
-- Requests with 5~10µs latency:
sum(microservice_calculate_request_latency_microseconds_bucket{le="10"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="5"})
-- Requests with 10~30µs latency:
sum(microservice_calculate_request_latency_microseconds_bucket{le="30"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="10"})
-- Requests with 30~50µs latency:
sum(microservice_calculate_request_latency_microseconds_bucket{le="50"}) - sum(microservice_calculate_request_latency_microseconds_bucket{le="30"})
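The same bucket series can also answer percentile questions directly: histogram_quantile() estimates a quantile from the cumulative bucket counts. For example, an estimated p95 latency per method over the last 5 minutes (a standard PromQL pattern; the label names match the metrics above):

```promql
histogram_quantile(0.95, sum(rate(microservice_calculate_request_latency_microseconds_bucket[5m])) by (le, method))
```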
delta(go_memstats_mcache_inuse_bytes{}[1h])
Choosing an appropriate panel type in Grafana and plugging in these PromQL queries produces results like the following:
Alerting can be handled directly by Prometheus's companion Alertmanager; for example, an alert for an endpoint's average latency exceeding 200ms is configured as shown in the figure below:
[Official documentation] https://prometheus.io/docs/prometheus/latest
[Source-code analysis] https://www.infoq.cn/article/Prometheus-theory-source-code
Original-content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
For infringement concerns, contact cloudcommunity@tencent.com to request removal.