A quick question about *_bucket metrics.
My application produces metrics like the following:
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds histogram
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.005592405",} 273.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.006990506",} 797.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.008388607",} 2638.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.009786708",} 3543.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.011184809",} 3932.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.01258291",} 4154.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.013981011",} 4279.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/health",le="0.015379112",} 4380.0
and
# HELP resilience4j_circuitbreaker_calls_seconds Total number of successful calls
# TYPE resilience4j_circuitbreaker_calls_seconds histogram
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.001",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.001048576",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.001398101",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.001747626",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.002097151",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.002446676",} 0.0
resilience4j_circuitbreaker_calls_seconds_bucket{kind="successful",name="someName",le="0.002796201",} 0.0
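I believe they are useful, but unfortunately I don't know how to work with them.
I tried some queries, such as rate(http_server_requests_seconds{_bucket_=\"+Inf\", status=~\"2..\"}[5m])
, but they did not seem to produce anything of value.
What is the right way to use *_bucket metrics? For example, how do I build Grafana dashboards and visualizations that best fit these *_bucket metrics?
Thanks
Posted on 2022-01-17 03:55:55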
You can use this metric to find the 99th or 95th percentile of latency for a given endpoint, using the histogram_quantile function. For example, the 99th percentile:
histogram_quantile(
0.99,
sum(
rate(
http_server_requests_seconds_bucket{exception="None", uri = "/your-uri"}[5m])
) by (le)
)
The 95th percentile:
histogram_quantile(
0.95,
sum(
rate(http_server_requests_seconds_bucket{exception="None", uri = "/your-uri"}[5m])
) by (le)
)
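To make the percentile queries above less opaque, here is a minimal Python sketch of the kind of linear interpolation that histogram_quantile applies to classic cumulative buckets. It is a simplification, not Prometheus's actual implementation: it assumes sorted buckets ending in +Inf, at least one observation, and 0 < q <= 1. The names estimate_quantile and DEMO_BUCKETS are hypothetical, and the sample values are the raw cumulative counts from the /demo listing quoted further down in this answer (in a real query the inputs would be per-second rates, not raw counts).

```python
import math

# Cumulative (le, count) pairs copied from the /demo listing in this answer.
# Buckets elided there with "..." are simply left out here.
DEMO_BUCKETS = [
    (0.067108864, 0.0),
    (0.089478485, 0.0),
    (0.111848106, 92382.0),
    (0.134217727, 99050.0),
    (0.156587348, 99703.0),
    (0.984263336, 99987.0),
    (1.0, 99987.0),
    (math.inf, 100000.0),
]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes 0 < q <= 1, buckets sorted by upper bound,
    a final +Inf bucket, and a non-zero total count.
    """
    total = buckets[-1][1]  # the +Inf bucket holds the overall count
    rank = q * total        # how many observations lie at or below the quantile
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    if idx == 0:
        lower, prev_count = 0.0, 0.0  # assumption: durations start at zero
    else:
        lower, prev_count = buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


if __name__ == "__main__":
    print(estimate_quantile(0.95, DEMO_BUCKETS))     # roughly 0.12 seconds
    print(estimate_quantile(0.99999, DEMO_BUCKETS))  # capped at the 1.0 bound
```

The estimate can only be as precise as the bucket boundaries: any quantile that lands in the +Inf bucket is reported as the highest finite le value, which is why bucket layout matters for tail latencies.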
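More information: a good excerpt from the reference: https://idanlupinsky.com/blog/application-monitoring-with-micrometer-prometheus-grafana-and-cloudwatch/
A histogram is a collection of buckets (or counters), each of which maintains the number of observed events that took up to the duration specified by the le tag. Let's look at part of the histogram published by our demo application: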
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.067108864",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.089478485",} 0.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.111848106",} 92382.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.134217727",} 99050.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.156587348",} 99703.0
...
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="0.984263336",} 99987.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="1.0",} 99987.0
http_server_requests_seconds_bucket{exception="None",method="GET",outcome="SUCCESS",status="200",uri="/demo",le="+Inf",} 100000.0
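The second line in the listing above indicates that no requests were observed that took up to 89 ms (as specified by the le tag). That is expected, given that there is a 100 ms sleep when processing a request. Line 3 shows that 92,382 requests were observed with a duration of up to 111 ms. Note that the histogram is cumulative, so the full count of requests falls into the last bucket, le="+Inf", which has no upper bound.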
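Because the buckets are cumulative, subtracting each bucket's count from the next one gives the number of requests that fell into each individual interval. The following Python sketch shows this on the listing above; per_bucket_counts and CUMULATIVE are hypothetical names, the starting lower bound of 0 is an assumption (durations are non-negative), and the buckets elided with "..." are not filled in, so the interval from 0.156587348 to 0.984263336 covers all of them at once.

```python
import math

# Cumulative (le, count) pairs copied from the /demo listing above.
# The buckets elided with "..." are not reconstructed.
CUMULATIVE = [
    (0.067108864, 0.0),
    (0.089478485, 0.0),
    (0.111848106, 92382.0),
    (0.134217727, 99050.0),
    (0.156587348, 99703.0),
    (0.984263336, 99987.0),
    (1.0, 99987.0),
    (math.inf, 100000.0),
]


def per_bucket_counts(cumulative):
    """Convert cumulative counts into (lower, upper, count) per interval."""
    result = []
    lower, prev = 0.0, 0.0  # assumption: the first interval starts at zero
    for upper, count in cumulative:
        # Requests with lower < duration <= upper.
        result.append((lower, upper, count - prev))
        lower, prev = upper, count
    return result


if __name__ == "__main__":
    for lower, upper, count in per_bucket_counts(CUMULATIVE):
        print(f"({lower}, {upper}]: {count:.0f}")
```

Read this way, the listing says that 92,382 requests finished between roughly 89 ms and 111 ms, and 13 requests took longer than 1 second.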
https://stackoverflow.com/questions/68394394