前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >零基础学Flink:监控 on Prometheus & Grafana

零基础学Flink:监控 on Prometheus & Grafana

作者头像
麒思妙想
发布2020-07-10 10:15:46
1.9K0
发布2020-07-10 10:15:46
举报
文章被收录于专栏:麒思妙想麒思妙想

在上一篇文章中,对使用 Prometheus 监控Flink进行了阐述(传送门),这里就不再赘述了。

尽管 Prometheus 自我标榜是监控解决方案,

From metrics to insight Power your metrics and alerting with a leading open-source monitoring solution.

但是在我们日常使用中,Prometheus 更多担任的是数据采集平台和任务调度的职责,对于监控数据的可视化,我们更多是交给 Grafana 来完成。

The open observability platform Grafana is the open source analytics & monitoring solution for every database

从 Grafana 的 solgan 可以看出来,其在分析领域的野心。其主要特性可以归纳如下:

  1. 可视化:快速和灵活的客户端图形具有多种选项。面板插件为许多不同的方式可视化指标和日志。
  2. 报警:可视化地为最重要的指标定义警报规则。Grafana将持续评估它们,并发送通知。
  3. 通知:警报更改状态时,它会发出通知。接收电子邮件通知。
  4. 动态仪表盘:使用模板变量创建动态和可重用的仪表板,这些模板变量作为下拉菜单出现在仪表板顶部。
  5. 混合数据源:在同一个图中混合不同的数据源!可以根据每个查询指定数据源。这甚至适用于自定义数据源。
  6. 注释:注释来自不同数据源图表。将鼠标悬停在事件上可以显示完整的事件元数据和标记。
  7. 过滤器:过滤器允许您动态创建新的键/值过滤器,这些过滤器将自动应用于使用该数据源的所有查询。

给我触动最深的,还是其整体的架构设计,这里并非指的代码结构,而是其内部对使用逻辑,系统动作,行为抽象等的架构设计。在最近的使用过程中,给与了我很深的触动。对于一直做BI产品架构师和产品经理的我来说, Grafana 的整体设计,沿用到一般BI的可视化产品中,都是可行的。这个结构真的太美了,太妙了,今天居然看着屏幕笑了起来......

好了,上面是我夹带的一些私货,下面来说一说,使用吧,我并不想在这篇文章里手把手做一个仪表盘。而是通过之前文章的案例,迅速导入一个现成的仪表盘。(想入门的童靴,可以翻阅参考连接里的文章

引用之前案例的结构,设置好 Prometheus 对 Flink主要指标的监控

启动 grafana-server

然后设置Prometheus数据源

打开 Create --> Import 页面,将仪表盘配置的json导入(json全文在文章末尾可以找到)。

保存后就可以直观监控了Flink的主要指标了。

参考连接:

https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter

https://www.jianshu.com/p/0d82c7ccc85a

参考配置文件 flink-dashboard_rev1.json

{

"__inputs": [

{

"name": "DS_PROMETHEUS",

"label": "prometheus",

"description": "",

"type": "datasource",

"pluginId": "prometheus",

"pluginName": "Prometheus"

}

],

"__requires": [

{

"type": "grafana",

"id": "grafana",

"name": "Grafana",

"version": "4.2.0"

},

{

"type": "panel",

"id": "graph",

"name": "Graph",

"version": ""

},

{

"type": "datasource",

"id": "prometheus",

"name": "Prometheus",

"version": "1.0.0"

},

{

"type": "panel",

"id": "singlestat",

"name": "Singlestat",

"version": ""

}

],

"annotations": {

"list": []

},

"editable": true,

"gnetId": 10369,

"graphTooltip": 0,

"hideControls": false,

"id": null,

"links": [],

"refresh": "5s",

"rows": [

{

"collapse": false,

"height": 337,

"panels": [

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 1,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 3,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_jobmanager_Status_JVM_CPU_Load",

"intervalFactor": 10,

"legendFormat": "{{instance}}",

"metric": "flink_jobmanager_Status_JVM_CPU_Load",

"refId": "A",

"step": 20

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "JobManager CPU Load",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

},

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 2,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 3,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_taskmanager_Status_JVM_CPU_Load",

"hide": false,

"intervalFactor": 2,

"legendFormat": "{{instance}}",

"metric": "flink_taskmanager_Status_JVM_CPU_Load",

"refId": "A",

"step": 4

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "TaskManager CPU Load",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

},

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 5,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 3,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_jobmanager_Status_JVM_Memory_Direct_MemoryUsed",

"intervalFactor": 10,

"legendFormat": "{{instance}}",

"metric": "flink_jobmanager_Status_JVM_Memory_Direct_MemoryUsed",

"refId": "A",

"step": 20

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "JobManager Memory Used",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

},

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 6,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 3,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed",

"hide": false,

"intervalFactor": 2,

"legendFormat": "{{instance}}",

"metric": "flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed",

"refId": "A",

"step": 4

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "TaskManager Memory Used",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

}

],

"repeat": null,

"repeatIteration": null,

"repeatRowId": null,

"showTitle": false,

"title": "Dashboard Row",

"titleSize": "h6"

},

{

"collapse": false,

"height": 276,

"panels": [

{

"cacheTimeout": null,

"colorBackground": false,

"colorValue": true,

"colors": [

"rgba(245, 54, 54, 0.9)",

"rgba(255, 255, 255, 0.89)",

"rgba(50, 172, 45, 0.97)"

],

"datasource": "${DS_PROMETHEUS}",

"format": "none",

"gauge": {

"maxValue": null,

"minValue": 0,

"show": false,

"thresholdLabels": false,

"thresholdMarkers": true

},

"hideTimeOverride": false,

"id": 3,

"interval": null,

"links": [],

"mappingType": 1,

"mappingTypes": [

{

"name": "value to text",

"value": 1

},

{

"name": "range to text",

"value": 2

}

],

"maxDataPoints": 100,

"nullPointMode": "connected",

"nullText": null,

"postfix": "",

"postfixFontSize": "50%",

"prefix": "",

"prefixFontSize": "50%",

"rangeMaps": [

{

"from": "null",

"text": "N/A",

"to": "null"

}

],

"span": 3,

"sparkline": {

"fillColor": "rgba(31, 118, 189, 0.18)",

"full": false,

"lineColor": "rgb(31, 120, 193)",

"show": true

},

"targets": [

{

"expr": "flink_jobmanager_taskSlotsAvailable",

"hide": false,

"intervalFactor": 2,

"legendFormat": "",

"metric": "flink_jobmanager_taskSlotsAvailable",

"refId": "A",

"step": 20

}

],

"thresholds": "",

"title": "Taskslots available",

"type": "singlestat",

"valueFontSize": "80%",

"valueMaps": [

{

"op": "=",

"text": "N/A",

"value": "null"

}

],

"valueName": "current"

},

{

"cacheTimeout": null,

"colorBackground": false,

"colorValue": true,

"colors": [

"rgba(245, 54, 54, 0.9)",

"rgba(255, 255, 255, 0.89)",

"rgba(50, 172, 45, 0.97)"

],

"datasource": "${DS_PROMETHEUS}",

"format": "none",

"gauge": {

"maxValue": null,

"minValue": 0,

"show": false,

"thresholdLabels": false,

"thresholdMarkers": true

},

"hideTimeOverride": false,

"id": 4,

"interval": null,

"links": [],

"mappingType": 1,

"mappingTypes": [

{

"name": "value to text",

"value": 1

},

{

"name": "range to text",

"value": 2

}

],

"maxDataPoints": 100,

"nullPointMode": "connected",

"nullText": null,

"postfix": "",

"postfixFontSize": "50%",

"prefix": "",

"prefixFontSize": "50%",

"rangeMaps": [

{

"from": "null",

"text": "N/A",

"to": "null"

}

],

"span": 3,

"sparkline": {

"fillColor": "rgba(31, 118, 189, 0.18)",

"full": false,

"lineColor": "rgb(31, 120, 193)",

"show": true

},

"targets": [

{

"expr": "flink_jobmanager_taskSlotsTotal",

"hide": false,

"intervalFactor": 2,

"legendFormat": "",

"metric": "flink_jobmanager_taskSlotsTotal",

"refId": "A",

"step": 20

}

],

"thresholds": "",

"title": "Taskslots total",

"type": "singlestat",

"valueFontSize": "80%",

"valueMaps": [

{

"op": "=",

"text": "N/A",

"value": "null"

}

],

"valueName": "current"

},

{

"cacheTimeout": null,

"colorBackground": false,

"colorValue": true,

"colors": [

"rgba(245, 54, 54, 0.9)",

"rgba(255, 255, 255, 0.89)",

"rgba(50, 172, 45, 0.97)"

],

"datasource": "${DS_PROMETHEUS}",

"format": "none",

"gauge": {

"maxValue": null,

"minValue": 0,

"show": false,

"thresholdLabels": false,

"thresholdMarkers": true

},

"hideTimeOverride": false,

"id": 7,

"interval": null,

"links": [],

"mappingType": 1,

"mappingTypes": [

{

"name": "value to text",

"value": 1

},

{

"name": "range to text",

"value": 2

}

],

"maxDataPoints": 100,

"nullPointMode": "connected",

"nullText": null,

"postfix": "",

"postfixFontSize": "50%",

"prefix": "",

"prefixFontSize": "50%",

"rangeMaps": [

{

"from": "null",

"text": "N/A",

"to": "null"

}

],

"span": 3,

"sparkline": {

"fillColor": "rgba(251, 129, 76, 0.18)",

"full": false,

"lineColor": "rgb(193, 31, 31)",

"show": true

},

"targets": [

{

"expr": "flink_jobmanager_numRegisteredTaskManagers",

"hide": false,

"intervalFactor": 2,

"legendFormat": "",

"metric": "flink_jobmanager_numRegisteredTaskManagers",

"refId": "A",

"step": 20

}

],

"thresholds": "",

"title": "# of TaskManagers",

"type": "singlestat",

"valueFontSize": "80%",

"valueMaps": [

{

"op": "=",

"text": "N/A",

"value": "null"

}

],

"valueName": "current"

},

{

"cacheTimeout": null,

"colorBackground": false,

"colorValue": true,

"colors": [

"rgba(245, 54, 54, 0.9)",

"rgba(255, 255, 255, 0.89)",

"rgba(50, 172, 45, 0.97)"

],

"datasource": "${DS_PROMETHEUS}",

"format": "none",

"gauge": {

"maxValue": null,

"minValue": 0,

"show": false,

"thresholdLabels": false,

"thresholdMarkers": true

},

"hideTimeOverride": false,

"id": 8,

"interval": null,

"links": [],

"mappingType": 1,

"mappingTypes": [

{

"name": "value to text",

"value": 1

},

{

"name": "range to text",

"value": 2

}

],

"maxDataPoints": 100,

"nullPointMode": "connected",

"nullText": null,

"postfix": "",

"postfixFontSize": "50%",

"prefix": "",

"prefixFontSize": "50%",

"rangeMaps": [

{

"from": "null",

"text": "N/A",

"to": "null"

}

],

"span": 3,

"sparkline": {

"fillColor": "rgba(251, 129, 76, 0.18)",

"full": false,

"lineColor": "rgb(193, 31, 31)",

"show": true

},

"targets": [

{

"expr": "flink_jobmanager_numRunningJobs",

"hide": false,

"intervalFactor": 2,

"legendFormat": "",

"metric": "flink_jobmanager_numRunningJobs",

"refId": "A",

"step": 20

}

],

"thresholds": "",

"title": "# of Running Jobs",

"type": "singlestat",

"valueFontSize": "80%",

"valueMaps": [

{

"op": "=",

"text": "N/A",

"value": "null"

}

],

"valueName": "current"

}

],

"repeat": null,

"repeatIteration": null,

"repeatRowId": null,

"showTitle": false,

"title": "Dashboard Row",

"titleSize": "h6"

},

{

"collapse": false,

"height": 255,

"panels": [

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 9,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 6,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time",

"intervalFactor": 2,

"legendFormat": "{{instance}} Young Gen Time",

"metric": "flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count",

"refId": "A",

"step": 2

},

{

"expr": "flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time",

"intervalFactor": 2,

"legendFormat": "{{instance}} Old Gen Time",

"metric": "flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count",

"refId": "B",

"step": 2

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "TaskManagers Garbage Collection",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

},

{

"aliasColors": {},

"bars": false,

"datasource": "${DS_PROMETHEUS}",

"fill": 1,

"id": 10,

"legend": {

"avg": false,

"current": false,

"max": false,

"min": false,

"show": true,

"total": false,

"values": false

},

"lines": true,

"linewidth": 1,

"links": [],

"nullPointMode": "null",

"percentage": false,

"pointradius": 5,

"points": false,

"renderer": "flot",

"seriesOverrides": [],

"span": 6,

"stack": false,

"steppedLine": false,

"targets": [

{

"expr": "flink_jobmanager_Status_JVM_GarbageCollector_Copy_Time",

"intervalFactor": 2,

"legendFormat": "{{instance}} GC Copy Time",

"metric": "flink_jobmanager_Status_JVM_GarbageCollector_Copy_Time",

"refId": "A",

"step": 2

},

{

"expr": "flink_jobmanager_Status_JVM_GarbageCollector_MarkSweepCompact_Time",

"intervalFactor": 2,

"legendFormat": "{{instance}} GC MarkSweep Time",

"metric": "flink_jobmanager_Status_JVM_GarbageCollector_MarkSweepCompact_Time",

"refId": "B",

"step": 2

}

],

"thresholds": [],

"timeFrom": null,

"timeShift": null,

"title": "JobManager Garbage Collection",

"tooltip": {

"shared": true,

"sort": 0,

"value_type": "individual"

},

"type": "graph",

"xaxis": {

"mode": "time",

"name": null,

"show": true,

"values": []

},

"yaxes": [

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

},

{

"format": "short",

"label": null,

"logBase": 1,

"max": null,

"min": null,

"show": true

}

]

}

],

"repeat": null,

"repeatIteration": null,

"repeatRowId": null,

"showTitle": false,

"title": "Dashboard Row",

"titleSize": "h6"

}

],

"schemaVersion": 14,

"style": "dark",

"tags": [

"flink"

],

"templating": {

"list": []

},

"time": {

"from": "now-15m",

"to": "now"

},

"timepicker": {

"refresh_intervals": [

"5s",

"10s",

"30s",

"1m",

"5m",

"15m",

"30m",

"1h",

"2h",

"1d"

],

"time_options": [

"5m",

"15m",

"1h",

"6h",

"12h",

"24h",

"2d",

"7d",

"30d"

]

},

"timezone": "browser",

"title": "Flink Dashboard",

"version": 19,

"description": "Flink dashboard using the Prometheus exporter. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter "

}

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2019-10-31,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 麒思妙想 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
大数据
全栈大数据产品,面向海量数据场景,帮助您 “智理无数,心中有数”!
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档