流计算 Oceanus 监控指标

最近更新时间:2024-05-10 18:45:41

我的收藏

命名空间

Namespace=QCE/TSTREAM

监控指标

指标英文名
指标中文名
指标说明
单位
维度
统计规则 [period, statType]
JobRestartingtime
作业重启耗时
作业最近一次重启耗时
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerLastcheckpointrestoretimestamp
作业最近一次恢复的时间戳
作业最近一次从快照恢复的 Unix 时间戳(以毫秒为单位,如果未恢复过则是 -1)
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
TaskmanagerJvmThreadsCount
TaskManager 活动线程数
作业中所有 TaskManager 中活动的线程数之和,含 Daemon 和非 Daemon 线程。
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryNonheapCommitted
TaskManager 已提交的非堆内存容量
作业中所有 TaskManager 已提交(committed)的非堆内存(JVM 元空间、代码缓存等)用量总和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryHeapCommitted
TaskManager 已提交的堆内存容量
作业中所有 TaskManager 已提交(committed)的堆内存容量总和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerNetworkTotalmemorysegments
TaskManager 已分配的 MemorySegment 总数
作业中所有 TaskManager 已分配的 MemorySegment 个数总和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerJvmYoungGcTime
TaskManager 年轻代 GC 时间
作业中所有 TaskManager 年轻代 GC 时间之和
ms
tjob_id
[60s, sum] [300s, last] [3600s, last] [86400s, last]
TaskmanagerJvmYoungGcCount
TaskManager 年轻代 GC 次数
作业中所有 TaskManager 年轻代 GC 次数之和
tjob_id
[60s, sum] [300s, last] [3600s, last] [86400s, last]
TaskmanagerJvmOldGcTime
TaskManager 老年代 GC 时间
作业中所有 TaskManager 老年代 GC 时间之和
ms
tjob_id
[60s, sum] [300s, last] [3600s, last] [86400s, last]
TaskmanagerJvmOldGcCount
TaskManager 老年代 GC 次数
作业中所有 TaskManager 老年代 GC 次数之和
tjob_id
[60s, sum] [300s, last] [3600s, last] [86400s, last]
TaskmanagerStatusJvmMemoryNonheapMax
TaskManager 非堆内存最大容量
作业中所有 TaskManager 非堆内存(JVM 元空间、代码缓存等)最大容量总和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryDirectCount
TaskManager 堆外直接内存缓存数
作业中所有 TaskManager 堆外直接内存(Direct Buffer Pool)中的缓存(Buffer)个数之和
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryDirectTotalcapacity
TaskManager 堆外直接内存总容量
作业中所有 TaskManager 堆外直接内存(Direct Buffer Pool)的最大容量之和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryMappedCount
TaskManager 堆外映射内存缓存数
作业中所有 TaskManager 堆外映射内存(Mapped Buffer Pool)中的缓存(Buffer)个数之和
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerMemoryMappedTotalcapacity
TaskManager 堆外映射内存总容量
作业中所有 TaskManager 堆外映射内存(Mapped Buffer Pool)的最大容量之和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerStatusJvmMemoryHeapUsedPercentage
TaskManager 堆内存使用率
作业中所有 TaskManager 的平均堆内存使用率
%
tjob_id
[60s, expr] [300s, expr]
TaskmanagerNetworkAvailablememorysegments
TaskManager 可用的 MemorySegment 个数
作业中所有 TaskManager 的可用 MemorySegment 个数之和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
JobMemoryHeapMax
TaskManager 堆内存最大容量
作业中所有 TaskManager 的堆内存最大(max)容量总和
Bytes
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerCpuTime
TaskManager CPU 使用时长
作业中所有 TaskManager CPU 使用时长总和(毫秒)
ms
tjob_id
[60s, sum] [300s, avg] [3600s, avg] [86400s, avg]
JobBytesInPerSecond
作业每秒输入的数据量
作业所有数据源(Source)每秒输入的数据总量(仅对 Kafka Source 有效)
Bytes/s
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobBytesOutPerSecond
作业每秒输出的数据量
作业所有数据目的(Sink)每秒输出的数据总量(仅对 Kafka Sink 有效)
Bytes/s
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerNumrunningjobs
运行中的作业数
正在运行中作业数。如果作业正常运行,则值为 1. 如果作业崩溃则值为 0
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
TaskmanagerStatusJvmMemoryProcessMemoryused
所有 TaskManager JVM 的物理内存用量的最大值
所有 TaskManager JVM 的物理内存用量的最大值
Bytes
tjob_id
[60s, sum] [300s, max] [3600s, max] [86400s, max]
JobNumrecordsinbutfailed
严重异常数据个数
算子中发生严重异常(例如抛出各种 Exception)的数据个数,如果大于 1 则会影响 Exactly-Once 语义(试验参数,仅供参考)
tjob_id
[60s, sum] [300s, sum] [3600s, sum] [86400s, sum]
Syndelay
数据源同步百分比
数据源同步百分比
%
tjob_id
[60s, last] [300s, avg] [3600s, avg] [86400s, avg]
Binlogpos
数据源日志位点信息
数据源日志位点信息
-
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
TaskmanagerJobTaskOperatorKafkaSwitch
是否包含 Kafka connector
是否包含 Kafka connector
-
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerTaskslotsavailable
可用任务槽数量
如果作业正常运行,则可用的任务槽(Task Slot)数为 0. 如果不为 0 则说明作业可能出现短时间的非运行状态
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobUptime
作业无中断持续执行的时间
对于运行中的作业,表示当次作业持续处于运行中的时长
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerDowntime
注册的 TaskManager 数
对于失败或恢复等非运行状态的作业,表示本次中断运行的时长。对于正在运行中的作业,值为 0.
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobLastcheckpointduration
最近一次的 Checkpoint 耗时
当前作业最近一次的 Checkpoint 耗时
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobLastcheckpointsize
最近一次的 Checkpoint 大小
当前作业最近一次的 Checkpoint 大小
Bytes
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerNumregisteredtaskmanagers
JobManager 已注册的 TaskManager 数
当前作业已注册的 TaskManager 数,通常等于所有算子并行度的最大值。如果 TaskManager 个数减少,说明存在 TaskManager 失联,作业可能崩溃并尝试恢复。
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerMemoryNonheapCommitted
JobManager 已提交的非堆内存容量
当前作业已提交(committed)的 JobManager 非堆内存(JVM 元空间、代码缓存等)容量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerNumberofinprogresscheckpoints
正在进行的 Checkpoint 个数
当前作业进行中(未完成)的 Checkpoint 个数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerThreadsCount
JobManager 中活动的线程数
当前作业 JobManager 中活动的线程数,含 Daemon 和非 Daemon 线程。
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryHeapCommitted
JobManager 已提交的堆内存容量
当前作业 JobManager 已提交(committed)的堆内存容量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerJvmYoungGcTime
JobManager 年轻代 GC 时间
当前作业 JobManager 年轻代 GC 时间
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerJvmYoungGcCount
JobManager 年轻代 GC 次数
当前作业 JobManager 年轻代 GC 次数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerJvmOldGcTime
JobManager 老年代 GC 时间
当前作业 JobManager 老年代 GC 时间
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerJvmOldGcCount
JobManager 老年代 GC 次数
当前作业 JobManager 老年代 GC 次数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerStatusJvmMemoryNonheapUsed
JobManager 非堆内存用量
当前作业 JobManager 非堆内存(JVM 元空间、代码缓存等)用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerStatusJvmMemoryNonheapMax
JobManager 非堆内存最大容量
当前作业 JobManager 非堆内存(JVM 元空间、代码缓存等)的最大容量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryHeapMax
JobManager 堆内存最大容量
当前作业 JobManager 堆内存最大容量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerStatusJvmMemoryHeapUsedPercentage
JobManager 堆内存使用率
当前作业 JobManager 堆内存使用率
%
tjob_id
[60s, expr] [300s, expr]
JobmanagerStatusJvmMemoryHeapUsed
JobManager 堆内存的用量
当前作业 JobManager 堆内存的用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerCpuLoad
JobManager CPU 使用率
当前作业 JobManager 的 CPU 使用率
%
tjob_id
[60s, avg] [300s, max] [3600s, max] [86400s, max]
JobmanagerCpuTime
JobManager CPU 使用时长
当前作业 JobManager CPU 使用时长(毫秒)
ms
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobNumberoffailedcheckpoints
Checkpoint 失败次数
当前作业 Checkpoint 失败(例如超时、遇到异常等)的次数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobNumberofcompletedcheckpoints
Checkpoint 成功次数
当前作业 Checkpoint 成功完成的次数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerJobNumrestarts
当前实例崩溃重启次数
当前实例 JobManager 记录的任务崩溃重启次数(不含 JobManager 退出后作业重新拉起的场景)
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
RecordsLagMaxMin
Kafka - Records_Lag 最小值
TaskManger 上报的 Kafka 最大 lag 指标最小值
-
tjob_id
[60s, min] [300s, min] [3600s, min] [86400s, min]
RecordsLagMaxSum
Kafka - Records_Lag 最大值
TaskManger 上报的 Kafka 最大 lag 指标的求和值
-
tjob_id
[60s, sum] [300s, sum] [3600s, sum] [86400s, sum]
RecordsLagMaxAvg
Kafka - Records_Lag 均值
TaskManger 上报的 Kafka 最大 lag 指标的均值
-
tjob_id
[60s, avg] [300s, avg] [3600s, avg] [86400s, avg]
RecordsLagMax
Kafka - Records_Lag 最大指标
TaskManger 上报的 Kafka 最大 lag 指标
-
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
Sourceidletime
Source 处理的空闲时间
Source 处理的空闲时间
ms
tjob_id
[60s, last] [300s, avg] [3600s, avg] [86400s, avg]
Currentfetcheventtimelag
Source Fetch消息的延迟时间
Source Fetch消息的延迟时间(EmitTime-messageTimestamp)
ms
tjob_id
[60s, last] [300s, avg] [3600s, avg] [86400s, avg]
Dbflushdelay
Sink 刷新延迟
Sink 刷新延迟
ms
tjob_id
[60s, last] [300s, avg] [3600s, avg] [86400s, avg]
JobmanagerTaskslotstotal
任务槽总数
Oceanus 中一个 TaskManager 只有一个任务槽,因此任务槽总数等于注册的 TaskManager 数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
JobmanagerStatusJvmMemoryProcessMemoryused
JobManager 所在的 JVM 的物理内存用量
JobManager 所在的 JVM 的物理内存用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryDirectCount
JobManager 堆外直接内存中的缓存数
JobManager 堆外直接内存(Direct Buffer Pool)中的缓存(Buffer)个数
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryDirectTotalcapacity
JobManager 堆外直接内存总容量
JobManager 堆外直接内存(Direct Buffer Pool)的最大用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryDirectMemoryused
JobManager 堆外直接内存使用量
JobManager 堆外直接内存(Direct Buffer Pool)的用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryMappedCount
JobManager 堆外存缓存个数
JobManager 堆外映射内存(Mapped Buffer Pool)中的缓存(Buffer)个数之和
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryMappedTotalcapacity
JobManager 堆外映射内存的总容量
JobManager 堆外映射内存(Mapped Buffer Pool)的最大用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobmanagerMemoryMappedMemoryused
JobManager 堆外映射内存的使用量
JobManager 堆外映射内存(Mapped Buffer Pool)的用量
Bytes
tjob_id
[60s, max] [300s, max] [3600s, max] [86400s, max]
JobTotalnumberofcheckpoints
Checkpoint 总次数
Checkpoint 总次数(进行中、已完成和失败的总和)
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]
Currentemiteventtimelag
CDC Source 处理发送下游的时间与消息本身时间差
CDC Source 处理发送下游的时间与消息本身时间差
ms
tjob_id
[60s, last] [300s, avg] [3600s, avg] [86400s, avg]
TaskmanagerJobTaskOperatorSchemachange
CDC Schema 变更次数
CDC Schema 变更次数
tjob_id
[60s, last] [300s, last] [3600s, last] [86400s, last]

各维度对应参数总览

参数名称
维度名称
维度解释
格式
Instances.N.Dimensions.0.Name
tjob_id
作业 ID 的维度名称
输入 String 类型维度名称:tjob_id
Instances.N.Dimensions.0.Value
tjob_id
具体作业 ID
输入具体作业 ID ,可从 DescribeJobs 中获取 JobId 字段

入参说明

查询云服务器监控数据,入参取值如下: &Namespace=QCE/TSTREAM &Instances.N.Dimensions.0.Name=tjob_id &Instances.N.Dimensions.0.Value=具体作业 ID