Development Machine

Last updated: 2025-03-19 19:50:02

Namespace

Namespace = QCE/TI_NOTEBOOK

Monitoring Metrics

| Metric name | Display name | Description | Unit | Dimensions | Statistics rule [period, statType] |
| --- | --- | --- | --- | --- | --- |
| CfsClientDataReadBandwidth | turocfs single-node server-side read bandwidth | turocfs single-node server-side read bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsClientDataWriteBandwidth | turocfs single-node server-side write bandwidth | turocfs single-node server-side write bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsDataReadIoBytes | CFS server-side read bandwidth | CFS server-side read bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsDataReadIoLatency | CFS read latency | CFS read latency | ms | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsDataWriteIoBytes | CFS server-side write bandwidth | CFS server-side write bandwidth | KBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsDataWriteIoLatency | CFS write latency | CFS write latency | ms | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| CfsStrageUsageGb | CFS stored data volume | CFS stored data volume | GBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskIoUtil | Disk ioutil | Disk ioutil | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskIoWait | Disk iowait | Disk iowait | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskReadByte | Disk read bandwidth | Disk read bandwidth | MBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskReadIops | Disk read IOPS | Disk read IOPS | Count | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskUsageRadio | System disk partition utilization | System disk partition utilization | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskWriteByte | Disk write bandwidth | Disk write bandwidth | MBytes/s | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DiskWriteIops | Disk write IOPS | Disk write IOPS | Count | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancecpuutil | CPU utilization | CPU utilization | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancegpumemutil | GPU memory utilization | GPU memory utilization | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancegpumemvalue | GPU memory usage | GPU memory usage | MBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancegpuutil | GPU utilization | GPU utilization | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancememutil | Memory utilization | Memory utilization | % | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Instancememvalue | Memory usage | Memory usage | MBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuFp16EngineActivity | FP16 active time ratio | FP16 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuFp32EngineActivity | FP32 active time ratio | FP32 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuFp64EngineActivity | FP64 active time ratio | FP64 active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| NvlinkBandwidth | NVLink transfer rate | NVLink transfer rate | Bytes/s | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| PcieBandwidth | PCIe bus transfer rate | PCIe bus transfer rate | Bytes/s | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuSmActivity | SM active time ratio | SM active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| TensorActivity | Tensor active time ratio | Tensor active time ratio | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| Dcgmfidevfbused | GPU memory usage | GPU memory usage | MBytes | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DcgmFiDevGpuUtil | GPU utilization | GPU utilization | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| DcgmFiDevMemCopyUtil | GPU memory utilization | GPU memory utilization | % | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuMemoryClockGpu | GPU memory clock frequency | GPU memory clock frequency | s | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuMemoryFreeGpuv | Free GPU memory | Free GPU memory | MBytes | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuNvlinkRxMb | NVLink received data volume | NVLink received data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuNvlinkTxMb | NVLink sent data volume | NVLink sent data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuPcieRxMb | PCIe received data volume | PCIe received data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuPcieTxMb | PCIe sent data volume | PCIe sent data volume | Mbps | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| GpuSmClock | SM clock frequency | SM clock frequency | s | AppId, InstanceGpuNum, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| PodDiskLimit | Total instance disk capacity | Total instance disk capacity | GBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| PodDiskValue | Instance disk usage | Instance disk usage | GBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| NodeDiskLimit | Total node disk capacity | Total node disk capacity | GBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| NodeDiskValue | Node disk usage | Node disk usage | GBytes | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| RdmaInpkt | RDMA NIC inbound packets | RDMA NIC inbound packets | pps | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| RdmaOutpkt | RDMA NIC outbound packets | RDMA NIC outbound packets | pps | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| RdmaIntraffic | RDMA NIC receive bandwidth | RDMA NIC receive bandwidth | Mbps | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
| RdmaOuttraffic | RDMA NIC send bandwidth | RDMA NIC send bandwidth | Mbps | AppId, InstanceId, SubUin | [10s, avg] [60s, avg] [300s, avg] [3600s, avg] [86400s, avg] |
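
The metric list above implies two checks a caller may want before querying: whether a period is one of the listed values, and whether a metric is reported per instance (InstanceId) or per GPU card (InstanceGpuNum). The following Python sketch encodes only what the list states. The helper names `check_period` and `instance_dimension` are hypothetical and not part of any SDK, and treating unlisted metric names as InstanceId metrics is an assumption made for brevity.

```python
# Periods (in seconds) that appear in the statistics-rule entries;
# every metric in the list uses the "avg" statType.
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)

# Metrics whose dimension entries list InstanceGpuNum instead of InstanceId.
GPU_CARD_METRICS = frozenset({
    "GpuFp16EngineActivity", "GpuFp32EngineActivity", "GpuFp64EngineActivity",
    "NvlinkBandwidth", "PcieBandwidth", "GpuSmActivity", "TensorActivity",
    "Dcgmfidevfbused", "DcgmFiDevGpuUtil", "DcgmFiDevMemCopyUtil",
    "GpuMemoryClockGpu", "GpuMemoryFreeGpuv", "GpuNvlinkRxMb", "GpuNvlinkTxMb",
    "GpuPcieRxMb", "GpuPcieTxMb", "GpuSmClock",
})


def check_period(period: int) -> int:
    """Return the period unchanged if it is listed, otherwise raise ValueError."""
    if period not in SUPPORTED_PERIODS:
        raise ValueError(f"period must be one of {SUPPORTED_PERIODS}, got {period}")
    return period


def instance_dimension(metric_name: str) -> str:
    """Return the instance-level dimension name used by a metric.

    Assumption: names not in GPU_CARD_METRICS are treated as InstanceId
    metrics; this helper does not verify that the metric exists.
    """
    return "InstanceGpuNum" if metric_name in GPU_CARD_METRICS else "InstanceId"
```

For example, `instance_dimension("GpuSmActivity")` selects InstanceGpuNum, while `instance_dimension("DiskIoUtil")` selects InstanceId.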

Overview of Parameters for Each Dimension

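| Parameter name | Dimension name | Dimension description | Format |
| --- | --- | --- | --- |
| Instances.N.Dimensions.0.Name | AppId | Dimension name for the account APPID | Enter the String-type dimension name: AppId (obtained automatically in SDK calls; no need to pass it) |
| Instances.N.Dimensions.0.Value | AppId | Account APPID | Enter the ID, for example: 1231231231 (obtained automatically in SDK calls; no need to pass it) |
| Instances.N.Dimensions.1.Name | SubUin | Dimension name for the sub-account ID | Enter the String-type dimension name: SubUin |
| Instances.N.Dimensions.1.Value | SubUin | Sub-account ID | Enter the ID, for example: 100001231231 |
| Instances.N.Dimensions.2.Name | InstanceId | Dimension name for the development machine ID | Enter the String-type dimension name: InstanceId |
| Instances.N.Dimensions.2.Value | InstanceId | Development machine ID | Enter the specific instance ID, for example: nb-11521601712664xxxxx-9igs95i88a68 |
| Instances.N.Dimensions.3.Name | InstanceGpuNum | Dimension name for the GPU card number used by the development machine instance (whole-GPU-card tasks only) | Enter the String-type dimension name: InstanceGpuNum |
| Instances.N.Dimensions.3.Value | InstanceGpuNum | GPU card number used by the development machine instance (whole-GPU-card tasks only) | The instance ID joined with the GPU card number or avg. Enter the specific value, for example: nb-11521601712664xxxxx-9igs95i88a68-0 |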
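
The InstanceGpuNum value is described above as the instance ID joined with a GPU card number or avg. A minimal Python sketch of that concatenation follows. The function name `instance_gpu_num` is hypothetical, and the hyphen separator for the avg form is an assumption inferred from the card-number example ending in `-0`.

```python
def instance_gpu_num(instance_id: str, gpu) -> str:
    """Build an InstanceGpuNum dimension value.

    gpu is either a non-negative card index (int) or the string "avg".
    Assumption: "avg" is joined with the same hyphen as a card index.
    """
    if isinstance(gpu, int) and not isinstance(gpu, bool):
        if gpu < 0:
            raise ValueError("GPU card index must be non-negative")
        suffix = str(gpu)
    elif gpu == "avg":
        suffix = "avg"
    else:
        raise ValueError("gpu must be a card index or 'avg'")
    return f"{instance_id}-{suffix}"


# The instance ID below is the partially masked example from this page.
value = instance_gpu_num("nb-11521601712664xxxxx-9igs95i88a68", 0)
```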

Input Parameters

To query monitoring data for Notebook metrics, use the following values:
&Namespace=QCE/TI_NOTEBOOK
&Instances.N.Dimensions.0.Name=AppId
&Instances.N.Dimensions.0.Value=the specific account ID
&Instances.N.Dimensions.1.Name=SubUin
&Instances.N.Dimensions.1.Value=the specific sub-account ID
&Instances.N.Dimensions.2.Name=InstanceId
&Instances.N.Dimensions.2.Value=the development machine ID
&Instances.N.Dimensions.3.Name=InstanceGpuNum
&Instances.N.Dimensions.3.Value=the GPU card number used by the development machine instance
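
The flat parameter names above can be generated from an ordered list of dimension pairs. The Python sketch below uses only the standard library and sends no request. The function `build_params` is hypothetical, N = 0 is assumed for the first instance entry, and the ID values are the sample values from this page. Signing, the target API action, and parameters such as the metric name and period are out of scope here.

```python
from urllib.parse import urlencode

NAMESPACE = "QCE/TI_NOTEBOOK"


def build_params(dimensions, n=0):
    """Flatten (name, value) dimension pairs into Instances.N.Dimensions.i keys.

    dimensions: ordered list of (dimension_name, dimension_value) tuples.
    n: index of the instance entry (assumed to start at 0).
    """
    params = {"Namespace": NAMESPACE}
    for i, (name, value) in enumerate(dimensions):
        params[f"Instances.{n}.Dimensions.{i}.Name"] = name
        params[f"Instances.{n}.Dimensions.{i}.Value"] = value
    return params


# Sample values copied from this page; replace them with real IDs.
params = build_params([
    ("AppId", "1231231231"),
    ("SubUin", "100001231231"),
    ("InstanceId", "nb-11521601712664xxxxx-9igs95i88a68"),
])

# URL-encode the parameters into a query string fragment.
query = urlencode(params)
```

For a GPU-card-level metric, the third pair would carry the InstanceGpuNum name and value instead of InstanceId.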