Task-based Modeling

Last updated: 2025-03-19 11:01:02

Namespace

Namespace = QCE/TI_TRAINTASK

Monitoring Metrics

Metric name (API)
Metric display name
Description
Unit
Dimensions
Statistical rules [period, statType]
CfsClientDataReadBandwidth
turocfs single-node server-side read bandwidth
turocfs single-node server-side read bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsClientDataWriteBandwidth
turocfs single-node server-side write bandwidth
turocfs single-node server-side write bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsDataReadIoBytes
CFS server-side read bandwidth
CFS server-side read bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsDataReadIoLatency
CFS read latency
CFS read latency
ms
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsDataWriteIoBytes
CFS server-side write bandwidth
CFS server-side write bandwidth
KBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsDataWriteIoLatency
CFS write latency
CFS write latency
ms
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
CfsStrageUsageGb
CFS stored data volume
CFS stored data volume
GBytes
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Cpuutil
CPU utilization
CPU utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DcgmFiDevFbUsed
GPU memory usage
GPU memory usage
MBytes
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DcgmFiDevGpuUtil
GPU utilization
GPU utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DcgmFiDevMemCopyUtil
GPU memory utilization
GPU memory utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskIoUtil
Disk I/O utilization (ioutil)
Disk I/O utilization (ioutil)
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskIoWait
Disk I/O wait (iowait)
Disk I/O wait (iowait)
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskReadByte
Disk read bandwidth
Disk read bandwidth
MBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskReadIops
Disk read IOPS
Disk read IOPS
Count
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskUsageRadio
System disk partition utilization
System disk partition utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskWriteByte
Disk write bandwidth
Disk write bandwidth
MBytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
DiskWriteIops
Disk write IOPS
Disk write IOPS
Count
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Fp16EngineActivity
FP16 active time ratio
FP16 active time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Fp32EngineActivity
FP32 active time ratio
FP32 active time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Fp64EngineActivity
FP64 active time ratio
FP64 active time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuFp16EngineActivity
FP16 active time ratio
FP16 active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuFp32EngineActivity
FP32 active time ratio
FP32 active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuFp64EngineActivity
FP64 active time ratio
FP64 active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Gpumemutil
GPU memory utilization
GPU memory utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Gpumemvalue
GPU memory usage
GPU memory usage
MBytes
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuNvlinkBandwidth
NVLink transfer rate
NVLink transfer rate
Bytes/s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuPcieBandwidth
PCIe bus transfer rate
PCIe bus transfer rate
Bytes/s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuSmActivity
SM active time ratio
SM active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuTensorActivity
Tensor active time ratio
Tensor active time ratio
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Gpuutil
GPU utilization
GPU utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancecpuutil
CPU utilization
CPU utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancegpumemutil
GPU memory utilization
GPU memory utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancegpumemvalue
GPU memory usage
GPU memory usage
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancegpuutil
GPU utilization
GPU utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancememutil
Memory utilization
Memory utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Instancememvalue
Memory usage
Memory usage
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Memutil
Memory utilization
Memory utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
Memvalue
Memory usage
Memory usage
MBytes
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
NvlinkBandwidth
NVLink transfer rate
NVLink transfer rate
Bytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
PcieBandwidth
PCIe bus transfer rate
PCIe bus transfer rate
Bytes/s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
RdmaInpkt
RDMA NIC inbound packets
RDMA NIC inbound packets
pps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
RdmaIntraffic
RDMA NIC receive bandwidth
RDMA NIC receive bandwidth
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
RdmaOutpkt
RDMA NIC outbound packets
RDMA NIC outbound packets
pps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
RdmaOuttraffic
RDMA NIC send bandwidth
RDMA NIC send bandwidth
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
SmActivity
SM active time ratio
SM active time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsClientDataReadBandwidth
turocfs single-node server-side read bandwidth
turocfs single-node server-side read bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsClientDataWriteBandwidth
turocfs single-node server-side write bandwidth
turocfs single-node server-side write bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsDataReadIoBytes
CFS server-side read bandwidth
CFS server-side read bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsDataReadIoLatency
CFS read latency
CFS read latency
ms
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsDataWriteIoBytes
CFS server-side write bandwidth
CFS server-side write bandwidth
KBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsDataWriteIoLatency
CFS write latency
CFS write latency
ms
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskCfsStrageUsageGb
CFS stored data volume
CFS stored data volume
GBytes
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskIoUtil
Disk I/O utilization (ioutil)
Disk I/O utilization (ioutil)
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskIoWait
Disk I/O wait (iowait)
Disk I/O wait (iowait)
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskReadByte
Disk read bandwidth
Disk read bandwidth
MBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskReadIops
Disk read IOPS
Disk read IOPS
Count
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskUsageRadio
System disk partition utilization
System disk partition utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskWriteByte
Disk write bandwidth
Disk write bandwidth
MBytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskDiskWriteIops
Disk write IOPS
Disk write IOPS
Count
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskFp16EngineActivity
FP16 active time ratio
FP16 active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskFp32EngineActivity
FP32 active time ratio
FP32 active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskFp64EngineActivity
FP64 active time ratio
FP64 active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskNvlinkBandwidth
NVLink transfer rate
NVLink transfer rate
Bytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskPcieBandwidth
PCIe bus transfer rate
PCIe bus transfer rate
Bytes/s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskRdmaInpkt
RDMA NIC inbound packets
RDMA NIC inbound packets
pps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskRdmaIntraffic
RDMA NIC receive bandwidth
RDMA NIC receive bandwidth
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskRdmaOutpkt
RDMA NIC outbound packets
RDMA NIC outbound packets
pps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskRdmaOuttraffic
RDMA NIC send bandwidth
RDMA NIC send bandwidth
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskSmActivity
SM active time ratio
SM active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskTensorActivity
Tensor active time ratio
Tensor active time ratio
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TensorActivity
Tensor active time ratio
Tensor active time ratio
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuDecUtil
GPU decoder utilization
GPU decoder utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuEncUtil
GPU encoder utilization
GPU encoder utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuMemoryClock
GPU memory clock frequency
GPU memory clock frequency
s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuMemoryFree
Free GPU memory
Free GPU memory
MBytes
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuMemoryUtil
GPU memory utilization
GPU memory utilization
%
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuNvlinkRxMb
NVLink received data volume
NVLink received data volume
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuNvlinkTxMb
NVLink sent data volume
NVLink sent data volume
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuPcieRxMb
PCIe received data volume
PCIe received data volume
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuPcieTxMb
PCIe sent data volume
PCIe sent data volume
Mbps
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
GpuSmClock
SM clock frequency
SM clock frequency
s
AppId
InstanceId
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuDecUtil
GPU decoder utilization
GPU decoder utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuEncUtil
GPU encoder utilization
GPU encoder utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryClock
GPU memory clock frequency
GPU memory clock frequency
s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryFree
Free GPU memory
Free GPU memory
MBytes
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryUtil
GPU memory bandwidth utilization
GPU memory bandwidth utilization
%
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuNvlinkRxMb
NVLink received data volume
NVLink received data volume
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuNvlinkTxMb
NVLink sent data volume
NVLink sent data volume
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuPcieRxMb
PCIe received data volume
PCIe received data volume
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuPcieTxMb
PCIe sent data volume
PCIe sent data volume
Mbps
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuSmClock
SM clock frequency
SM clock frequency
s
AppId
SubUin
TaskId
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuDecUtilGpu
GPU decoder utilization
GPU decoder utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuEncUtilGpu
GPU encoder utilization
GPU encoder utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryClockGpu
GPU memory clock frequency
GPU memory clock frequency
s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryFreeGpu
Free GPU memory
Free GPU memory
MBytes
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuMemoryUtilGpu
GPU memory bandwidth utilization
GPU memory bandwidth utilization
%
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuNvlinkRxMbGpu
NVLink received data volume
NVLink received data volume
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuNvlinkTxMbGpu
NVLink sent data volume
NVLink sent data volume
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuPcieRxMbGpu
PCIe received data volume
PCIe received data volume
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuPcieTxMbGpu
PCIe sent data volume
PCIe sent data volume
Mbps
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
TaskGpuSmClockGpu
SM clock frequency
SM clock frequency
s
AppId
InstanceGpuNum
SubUin
[ 10s, avg ] [ 60s, avg ] [ 300s, avg ] [ 3600s, avg ] [ 86400s, avg ]
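
Each row of the metrics table pairs a metric with three dimension keys and a fixed set of [period, statType] rules, so it can be read as a lookup. The following is a minimal Python sketch of that idea, not part of any SDK: only three sample rows are copied from the table, and the names SUPPORTED_PERIODS, SUPPORTED_STAT_TYPES, DIMENSIONS_BY_METRIC, and check_request are hypothetical helpers introduced for illustration.

```python
# Statistical rules listed for every metric in the table: [period, statType].
SUPPORTED_PERIODS = (10, 60, 300, 3600, 86400)  # seconds
SUPPORTED_STAT_TYPES = ("avg",)

# Three sample rows copied from the table; extend with the metrics you use.
DIMENSIONS_BY_METRIC = {
    "Cpuutil": ("AppId", "SubUin", "TaskId"),
    "Instancecpuutil": ("AppId", "InstanceId", "SubUin"),
    "DcgmFiDevGpuUtil": ("AppId", "InstanceGpuNum", "SubUin"),
}


def check_request(metric, period, stat_type="avg"):
    """Return the dimension keys for a metric.

    Raises ValueError if the metric is not in the sample table or if the
    period / statType is not one of the listed statistical rules.
    """
    if metric not in DIMENSIONS_BY_METRIC:
        raise ValueError(f"unknown metric: {metric}")
    if period not in SUPPORTED_PERIODS:
        raise ValueError(f"unsupported period: {period}s")
    if stat_type not in SUPPORTED_STAT_TYPES:
        raise ValueError(f"unsupported statType: {stat_type}")
    return DIMENSIONS_BY_METRIC[metric]


if __name__ == "__main__":
    print(check_request("Cpuutil", 60))
```

Checking the period locally before sending a request is a design convenience only; the service remains the authority on which combinations it accepts.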

Overview of Parameters for Each Dimension

Parameter name
Dimension name
Dimension description
Format
Instances.N.Dimensions.0.Name
AppId
Dimension name for the account APPID
Enter the String-type dimension name: AppId (obtained automatically in SDK calls; no need to pass it)
Instances.N.Dimensions.0.Value
AppId
Account APPID
Enter the ID, for example: 1231231231 (obtained automatically in SDK calls; no need to pass it)
Instances.N.Dimensions.1.Name
SubUin
Dimension name for the sub-account ID
Enter the String-type dimension name: SubUin
Instances.N.Dimensions.1.Value
SubUin
Sub-account ID
Enter the ID, for example: 100001231231
Instances.N.Dimensions.2.Name
InstanceId
Dimension name for the training task instance ID
Enter the String-type dimension name: InstanceId
Instances.N.Dimensions.2.Value
InstanceId
Training task instance ID
Enter the specific instance ID, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0
Instances.N.Dimensions.3.Name
InstanceGpuNum
Dimension name for the GPU card index used by the training task instance (whole-GPU tasks only)
Enter the String-type dimension name: InstanceGpuNum
Instances.N.Dimensions.3.Value
InstanceGpuNum
GPU card index used by the training task instance (whole-GPU tasks only)
The training task instance ID joined with a GPU card index or avg, for example: train-9187850047592xxxxx-9ludoo1s1n9c-master-0-0 or train-9187850047592xxxxx-9ludoo1s1n9c-master-0-avg
Instances.N.Dimensions.4.Name
TaskId
Dimension name for the training task ID
Enter the String-type dimension name: TaskId
Instances.N.Dimensions.4.Value
TaskId
Training task ID
Enter the ID, for example: train-9187850047592xxxxx
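
The InstanceGpuNum value in the table above is the instance ID with a GPU card index or avg appended after a hyphen. As a small illustration in Python (instance_gpu_num is a hypothetical helper name, and treating a missing index as the avg series is an assumption made for this sketch):

```python
def instance_gpu_num(instance_id, gpu_index=None):
    """Build an InstanceGpuNum dimension value.

    gpu_index=None selects the "avg" series; otherwise the non-negative
    GPU card index is appended to the instance ID.
    """
    if gpu_index is None:
        suffix = "avg"
    else:
        if gpu_index < 0:
            raise ValueError("gpu_index must be non-negative")
        suffix = str(gpu_index)
    return f"{instance_id}-{suffix}"


if __name__ == "__main__":
    instance = "train-9187850047592xxxxx-9ludoo1s1n9c-master-0"
    print(instance_gpu_num(instance, 0))
    print(instance_gpu_num(instance))
```

The two calls reproduce the two example values given in the Format column for InstanceGpuNum.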

Input Parameter Description

To query monitoring data for task-based modeling metrics, use the following parameter values:
&Namespace=QCE/TI_TRAINTASK
&Instances.N.Dimensions.0.Name=AppId
&Instances.N.Dimensions.0.Value=the account APPID
&Instances.N.Dimensions.1.Name=SubUin
&Instances.N.Dimensions.1.Value=the sub-account ID
&Instances.N.Dimensions.2.Name=InstanceId
&Instances.N.Dimensions.2.Value=the training task instance ID
&Instances.N.Dimensions.3.Name=InstanceGpuNum
&Instances.N.Dimensions.3.Value=the GPU card index used by the training task instance
&Instances.N.Dimensions.4.Name=TaskId
&Instances.N.Dimensions.4.Value=the training task ID
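
The parameter list above can be assembled programmatically. The sketch below, in Python, flattens a list of dimension pairs into Instances.N.Dimensions.M.Name/Value keys. It rests on several assumptions not stated in this section: that only the three dimensions listed for the chosen metric are passed, that they are numbered consecutively from 0 in the order given, and that the request also carries MetricName and Period fields. build_query is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlencode

NAMESPACE = "QCE/TI_TRAINTASK"


def build_query(metric_name, period, dimensions, n=0):
    """Flatten (name, value) pairs into Instances.N.Dimensions.M.* keys.

    MetricName and Period are assumed request fields; this section only
    documents Namespace and the Instances.N.Dimensions.* parameters.
    """
    params = {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Period": period,
    }
    for m, (name, value) in enumerate(dimensions):
        params[f"Instances.{n}.Dimensions.{m}.Name"] = name
        params[f"Instances.{n}.Dimensions.{m}.Value"] = value
    return params


if __name__ == "__main__":
    query = build_query(
        "Gpuutil",
        60,
        [
            ("AppId", "1231231231"),
            ("SubUin", "100001231231"),
            ("TaskId", "train-9187850047592xxxxx"),
        ],
    )
    # Percent-encoded query string; signing and common parameters are omitted.
    print(urlencode(query))
```

The ID values are the placeholder examples from the dimension table; substitute real identifiers before use.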