Two questions:
Posted 2020-08-15 20:58:03
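Nsight Compute on CC 7.5 GPUs
SM% is defined by sm__throughput, and Memory% is defined by gpu__compute_memory_throughput.
sm__throughput is the maximum of the following metrics:
gpu__compute_memory_throughput is the maximum of the following metrics:
In your case the limiter is sm__inst_executed_pipe_lsu, which is an instruction throughput. If you review sections/SpeedOfLight.py, latency bound is defined as both sm__throughput and gpu__compute_memory_throughput being below 60%.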
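As a rough illustration of those two definitions, here is a minimal Python sketch. It is not the code from sections/SpeedOfLight.py: the function names are made up, the SM percentage and the placeholder sub-metric are hypothetical, and the memory values are simply the ones from the sample CSV output in this answer. Each throughput is taken as the maximum of its sub-metrics, the sub-metric that attains the maximum is reported as the limiter, and the kernel is called latency bound only when both throughputs are under 60%.

```python
LATENCY_BOUND_THRESHOLD = 60.0  # percent, per the rule described above


def throughput_and_limiter(sub_metrics):
    """Return (throughput, limiter): the max sub-metric value and its name."""
    limiter = max(sub_metrics, key=sub_metrics.get)
    return sub_metrics[limiter], limiter


def is_latency_bound(sm_pct, mem_pct, threshold=LATENCY_BOUND_THRESHOLD):
    """True only when both throughputs are below the threshold."""
    return sm_pct < threshold and mem_pct < threshold


# Hypothetical SM sub-metric values (percent of peak), for illustration only.
sm_sub = {
    "sm__inst_executed_pipe_lsu": 82.0,
    "hypothetical_other_sm_submetric": 35.0,  # placeholder, not a real metric
}
# Memory sub-metric values taken from the sample CSV output in this answer.
mem_sub = {
    "gpu__dram_throughput": 0.38,
    "l1tex__data_bank_reads": 0.05,
    "l1tex__data_bank_writes": 0.05,
}

sm_pct, sm_limiter = throughput_and_limiter(sm_sub)
mem_pct, mem_limiter = throughput_and_limiter(mem_sub)
latency_bound = is_latency_bound(sm_pct, mem_pct)
print(sm_limiter, sm_pct, mem_limiter, mem_pct, latency_bound)
```

With these numbers the SM side is limited by the LSU pipe and sits above 60%, so the kernel would not be classified as latency bound even though the memory throughput is tiny.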
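Some instruction pipelines have lower throughput, such as fp64, xu, and lsu (this varies by chip). Pipeline utilization is part of sm__throughput. To improve performance, the options are: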
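Generating the breakdown
As of Nsight Compute 2020.1, there is no simple command line that generates the list without running a profiling session. For now, you can collect one throughput metric using
breakdown:<throughput metric>.avg.pct_of_peak_sustained_elapsed
and parse the output to get the sub-metric names.
For example: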
ncu.exe --csv --metrics breakdown:sm__throughput.avg.pct_of_peak_sustained_elapsed --details-all -c 1 cuda_application.exe
This produces:
"ID","Process ID","Process Name","Host Name","Kernel Name","Kernel Time","Context","Stream","Section Name","Metric Name","Metric Unit","Metric Value"
"0","33396","cuda_application.exe","127.0.0.1","kernel()","2020-Aug-20 13:26:26","1","7","Command line profiler metrics","gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed","%","0.38"
"0","33396","cuda_application.exe","127.0.0.1","kernel()","2020-Aug-20 13:26:26","1","7","Command line profiler metrics","l1tex__data_bank_reads.avg.pct_of_peak_sustained_elapsed","%","0.05"
"0","33396","cuda_application.exe","127.0.0.1","kernel()","2020-Aug-20 13:26:26","1","7","Command line profiler metrics","l1tex__data_bank_writes.avg.pct_of_peak_sustained_elapsed","%","0.05"
...
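The parsing step could look something like the following Python sketch. It assumes you have captured the CSV text from the command above (for example by redirecting stdout to a file), and that any non-CSV lines printed before the header can be skipped by looking for the line that starts with "ID". The function name sub_metric_names is hypothetical; the sample rows are copied from the output shown above.

```python
import csv
import io


def sub_metric_names(csv_text):
    """Return the unique "Metric Name" values, in order of first appearance."""
    lines = csv_text.splitlines()
    # Assumption: anything before the CSV header is not part of the table.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith('"ID"')), None
    )
    if start is None:
        return []
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    names = []
    for row in reader:
        name = row.get("Metric Name")
        # Rows that are not real CSV records have no metric name; skip them.
        if name and name not in names:
            names.append(name)
    return names


SAMPLE = '''\
"ID","Process ID","Process Name","Host Name","Kernel Name","Kernel Time","Context","Stream","Section Name","Metric Name","Metric Unit","Metric Value"
"0","33396","cuda_application.exe","127.0.0.1","kernel()","2020-Aug-20 13:26:26","1","7","Command line profiler metrics","gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed","%","0.38"
"0","33396","cuda_application.exe","127.0.0.1","kernel()","2020-Aug-20 13:26:26","1","7","Command line profiler metrics","l1tex__data_bank_reads.avg.pct_of_peak_sustained_elapsed","%","0.05"
'''

for metric in sub_metric_names(SAMPLE):
    print(metric)
```

The names come back with the .avg.pct_of_peak_sustained_elapsed suffix still attached, so they can be passed straight back to --metrics if you want to collect the sub-metrics individually.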
The keyword breakdown can be used in Nsight Compute section files to expand a throughput metric. This is used in SpeedOfLight.section.
https://stackoverflow.com/questions/63403203